JDK-8167334

optional hyper-alignment for value types like Long4/Bits256

    • Enhancement
    • Resolution: Future Project
    • P3
    • repo-valhalla
    • hotspot

      On some platforms (such as x86), vector data types run faster when they are naturally aligned in memory (128-, 256-, or 512-bit alignment within a cache line). Let's get this benefit for such types by allowing some value types to declare hyper-alignment.

      (By hyper-alignment we mean any alignment more restrictive than the JVM's internal default alignment for heap nodes. In nearly all implementations, the alignment of objects and arrays is arranged so that every long or double can be naturally aligned; this means object and array headers are usually placed at 0 mod 8 bytes, although in some cases 4 mod 8 could be used to avoid fragmentation.)

      Hyper-aligned objects would be requested by first optionally annotating values with their alignment. For example:

          @Aligned128 value class Bits128 { long a, b; ... }

      The class file parser's instance layout algorithm would take care of internally aligning the hyper-aligned fields within the instance (and static) layout. The class file parser would also "roll up" a required total "net alignment" for the instance as a whole.

      The net alignment for a class would specify a number A, a power of two as small as 8 bytes (or whatever is the JVM's default alignment) and as large as 64 bytes (or whatever the cache line size or maximum vector size is, in the host ISA). The net alignment requirement would also specify a second number B, a multiple of 8 bytes (or the JVM's default alignment) less than the first number A. Usually, B would be zero, but it might be (for example) the size of the object's header subtracted from A.
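
      As a rough illustration of the A/B pair, here is a minimal Java sketch. The class name NetAlignment, the accepts method, and the 8-byte DEFAULT_ALIGNMENT constant are assumptions made for this example; none of them is an existing HotSpot or JDK API. An address satisfies the requirement when it equals B modulo A.

```java
/** Hypothetical model of the A/B "net alignment" pair for heap nodes; not a HotSpot API. */
final class NetAlignment {
    /** Assumed JVM default alignment for heap nodes, in bytes. */
    static final int DEFAULT_ALIGNMENT = 8;

    final int a; // power of two, at least DEFAULT_ALIGNMENT
    final int b; // multiple of DEFAULT_ALIGNMENT, with 0 <= b < a

    NetAlignment(int a, int b) {
        if (a < DEFAULT_ALIGNMENT || Integer.bitCount(a) != 1) {
            throw new IllegalArgumentException("A must be a power of two >= 8: " + a);
        }
        if (b < 0 || b >= a || b % DEFAULT_ALIGNMENT != 0) {
            throw new IllegalArgumentException("B must be a multiple of 8 below A: " + b);
        }
        this.a = a;
        this.b = b;
    }

    /** True if a node starting at the given address satisfies address == B (mod A). */
    boolean accepts(long address) {
        return (address & (a - 1)) == b;
    }
}
```

      With A/B = 32/16, for instance, address 48 is acceptable and address 64 is not.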

      A corresponding hardwired computation would determine A and B for any flattened array that contains hyper-aligned values.

      The layout algorithm for values also rolls up a net alignment. Values can in theory be aligned to any byte boundary (or even bit boundary). For example, a value made up only of byte fields would have an A/B of 1/0, and could be packed tightly into other values or containing instances or arrays. (This might be called micro-alignment, and is a different issue having to do with value density.) But if a value contains a hyper-aligned sub-value, its net alignment will cause any containing value, array, or instance to align those sub-values correctly, when the value is eventually stored in memory.

      Next, any instance or array containing such values would handshake with the GC's allocation mechanisms to align the new object or array up to the required A/B values. (The GC may choose to use a separate TLAB for such purposes, or it may just allow the Java thread to insert padding as needed to round up the high-water mark.) Similarly, when the GC copies such hyper-aligned objects and arrays between regions, it must ensure that the new copies are appropriately aligned. (Again, the GC may choose either special sub-regions, or else just insert padding objects.)
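
      The "round up the high-water mark" option can be sketched as plain address arithmetic. This is only an illustration in Java; the class AlignedBump and its methods are hypothetical names, and a real allocator would also have to fill the skipped bytes with a padding object and check the TLAB limit.

```java
/** Hypothetical bump-pointer helper; illustrates the arithmetic only, not HotSpot code. */
final class AlignedBump {
    /**
     * Returns the smallest address >= top that equals b modulo a.
     * Assumes a is a power of two and 0 <= b < a.
     */
    static long alignUp(long top, int a, int b) {
        long mask = a - 1L;
        // (b - top) mod a is the distance from top to the next acceptable address.
        return top + ((b - top) & mask);
    }

    /** Number of padding bytes the allocating thread would insert before the object. */
    static long padding(long top, int a, int b) {
        return alignUp(top, a, b) - top;
    }
}
```

      For a high-water mark of 40, an A/B of 32/0 moves the allocation to 64 (24 bytes of padding), while 32/16 moves it to 48 (8 bytes of padding).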

      So, let's prototype a way to create hyper-aligned objects. There should be nullary annotations @Aligned16, @Aligned32, and @Aligned64, defined in the same package as @Contended, which signal that the JVM's layout algorithm should set the A value of the annotated type to the given number.

      (Alternatives: @AlignedTo(bits=256) is harder for the JVM to parse, but could work. And/or just @Aligned which means align to the size of the value itself.)

      The annotations must apply to value types. They could also apply to object types, in which case the B value might be either 0 (keep whole object on cache line) or (A-H)%A, where H is the internal object header size (keep object fields on cache line); 0 is probably more useful, since the other effect can be obtained by embedding a value type into the object.

      This work would add complexity to an already overburdened layout algorithm in classFileParser.cpp. I suggest refactoring that algorithm to use C++ stream-like design patterns, instead of a multi-page blob of random C logic.

      When laying out a value with hyper-aligned fields, the most strongly aligned fields should come first, and determine the overall alignment restriction A. Other fields should be grouped with their own alignments, and the various groups concatenated in order of decreasing alignment restriction. If the B value for each sub-value is zero, this will reliably provide a reasonably dense layout, again with a B value of zero.
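
      A minimal Java sketch of this ordering rule follows, assuming every field has B = 0 and a power-of-two alignment. The names ValueLayoutSketch, Field, layOut, and netAlignment are invented for the example and do not correspond to anything in classFileParser.cpp.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical layout sketch: fields are placed in order of decreasing alignment. */
final class ValueLayoutSketch {
    /** A field with a size in bytes and a power-of-two alignment (B assumed to be zero). */
    static final class Field {
        final String name;
        final int size;
        final int align;

        Field(String name, int size, int align) {
            this.name = name;
            this.size = size;
            this.align = align;
        }
    }

    /** Assigns byte offsets, most strongly aligned fields first; returns name -> offset. */
    static Map<String, Integer> layOut(List<Field> fields) {
        List<Field> sorted = new ArrayList<>(fields);
        // List.sort is stable, so fields with equal alignment keep their declared order.
        sorted.sort(Comparator.comparingInt((Field f) -> f.align).reversed());
        Map<String, Integer> offsets = new LinkedHashMap<>();
        int offset = 0;
        for (Field f : sorted) {
            offset = (offset + f.align - 1) & -f.align; // round up to the field's alignment
            offsets.put(f.name, offset);
            offset += f.size;
        }
        return offsets;
    }

    /** The rolled-up A value: the largest alignment of any field (1 if there are none). */
    static int netAlignment(List<Field> fields) {
        int a = 1;
        for (Field f : fields) {
            a = Math.max(a, f.align);
        }
        return a;
    }
}
```

      When each field's size is a multiple of its alignment, this ordering needs no padding at all. Padding appears only when a field's size is not a multiple of the next field's alignment, such as a 9-byte sub-value followed by a long.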

      (Note: There may be padding gaps between the end of one field and the beginning of the next, as in the case of this nesting of values:

          {ab: {a:long; b:byte}; padbits<56>; c:long}

      Split value layouts may also be helpful, for interleaving values such as byte+long, etc., in a container at maximum density. This should be addressed in follow-on work.)

      When laying out an object with hyper-aligned fields, a similar procedure should be followed. Again, the most strictly aligned field F should be placed first. The header must be placed before F; this leads to a possibly non-zero B value. However, such a thing can sometimes be avoided (if desired) by placing additional fields between F and the header, if those fields are not as strongly aligned. (This is a generalization of the current HotSpot trick of placing an int field in the alignment "gap" between a compressed klass pointer and a leading long field.) In any case, fitting fields between the header and the hyper-aligned field F may reduce fragmentation overhead, depending on details of allocation strategy. Examples:

        class Foo { int128 f; int g; }
        => {header:{m:long,k:int}; g:int; f:int128}; A/B=16/0

        class Bar { int256 f; int g; }
        => {header:{m:long,k:int}; g:int; f:int256}; A/B=32/16
        or
        => {header:{m:long,k:int}; padbits<128>; g:int; f:int256}; A/B=32/0
        or
        => {header:{m:long,k:int}; g:int; padbits<128>; f:int256}; A/B=32/0
        etc.
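
      The B values in the Foo and Bar examples above follow from the offset of the hyper-aligned field F within the object: the object must start at an address that is minus that offset, modulo A. Here is a small Java sketch of that arithmetic. The class ObjectAlignmentSketch, the method requiredB, and the 12-byte HEADER_BYTES constant (an 8-byte mark word plus a 4-byte compressed klass pointer, as in the examples) are assumptions for illustration only.

```java
/** Hypothetical helper that derives B from the offset of the most strictly aligned field. */
final class ObjectAlignmentSketch {
    /** Assumed header size in bytes: {m:long, k:int}, as in the examples. */
    static final int HEADER_BYTES = 12;

    /**
     * Returns B such that an object starting at an address equal to B (mod a)
     * places the field at fieldOffset on an a-byte boundary.
     */
    static int requiredB(int a, int fieldOffset) {
        if (a < 1 || Integer.bitCount(a) != 1) {
            throw new IllegalArgumentException("A must be a power of two: " + a);
        }
        return (-fieldOffset) & (a - 1);
    }

    public static void main(String[] args) {
        int afterG = HEADER_BYTES + 4; // int g fills the gap after the header
        System.out.println("Foo: A/B=16/" + requiredB(16, afterG));             // Foo: A/B=16/0
        System.out.println("Bar: A/B=32/" + requiredB(32, afterG));             // Bar: A/B=32/16
        System.out.println("Bar, padded: A/B=32/" + requiredB(32, afterG + 16)); // Bar, padded: A/B=32/0
    }
}
```

      The same number also gives the padding (in bytes) that would have to be inserted before F to bring B down to zero, which is why the padded Bar layouts use padbits<128>.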

      This proposed mechanism also improves the implementation of @Contended (JDK-8003985), and may in fact generalize it. It allows @Contended to use the layout algorithm (with A/B values) to right-size the object or its fields; currently @Contended must use about twice the amount of padding, since hyper-alignment is not available on the object containing the contended data.

      This work should be prototyped in the Valhalla repository. It would be useful to deliver along with any experimental preview of value types coordinated with the Panama Vector API work. See:
        http://cr.openjdk.java.net/~jrose/values/shady-values.html


            Unassigned
            jrose John Rose