Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-8310190

C2 SuperWord: AlignVector is broken, generates misaligned packs

XMLWordPrintable

    • b05

      I found some examples that indicate to me that AlignVector is fundamentally broken. I have not been able to verify this on a machine that actually would SIGBUS, but I definately see that non-aligned vector loads/stores are generated on a x64 machine.

      For example:
      vmovd %xmm9,0x13(%rdx,%r13,1)

      I have 5 examples.

      I would be interested to have confirmation from a machine that actually requires AlignVector, if this ever leads to true failures.

      -------------------------- Test::test0 --------------------------

      ./java -Xcomp -XX:-TieredCompilation -XX:CompileCommand=compileonly,Test::test0 -XX:+TraceNewVectors -XX:+TraceSuperWord -XX:+TraceLoopOpts -XX:MaxVectorSize=16 -XX:+AlignVector -XX:+Verbose -XX:LoopUnrollLimit=10000 Test.java

      This and test1 are some control tests, just to see that we get vectorization in the safe cases.

      Here, I get packs that are 0-aligned or 8-aligned, with length of 4 bytes per vector.

      -------------------------- Test::test1 --------------------------

      ./java -Xcomp -XX:-TieredCompilation -XX:CompileCommand=compileonly,Test::test1 -XX:+TraceNewVectors -XX:+TraceSuperWord -XX:+TraceLoopOpts -XX:MaxVectorSize=16 -XX:+AlignVector -XX:+Verbose -XX:LoopUnrollLimit=10000 Test.java

      This is also a control test for vectorization in a safe case.
      I get 16-packs, all 0-aligned. Good.

      -------------------------- Test::test2 --------------------------

      ./java -Xcomp -XX:-TieredCompilation -XX:CompileCommand=compileonly,Test::test2 -XX:+TraceNewVectors -XX:+TraceSuperWord -XX:+TraceLoopOpts -XX:MaxVectorSize=16 -XX:+AlignVector -XX:+Verbose -XX:LoopUnrollLimit=10000 Test.java

      This case is somewhat surprising, we actually do not vectorize even though we technically would be allowed to.
      The issue is that find_adjacent_refs seems to find no memref to align to, I think the issue is that we have no memref that is 0-aligned with the vector-width (no "i+0" case).

      -------------------------- Test::test3 --------------------------

      ./java -Xcomp -XX:-TieredCompilation -XX:CompileCommand=compileonly,Test::test3 -XX:+TraceNewVectors -XX:+TraceSuperWord -XX:+TraceLoopOpts -XX:MaxVectorSize=16 -XX:+AlignVector -XX:+Verbose -XX:LoopUnrollLimit=10000 Test.java

      This case is the same as test2, but we have an additional access at "i+0".
      Now find_adjacent_refs finds a memref to align to. But it turns out later that it actually is not part of a vector!

      But we do create some 4-packs, they are 3 or 11 aligned, however!
      And these are some assembly instructions I can find with -XX:CompileCommand=print,Test::test3:

      vmovd %xmm9,0x13(%rdx,%r13,1)

      It is possible that this still gets aligned, as everything is at a "3-offset", but given that we align to the best-memref found in find_adjacent_refs, this is implausible: that one has a 0-alignment, while the vectors have a 3/11 alignment.

      Pack: 0
       align: 3 2063 StoreB === 2274 2066 2087 2064 [[ 2060 2062 ]] @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6; Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=1634,1362,1153,215 !jvms: Test::test3 @ bci:34 (line 56)
       align: 4 2060 StoreB === 2274 2063 2089 2061 [[ 2055 2059 ]] @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6; Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=1631,1349,1150,263 !jvms: Test::test3 @ bci:47 (line 57)
       align: 5 2055 StoreB === 2274 2060 2056 2058 [[ 2052 2054 ]] @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6; Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=1626,1346,1147,311 !jvms: Test::test3 @ bci:60 (line 58)
       align: 6 2052 StoreB === 2274 2055 2098 2053 [[ 2049 2051 ]] @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6; Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=1623,1343,1143,359,1207 !jvms: Test::test3 @ bci:75 (line 59)

      Pack: 24
       align: 11 2046 StoreB === 2274 2049 2095 2047 [[ 2043 2045 ]] @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6; Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=1617,1337,215 !jvms: Test::test3 @ bci:34 (line 56)
       align: 12 2043 StoreB === 2274 2046 2096 2044 [[ 2040 2042 ]] @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6; Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=1614,1334,263 !jvms: Test::test3 @ bci:47 (line 57)
       align: 13 2040 StoreB === 2274 2043 2092 2041 [[ 2037 2039 ]] @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6; Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=1611,1331,311 !jvms: Test::test3 @ bci:60 (line 58)
       align: 14 2037 StoreB === 2274 2040 2088 2038 [[ 2002 2004 ]] @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6; Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=1608,1327,359,1207 !jvms: Test::test3 @ bci:75 (line 59)

      -------------------------- Test::test4 --------------------------

      ./java -Xcomp -XX:-TieredCompilation -XX:CompileCommand=compileonly,Test::test4 -XX:+TraceNewVectors -XX:+TraceSuperWord -XX:+TraceLoopOpts -XX:MaxVectorSize=16 -XX:+AlignVector -XX:+Verbose -XX:LoopUnrollLimit=10000 Test.java

      Run with - to see the assembly generated, for example I see:

      vmovd 0x20(%rsi,%r11,1),%xmm14
      vmovq 0x25(%rsi,%r11,1),%xmm26

      These cannot possibly be aligned, their offset is odd!

      This is from -XX:+TraceSuperWord, we see a 4-pack (0 aligned) and an 8-pack (5-aligned), this corresponds to the two assembly instructions above:

      Pack: 0
       align: 0 3139 StoreB === 3358 3362 3155 3140 [[ 3136 3138 ]] @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6; Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=2517,2124,1801,187 !jvms: Test::test4 @ bci:30 (line 53)
       align: 1 3136 StoreB === 3358 3139 3166 3137 [[ 3133 3135 ]] @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6; Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=2514,2121,1798,238 !jvms: Test::test4 @ bci:49 (line 54)
       align: 2 3133 StoreB === 3358 3136 3163 3134 [[ 3130 3132 ]] @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6; Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=2511,2118,1795,290 !jvms: Test::test4 @ bci:68 (line 55)
       align: 3 3130 StoreB === 3358 3133 3165 3131 [[ 3127 3129 ]] @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6; Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=2508,2115,1792,342 !jvms: Test::test4 @ bci:87 (line 56)

      Pack: 48
       align: 5 3127 StoreB === 3358 3130 3160 3128 [[ 3124 3126 ]] @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6; Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=2505,2112,1789,394 !jvms: Test::test4 @ bci:106 (line 58)
       align: 6 3124 StoreB === 3358 3127 3158 3125 [[ 3121 3123 ]] @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6; Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=2502,2109,1786,446 !jvms: Test::test4 @ bci:127 (line 59)
       align: 7 3121 StoreB === 3358 3124 3156 3122 [[ 3118 3120 ]] @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6; Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=2499,2106,1783,498 !jvms: Test::test4 @ bci:148 (line 60)
       align: 8 3118 StoreB === 3358 3121 3164 3119 [[ 3115 3117 ]] @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6; Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=2496,2103,1780,550 !jvms: Test::test4 @ bci:169 (line 61)
       align: 9 3115 StoreB === 3358 3118 3159 3116 [[ 3112 3114 ]] @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6; Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=2493,2100,1777,602 !jvms: Test::test4 @ bci:190 (line 62)
       align: 10 3112 StoreB === 3358 3115 3162 3113 [[ 3109 3111 ]] @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6; Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=2490,2097,1774,654 !jvms: Test::test4 @ bci:211 (line 63)
       align: 11 3109 StoreB === 3358 3112 3157 3110 [[ 3106 3108 ]] @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6; Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=2487,2094,1771,706 !jvms: Test::test4 @ bci:232 (line 64)
       align: 12 3106 StoreB === 3358 3109 3161 3107 [[ 3103 3105 ]] @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6; Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=2484,2091,1768,758,1841 !jvms: Test::test4 @ bci:253 (line 65)

        1. Test24.java
          2 kB
        2. Test21.java
          3 kB
        3. Test.java
          3 kB
        4. generator.py
          28 kB

            epeter Emanuel Peter
            epeter Emanuel Peter
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

              Created:
              Updated:
              Resolved: