-
Type:
Bug
-
Resolution: Unresolved
-
Priority:
P3
-
Affects Version/s: 16
-
Component/s: hotspot
-
x86
Found while working on JDK-8369699.
I could only reproduce it with AVX512, and not even with AVX2.
It also reproduced all the way back to JDK16, to the build afterJDK-8223347, so this seems to be a bug since the beginning of the Vector API.
Reproduce:
java -XX:CompileCommand=compileonly,Test::test -XX:CompileCommand=printcompilation,Test::test -Xbatch Test.java
...
gold = -3
val = 1
Exception in thread "main" java.lang.RuntimeException: bad value
at Test.main(Test.java:17)
For more info:
java -XX:CompileCommand=compileonly,Test::test -XX:CompileCommand=printcompilation,Test::test -XX:CompileCommand=printassembly,Test::test -Xbatch -XX:+TraceNewVectors -XX:+PrintIdeal Test.java
----------------------------------
I suspect the bug is somewhere around the 32B mul reduction, something seems to go wrong with the registers.
The code below is generated in C2_MacroAssembler::mulreduce32B, by the AVX512 part.
0x00007ffab8bb7734: vmovdqu 0x10(%rsi),%ymm0
Load 32B values from array:
ymm0 = 0x...101010101010101010101010101fd01
0x00007ffab8bb7739: movabs $0x623aa7050,%r10 ; {oop(a 'java/lang/Byte'[256] {0x0000000623aa7050})}
0x00007ffab8bb7743: mov $0x1,%r11d
0x00007ffab8bb7749: vpmovsxbw %ymm0,%zmm2
Cast the ymm0 with 32 byte values to zmm2 with 32 shorts, so we can do short reduction.
zmm2 = 0x...100010001000100010001fffd0001
0x00007ffab8bb774f: vextracti64x4 $0x1,%zmm2,%ymm2
Take the upper half of zmm2 (all 0x0001 values), and store it over the lower half (ymm2).
The problem: the lower half is now lost. The 0xfffd value is now only to be found in ymm0,
but we never use that value again, and so we lose the -3 (0xfffd) value.
zmm2 = v4_int128 = {0x10001000100010001000100010001, 0x10001000100010001000100010001, 0x0, 0x0}
0x00007ffab8bb7756: vpmullw %ymm2,%ymm2,%ymm2
0x00007ffab8bb775a: vextracti128 $0x1,%ymm2,%xmm1
0x00007ffab8bb7760: vpmullw %xmm2,%xmm1,%xmm1
0x00007ffab8bb7764: vpshufd $0xe,%xmm1,%xmm2
0x00007ffab8bb7769: vpmullw %xmm1,%xmm2,%xmm2
0x00007ffab8bb776d: vpshufd $0x1,%xmm2,%xmm1
0x00007ffab8bb7772: vpmullw %xmm2,%xmm1,%xmm1
0x00007ffab8bb7776: vmovdqu %xmm1,%xmm2
0x00007ffab8bb777a: vpsrldq $0x2,%xmm2,%xmm2
0x00007ffab8bb777f: vpmullw %xmm1,%xmm2,%xmm2
0x00007ffab8bb7783: vmovd %r11d,%xmm1
0x00007ffab8bb7788: vpmovsxwd %xmm2,%xmm2
0x00007ffab8bb778d: vpmulld %xmm1,%xmm2,%xmm2
0x00007ffab8bb7792: vpextrw $0x0,%xmm2,%r9d
0x00007ffab8bb7797: movswl %r9w,%r9d
0x00007ffab8bb779b: movsbl %r9b,%r11d
0x00007ffab8bb779f: movslq %r11d,%r11
0x00007ffab8bb77a2: mov 0x210(%r10,%r11,4),%r11d
0x00007ffab8bb77aa: lea (%r12,%r11,8),%rax
0x00007ffab8bb77ae: vzeroupper
0x00007ffab8bb77b1: add $0x20,%rsp
0x00007ffab8bb77b5: pop %rbp
0x00007ffab8bb77b6: cmp 0x28(%r15),%rsp ; {poll_return}
0x00007ffab8bb77ba: ja 0x00007ffab8bb7854
0x00007ffab8bb77c0: retq
----------------------------------------
I'm a bit confused why such a bug for such a simple (trivial) code shape has not been found before. For example, this test was supposed to catch it:
test/jdk/jdk/incubator/vector/Byte256VectorTests.java
Aaaah, I see why! Our generators always lead to a zero reduction:
1104 static final List<IntFunction<byte[]>> BYTE_GENERATORS = List.of(
1105 withToString("byte[-i * 5]", (int s) -> {
1106 return fill(s * BUFFER_REPS,
1107 i -> (byte)(-i * 5));
1108 }),
1109 withToString("byte[i * 5]", (int s) -> {
1110 return fill(s * BUFFER_REPS,
1111 i -> (byte)(i * 5));
1112 }),
1113 withToString("byte[i + 1]", (int s) -> {
1114 return fill(s * BUFFER_REPS,
1115 i -> (((byte)(i + 1) == 0) ? 1 : (byte)(i + 1)));
1116 }),
1117 withToString("byte[cornerCaseValue(i)]", (int s) -> {
1118 return fill(s * BUFFER_REPS,
1119 i -> cornerCaseValue(i));
1120 })
1121 );
You only need 8 even numbers to overflow all bits, or a single zero. Here, we have 32B values in a vector. Even the upper half of those values (the part we extract down) now has 16B values. If half of them are even, we get 8 bits shifted up, and so we have no non-zero bits left. If we only contain a single zero in the generator numbers, we also get a zero result.
So now I added an additional generator to the test, and the test indeed fails!
diff --git a/test/jdk/jdk/incubator/vector/Byte256VectorTests.java b/test/jdk/jdk/incubator/vector/Byte256VectorTests.java
index ec8958e0605..d7529902afc 100644
--- a/test/jdk/jdk/incubator/vector/Byte256VectorTests.java
+++ b/test/jdk/jdk/incubator/vector/Byte256VectorTests.java
@@ -1114,6 +1114,10 @@ static byte bits(byte e) {
return fill(s * BUFFER_REPS,
i -> (((byte)(i + 1) == 0) ? 1 : (byte)(i + 1)));
}),
+ withToString("special", (int s) -> {
+ return fill(s * BUFFER_REPS,
+ i -> (i == 1) ? (byte)3 : (byte)1);
+ }),
withToString("byte[cornerCaseValue(i)]", (int s) -> {
return fill(s * BUFFER_REPS,
i -> cornerCaseValue(i));
My conclusion: the test generators here are very weak, and NOT SUFFICIENT to give us sufficient coverage.
That is a bit frustrating, but luckily we now are working on JDK-8369699 which will give us much better coverage,
especially because we also test with random inputs.
I could only reproduce it with AVX512, and not even with AVX2.
It also reproduced all the way back to JDK16, to the build after
Reproduce:
java -XX:CompileCommand=compileonly,Test::test -XX:CompileCommand=printcompilation,Test::test -Xbatch Test.java
...
gold = -3
val = 1
Exception in thread "main" java.lang.RuntimeException: bad value
at Test.main(Test.java:17)
For more info:
java -XX:CompileCommand=compileonly,Test::test -XX:CompileCommand=printcompilation,Test::test -XX:CompileCommand=printassembly,Test::test -Xbatch -XX:+TraceNewVectors -XX:+PrintIdeal Test.java
----------------------------------
I suspect the bug is somewhere around the 32B mul reduction, something seems to go wrong with the registers.
The code below is generated in C2_MacroAssembler::mulreduce32B, by the AVX512 part.
0x00007ffab8bb7734: vmovdqu 0x10(%rsi),%ymm0
Load 32B values from array:
ymm0 = 0x...101010101010101010101010101fd01
0x00007ffab8bb7739: movabs $0x623aa7050,%r10 ; {oop(a 'java/lang/Byte'[256] {0x0000000623aa7050})}
0x00007ffab8bb7743: mov $0x1,%r11d
0x00007ffab8bb7749: vpmovsxbw %ymm0,%zmm2
Cast the ymm0 with 32 byte values to zmm2 with 32 shorts, so we can do short reduction.
zmm2 = 0x...100010001000100010001fffd0001
0x00007ffab8bb774f: vextracti64x4 $0x1,%zmm2,%ymm2
Take the upper half of zmm2 (all 0x0001 values), and store it over the lower half (ymm2).
The problem: the lower half is now lost. The 0xfffd value is now only to be found in ymm0,
but we never use that value again, and so we lose the -3 (0xfffd) value.
zmm2 = v4_int128 = {0x10001000100010001000100010001, 0x10001000100010001000100010001, 0x0, 0x0}
0x00007ffab8bb7756: vpmullw %ymm2,%ymm2,%ymm2
0x00007ffab8bb775a: vextracti128 $0x1,%ymm2,%xmm1
0x00007ffab8bb7760: vpmullw %xmm2,%xmm1,%xmm1
0x00007ffab8bb7764: vpshufd $0xe,%xmm1,%xmm2
0x00007ffab8bb7769: vpmullw %xmm1,%xmm2,%xmm2
0x00007ffab8bb776d: vpshufd $0x1,%xmm2,%xmm1
0x00007ffab8bb7772: vpmullw %xmm2,%xmm1,%xmm1
0x00007ffab8bb7776: vmovdqu %xmm1,%xmm2
0x00007ffab8bb777a: vpsrldq $0x2,%xmm2,%xmm2
0x00007ffab8bb777f: vpmullw %xmm1,%xmm2,%xmm2
0x00007ffab8bb7783: vmovd %r11d,%xmm1
0x00007ffab8bb7788: vpmovsxwd %xmm2,%xmm2
0x00007ffab8bb778d: vpmulld %xmm1,%xmm2,%xmm2
0x00007ffab8bb7792: vpextrw $0x0,%xmm2,%r9d
0x00007ffab8bb7797: movswl %r9w,%r9d
0x00007ffab8bb779b: movsbl %r9b,%r11d
0x00007ffab8bb779f: movslq %r11d,%r11
0x00007ffab8bb77a2: mov 0x210(%r10,%r11,4),%r11d
0x00007ffab8bb77aa: lea (%r12,%r11,8),%rax
0x00007ffab8bb77ae: vzeroupper
0x00007ffab8bb77b1: add $0x20,%rsp
0x00007ffab8bb77b5: pop %rbp
0x00007ffab8bb77b6: cmp 0x28(%r15),%rsp ; {poll_return}
0x00007ffab8bb77ba: ja 0x00007ffab8bb7854
0x00007ffab8bb77c0: retq
----------------------------------------
I'm a bit confused why such a bug for such a simple (trivial) code shape has not been found before. For example, this test was supposed to catch it:
test/jdk/jdk/incubator/vector/Byte256VectorTests.java
Aaaah, I see why! Our generators always lead to a zero reduction:
1104 static final List<IntFunction<byte[]>> BYTE_GENERATORS = List.of(
1105 withToString("byte[-i * 5]", (int s) -> {
1106 return fill(s * BUFFER_REPS,
1107 i -> (byte)(-i * 5));
1108 }),
1109 withToString("byte[i * 5]", (int s) -> {
1110 return fill(s * BUFFER_REPS,
1111 i -> (byte)(i * 5));
1112 }),
1113 withToString("byte[i + 1]", (int s) -> {
1114 return fill(s * BUFFER_REPS,
1115 i -> (((byte)(i + 1) == 0) ? 1 : (byte)(i + 1)));
1116 }),
1117 withToString("byte[cornerCaseValue(i)]", (int s) -> {
1118 return fill(s * BUFFER_REPS,
1119 i -> cornerCaseValue(i));
1120 })
1121 );
You only need 8 even numbers to overflow all bits, or a single zero. Here, we have 32B values in a vector. Even the upper half of those values (the part we extract down) now has 16B values. If half of them are even, we get 8 bits shifted up, and so we have no non-zero bits left. If we only contain a single zero in the generator numbers, we also get a zero result.
So now I added an additional generator to the test, and the test indeed fails!
diff --git a/test/jdk/jdk/incubator/vector/Byte256VectorTests.java b/test/jdk/jdk/incubator/vector/Byte256VectorTests.java
index ec8958e0605..d7529902afc 100644
--- a/test/jdk/jdk/incubator/vector/Byte256VectorTests.java
+++ b/test/jdk/jdk/incubator/vector/Byte256VectorTests.java
@@ -1114,6 +1114,10 @@ static byte bits(byte e) {
return fill(s * BUFFER_REPS,
i -> (((byte)(i + 1) == 0) ? 1 : (byte)(i + 1)));
}),
+ withToString("special", (int s) -> {
+ return fill(s * BUFFER_REPS,
+ i -> (i == 1) ? (byte)3 : (byte)1);
+ }),
withToString("byte[cornerCaseValue(i)]", (int s) -> {
return fill(s * BUFFER_REPS,
i -> cornerCaseValue(i));
My conclusion: the test generators here are very weak, and NOT SUFFICIENT to give us sufficient coverage.
That is a bit frustrating, but luckily we now are working on JDK-8369699 which will give us much better coverage,
especially because we also test with random inputs.
- blocks
-
JDK-8369699 Template Framework Library: add VectorAPI types and operations
-
- Open
-
- caused by
-
JDK-8223347 Integration of Vector API (Incubator)
-
- Resolved
-