-
Bug
-
Resolution: Unresolved
-
P3
-
19, 21, 24, 25
During investigation of JDK-8349637, [~thartmann] found this independent bug. I ([~epeter]) tracked it down to a truncation issue in SuperWord.
Here the code example and some explanations:
https://github.com/openjdk/jdk/pull/23579#issuecomment-2659378298
https://github.com/openjdk/jdk/pull/23579#issuecomment-2659767156
I attached a TestShort.java to this report here.
java -Xbatch -XX:UseAVX=3 -XX:CompileCommand=compileonly,TestShort::test -XX:CompileCommand=printcompilation,TestShort::test -XX:CompileCommand=TraceAutoVectorization,TestShort::test,SW_REJECTIONS -XX:+TraceNewVectors TestShort.java
CompileCommand: compileonly TestShort.test bool compileonly = true
CompileCommand: PrintCompilation TestShort.test bool PrintCompilation = true
CompileCommand: TraceAutoVectorization TestShort.test const char* TraceAutoVectorization = 'SW_REJECTIONS'
Compute gold:
Compile and compute:
4948 88 % b 3 TestShort::test @ 2 (29 bytes)
4949 89 b 3 TestShort::test (29 bytes)
4953 90 % b 4 TestShort::test @ 2 (29 bytes)
4964 91 b 4 TestShort::test (29 bytes)
SuperWord::transform_loop:
Loop: N1213/N197 limit_check counted [int,int),+32 (13651 iters) main has_sfpt strip_mined
1213 CountedLoop === 1213 227 197 [[ 1067 1072 1075 1078 1081 1092 1096 1102 1107 1111 1114 1117 1127 1131 1137 1142 1212 1213 875 1216 879 882 885 895 899 905 910 655 747 560 178 751 757 762 223 650 ]] inner stride: 32 main of N1213 strip mined !orig=[940],[785],[667],[569],[228],[211],[96] !jvms: TestShort::test @ bci:10 (line 21)
TraceNewVectors [AutoVectorization]: 1369 LoadVector === 1060 1216 1145 [[ ]] @short[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; mismatched #vectorz<S,32> (does not depend only on test, unknown control)
TraceNewVectors [AutoVectorization]: 1370 PopCountVI === _ 1369 [[ ]] #vectorz<S,32>
TraceNewVectors [AutoVectorization]: 1371 StoreVector === 1213 1216 1192 1370 [[ ]] @short[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; mismatched Memory: @short[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6;
SuperWord::transform_loop: success
Verify:
Exception in thread "main" java.lang.RuntimeException: Wrong result 12 vs 28 for -1666 at 5
at TestShort.main(TestShort.java:49)
-----------------------------------
Explanation: The issue is in the vectorizer, where we truncate 32 bit values to 8 or 16 bits, and so a 32 bit zero value that should have 32 leading/trailing zeros is now misinterpreted as a 8 or 16 bit zero with only 8 or 16 bit leading/trailing zeros.
Fix alternatives:
- Disable numberOfLeading/TrailingZeros in vectorizer (easiest to implement, but possible performance regression)
- Cast to int, do numberOfLeading/TrailingZeros in int, cast result back.
- Implement special CountLeading/TrailingZerosV in the backend for byte , that knows that it has 32-8 implicit leading/trailing zeros, and for short that it has 32 - 16 implicit leading/trailing zeros.
JDK-8305324 already fixed the same truncation issue for Interger.reverseBytes, sadly at that point we did not look if other operations were similarly affected.
We have to make sure to check all sub-word types for all these operations. We should probably have an explicit function that checks if an operation can safely be bit-truncated, so that we have to add operations to it actively, and do not just truncate them by default, which can slip through the net of testing, when the subword types are not explicitly tested anywhere.
Here the code example and some explanations:
https://github.com/openjdk/jdk/pull/23579#issuecomment-2659378298
https://github.com/openjdk/jdk/pull/23579#issuecomment-2659767156
I attached a TestShort.java to this report here.
java -Xbatch -XX:UseAVX=3 -XX:CompileCommand=compileonly,TestShort::test -XX:CompileCommand=printcompilation,TestShort::test -XX:CompileCommand=TraceAutoVectorization,TestShort::test,SW_REJECTIONS -XX:+TraceNewVectors TestShort.java
CompileCommand: compileonly TestShort.test bool compileonly = true
CompileCommand: PrintCompilation TestShort.test bool PrintCompilation = true
CompileCommand: TraceAutoVectorization TestShort.test const char* TraceAutoVectorization = 'SW_REJECTIONS'
Compute gold:
Compile and compute:
4948 88 % b 3 TestShort::test @ 2 (29 bytes)
4949 89 b 3 TestShort::test (29 bytes)
4953 90 % b 4 TestShort::test @ 2 (29 bytes)
4964 91 b 4 TestShort::test (29 bytes)
SuperWord::transform_loop:
Loop: N1213/N197 limit_check counted [int,int),+32 (13651 iters) main has_sfpt strip_mined
1213 CountedLoop === 1213 227 197 [[ 1067 1072 1075 1078 1081 1092 1096 1102 1107 1111 1114 1117 1127 1131 1137 1142 1212 1213 875 1216 879 882 885 895 899 905 910 655 747 560 178 751 757 762 223 650 ]] inner stride: 32 main of N1213 strip mined !orig=[940],[785],[667],[569],[228],[211],[96] !jvms: TestShort::test @ bci:10 (line 21)
TraceNewVectors [AutoVectorization]: 1369 LoadVector === 1060 1216 1145 [[ ]] @short[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; mismatched #vectorz<S,32> (does not depend only on test, unknown control)
TraceNewVectors [AutoVectorization]: 1370 PopCountVI === _ 1369 [[ ]] #vectorz<S,32>
TraceNewVectors [AutoVectorization]: 1371 StoreVector === 1213 1216 1192 1370 [[ ]] @short[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; mismatched Memory: @short[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6;
SuperWord::transform_loop: success
Verify:
Exception in thread "main" java.lang.RuntimeException: Wrong result 12 vs 28 for -1666 at 5
at TestShort.main(TestShort.java:49)
-----------------------------------
Explanation: The issue is in the vectorizer, where we truncate 32 bit values to 8 or 16 bits, and so a 32 bit zero value that should have 32 leading/trailing zeros is now misinterpreted as a 8 or 16 bit zero with only 8 or 16 bit leading/trailing zeros.
Fix alternatives:
- Disable numberOfLeading/TrailingZeros in vectorizer (easiest to implement, but possible performance regression)
- Cast to int, do numberOfLeading/TrailingZeros in int, cast result back.
- Implement special CountLeading/TrailingZerosV in the backend for byte , that knows that it has 32-8 implicit leading/trailing zeros, and for short that it has 32 - 16 implicit leading/trailing zeros.
We have to make sure to check all sub-word types for all these operations. We should probably have an explicit function that checks if an operation can safely be bit-truncated, so that we have to add operations to it actively, and do not just truncate them by default, which can slip through the net of testing, when the subword types are not explicitly tested anywhere.
- caused by
-
JDK-8284960 Integration of JEP 426: Vector API (Fourth Incubator)
-
- Resolved
-
- relates to
-
JDK-8349637 Integer.numberOfLeadingZeros outputs incorrectly in certain cases
-
- Resolved
-
-
JDK-8305324 C2: Wrong execution of vectorizing Interger.reverseBytes
-
- Resolved
-