The C2 optimization PostLoopMultiversioning is broken and generates incorrect results with recent JDK versions (14/15). The problem can be reproduced with the program below on x86 with UseAVX=3.
public class Foo {
    private static final int SIZE = 65536;

    private static void bar(int[] a, int[] b, int[] c, int start, int limit) {
        for (int i = start; i < limit; i += 1) {
            c[i] = a[i] + b[i];
        }
    }

    public static void main(String[] args) {
        int[] a = new int[SIZE];
        int[] b = new int[SIZE];
        int[] c = new int[SIZE];
        for (int i = 0; i < SIZE; i++) {
            a[i] = i;
            b[i] = i;
            c[i] = 0;
        }
        for (int i = 0; i < 20000; i++) {
            bar(a, b, c, 16384, 32768);
        }
        int sum = 0;
        for (int i = 32760; i < 32780; i++) {
            sum += c[i];
        }
        System.out.println(sum);
    }
}
$java -XX:+UnlockExperimentalVMOptions -XX:+PostLoopMultiversioning Foo
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f04f181cb15, pid=20589, tid=20611
#
# JRE version: OpenJDK Runtime Environment (16.0) (slowdebug build 16-internal+0-adhoc..jdksrc)
# Java VM: OpenJDK 64-Bit Server VM (slowdebug 16-internal+0-adhoc..jdksrc, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# V [libjvm.so+0x10dbb15] SuperWord::transform_loop(IdealLoopTree*, bool)+0x513
#
# Core dump will be written. Default location: /home/ent-user/case/core
#
# An error report file with more information is saved as:
# /home/ent-user/case/hs_err_pid20589.log
#
# Compiler replay data is saved as:
# /home/ent-user/case/replay_pid20589.log
#
# If you would like to submit a bug report, please visit:
# https://bugreport.java.com/bugreport/crash.jsp
#
Aborted (core dumped)
The SIGSEGV occurs in the C2 code in superword.cpp, here (http://hg.openjdk.java.net/jdk/jdk/file/cc7b6598df7e/src/hotspot/share/opto/superword.cpp#l173).
The cause is that lpt_next can be null after loop strip mining. The code block checks which post loop (the normal vector post loop or the multi-versioned post loop) follows the main loop. If the main loop is strip-mined, however, its _next loop is null, and dereferencing lpt_next crashes. To fix the crash, we can check whether the main loop is strip-mined and, if it is, look at _parent->_next instead.
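The lookup problem described above can be illustrated with a small toy model. This is only a sketch: LoopNode is a hypothetical stand-in for HotSpot's IdealLoopTree, and the tree shape (the post loop reachable as the sibling of the main loop's parent) is an assumption inferred from the description and the proposed fix, not taken from the HotSpot sources.

```java
// Toy model only: LoopNode is a hypothetical stand-in for IdealLoopTree,
// keeping just the two links that matter here (_parent and _next).
public class StripMinedLookup {
    static final class LoopNode {
        final String name;
        final boolean stripMined; // true if this loop is wrapped by an outer strip-mined loop
        LoopNode parent;          // enclosing loop (models _parent)
        LoopNode next;            // next sibling at the same nesting level (models _next)

        LoopNode(String name, boolean stripMined) {
            this.name = name;
            this.stripMined = stripMined;
        }
    }

    // Lookup before the fix: follow the main loop's own sibling link.
    static LoopNode originalLookup(LoopNode mainLoop) {
        return mainLoop.next;
    }

    // Lookup after the fix: for a strip-mined main loop, follow the
    // sibling link of the enclosing loop instead.
    static LoopNode patchedLookup(LoopNode mainLoop) {
        return mainLoop.stripMined ? mainLoop.parent.next : mainLoop.next;
    }

    // Builds the assumed shape: root -> { outer loop -> { main loop }, post loop }.
    static LoopNode buildStripMinedMain() {
        LoopNode root = new LoopNode("root", false);
        LoopNode outer = new LoopNode("outer strip-mined loop", false);
        LoopNode main = new LoopNode("main loop", true);
        LoopNode post = new LoopNode("post loop", false);
        outer.parent = root;
        post.parent = root;
        outer.next = post;   // the post loop is a sibling of the outer loop
        main.parent = outer; // the main loop has no sibling, so main.next stays null
        return main;
    }

    public static void main(String[] args) {
        LoopNode main = buildStripMinedMain();
        System.out.println("original lookup: " + originalLookup(main));
        System.out.println("patched lookup:  " + patchedLookup(main).name);
    }
}
```

In this model the original lookup returns null, which corresponds to the null lpt_next dereferenced in SuperWord::transform_loop, while the patched lookup reaches the post loop.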
Patch:
diff --git a/src/hotspot/share/opto/superword.cpp b/src/hotspot/share/opto/superword.cpp
index 0f4da5e8cfa..caf59461164 100644
--- a/src/hotspot/share/opto/superword.cpp
+++ b/src/hotspot/share/opto/superword.cpp
@@ -169,7 +169,7 @@ void SuperWord::transform_loop(IdealLoopTree* lpt, bool do_optimization) {
   SLP_extract();
   if (PostLoopMultiversioning && Matcher::has_predicated_vectors()) {
     if (cl->is_vectorized_loop() && cl->is_main_loop() && !cl->is_reduction_loop()) {
-      IdealLoopTree *lpt_next = lpt->_next;
+      IdealLoopTree *lpt_next = cl->is_strip_mined() ? lpt->_parent->_next : lpt->_next;
       CountedLoopNode *cl_next = lpt_next->_head->as_CountedLoop();
       _phase->has_range_checks(lpt_next);
       if (cl_next->is_post_loop() && !cl_next->range_checks_present()) {
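After this fix, the crash is gone, but C2 still generates an incorrect result. The first run below (without the flag) prints the correct sum; the second (with PostLoopMultiversioning) does not.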
$java Foo
524216
$java -XX:+UnlockExperimentalVMOptions -XX:+PostLoopMultiversioning Foo
917462
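As a cross-check of the two numbers above, the expected sum can be derived without running bar at all, since c[i] should be 2*i for start <= i < limit and 0 elsewhere. The sketch below does that arithmetic; the class and method names are made up for illustration. The excess over the expected value happens to equal the sum of 2*i for i = 32768..32773, which would be consistent with six elements being written past limit, though that is only an inference from the numbers and not something confirmed here.

```java
// Sketch: recompute what the reproducer should print, independent of the JIT.
public class ExpectedSum {
    // Sum of the values c[from..to) should hold after bar(a, b, c, start, limit),
    // given a[i] = b[i] = i and c initially all zero.
    static long windowSum(int start, int limit, int from, int to) {
        long sum = 0;
        for (int i = from; i < to; i++) {
            if (i >= start && i < limit) {
                sum += 2L * i; // c[i] = a[i] + b[i] = 2 * i inside [start, limit)
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        long expected = windowSum(16384, 32768, 32760, 32780);
        long observed = 917462L; // value reported with -XX:+PostLoopMultiversioning
        System.out.println("expected = " + expected);              // 524216
        System.out.println("excess   = " + (observed - expected)); // 393246
        // What six extra writes at indices 32768..32773 would contribute:
        System.out.println("six past limit = " + windowSum(32768, 32774, 32760, 32780)); // 393246
    }
}
```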
- duplicates
  - JDK-8183390 Fix and re-enable post loop vectorization (Resolved)
- relates to
  - JDK-8153998 Masked vector post loops (Resolved)
  - JDK-8186027 C2: loop strip mining (Resolved)
  - JDK-8211251 Default mask register for avx512 instructions (Resolved)