-
Bug
-
Resolution: Fixed
-
P3
-
11, 17, 21, 22, 23, 24, 25
-
b06
------------------------------ Comments for PR ----------------------------
LoopLimitNode::Value tries to constant-fold when it has constant inputs. However, there can be an overflow in the int-computation, but we check for it with if (final_con == (jlong)final_int) { and do not constant fold in that case.
However, there was an assert that checked that such an overflow would never be encountered. We already had to make an exception for this assert during PhaseCCP withJDK-8309266.
Why did we not hit this assert before?
LoopLimitNode needs to have constant inputs. We used to assume that if the constants would lead to an overflow, then the loop-limit-check would also get similar constants, and detect that limit <= max_int-stride does not hold, and it would constant-fold away the loop, together with the LoopLimitNode.
But now we found a second case:
// - During PhaseIdealLoop::split_thru_phi, the LoopLimitNode floats possibly far above
// the loop and its predicates, and we might get constants on one side of the phi that
// would lead to overflows. Such a code path would never lead us to enter the loop
// because of the loop limit overflow check that happens after the LoopLimitNode
// computation with overflow, but before we enter the loop, seeJDK-8335747.
In PhaseIdealLoop::split_thru_phi, we temporarily split the LoopLimitNode through the phi, generating a new LoopLimitNode for each branch of the phi. We then call Value on it to see if that leads us to constant fold one of the branches, which would be considered a "win".
In the regression test, we have this example:
We generate a temporary clone of LoopLimitNode(init=0, limit=x, stride=4) (would not constant fold because of variable x = Phi(1000, 2147483647)), which happens to be LoopLimitNode(init=0, limit=2147483647, stride=4). We evaluate Value on this temporary clone, and hit the overflow case.
Why is it ok to just remove the assert and allow LoopLimitNode to overflow?
We still have the loop limit check, which checks that limit <= max_int-stride, and this means we would never enter the loop if we took the Phi branch that led to the overflow.
I could not just remove the assert, because in LoopLimitNode::Ideal we have this (strange?) check that does not optimize the LoopLimitNode if the inputs are constants:
if (in(Init)->is_Con() && in(Limit)->is_Con())
return nullptr; // Value
The assumption seems to be that we want Value to do the constant folding here - but of course we did not constant-fold because we had detected the overflow in Value. Not optimizing further here has a unfortunate consequence: on platforms that do not have LoopLimit implemented in the backend directly, we would have "lowered" the LoopLimit further down into ConvI2L, SubL, AddL, DivL, ConvL2I ... nodes. With this check, we do never lower it, and end up with a "bad AD", i.e. compilation bailout in product.
I think this check can reasonably be removed, because Value should be called before Ideal anyway, and so if we can constand fold because of constant inputs, we would have already done so.
Note, that the lowering is delayed until post_loop_opts_phase, but we never did record_for_post_loop_opts_igvn, and so it was not guaranteed that we actually ever processed the LoopLimitNode again, which would mean we got "bad AD" again, i.e. compilation bailout in product.
------------------------------ Original Comment ----------------------------
I spent quite some time extracting and reducing the JASM, a mention in the PR would be highly appreciated ;)
Affected:
JDK24, JDK23 and JDK22, but also JDK17 and JDK11, maybe further back.
How to reproduce on debug:
emanuel@emanuel-oracle:xyz$ java -jar ~/Documents/asmtools-7.0-build/release/lib/asmtools.jar jasm X.jasm
emanuel@emanuel-oracle:xyz$ /oracle-work/jdk-21.0.3/fastdebug/bin/java -Xbatch -XX:+UseG1GC X
#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (/opt/mach5/mesos/work_dir/slaves/0db9c48f-6638-40d0-9a4b-bd9cc7533eb8-S10182/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/da698345-968e-491e-9bad-55adcd4ebc49/runs/880b530a-e09f-4167-a8c2-1b2231c43cb8/workspace/open/src/hotspot/share/opto/loopnode.cpp:2498), pid=3731998, tid=3732012
# assert((in(Init)->is_ConI() && in(Limit)->is_ConI() && in(Stride)->is_ConI()) ? final_con == (jlong)final_int : true) failed: final value should be integer
#
# JRE version: Java(TM) SE Runtime Environment (21.0.3+6) (fastdebug build 21.0.3+6-LTS-151)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 21.0.3+6-LTS-151, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V [libjvm.so+0x12844c0] LoopLimitNode::Value(PhaseGVN*) const+0x2f0
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to /oracle-work/xyz/core.3731998)
#
# An error report file with more information is saved as:
# /oracle-work/xyz/hs_err_pid3731998.log
#
# Compiler replay data is saved as:
# /oracle-work/xyz/replay_pid3731998.log
#
# If you would like to submit a bug report, please visit:
# https://bugreport.java.com/bugreport/crash.jsp
#
Current CompileTask:
C2:1558 451 b 4 X::test (69 bytes)
Stack: [0x00007f1e10f56000,0x00007f1e11057000], sp=0x00007f1e11051d40, free space=1007k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x126f07d] LoopLimitNode::Value(PhaseGVN*) const+0x30d (loopnode.cpp:2540)
V [libjvm.so+0x1294ea7] PhaseIdealLoop::split_thru_phi(Node*, Node*, int)+0x267 (loopopts.cpp:105)
V [libjvm.so+0x1299998] PhaseIdealLoop::split_if_with_blocks_pre(Node*)+0x248 (loopopts.cpp:1190)
V [libjvm.so+0x129ea97] PhaseIdealLoop::split_if_with_blocks(VectorSet&, Node_Stack&)+0x1d7 (loopopts.cpp:1939)
V [libjvm.so+0x1290a89] PhaseIdealLoop::build_and_optimize()+0xee9 (loopnode.cpp:4815)
V [libjvm.so+0x9de3a0] PhaseIdealLoop::optimize(PhaseIterGVN&, LoopOptsMode)+0x390 (loopnode.hpp:1115)
V [libjvm.so+0x9d7005] Compile::optimize_loops(PhaseIterGVN&, LoopOptsMode)+0x75 (compile.cpp:2170)
V [libjvm.so+0x9d97f1] Compile::Optimize()+0xe21 (compile.cpp:2417)
V [libjvm.so+0x9dcf53] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1b33 (compile.cpp:852)
V [libjvm.so+0x832065] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x1d5 (c2compiler.cpp:142)
V [libjvm.so+0x9e8c08] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x928 (compileBroker.cpp:2303)
V [libjvm.so+0x9e9898] CompileBroker::compiler_thread_loop()+0x478 (compileBroker.cpp:1961)
V [libjvm.so+0xe91e7c] JavaThread::thread_main_inner()+0xcc (javaThread.cpp:759)
V [libjvm.so+0x17ab9b6] Thread::call_run()+0xb6 (thread.cpp:225)
V [libjvm.so+0x1495ed7] thread_native_entry(Thread*)+0x127 (os_linux.cpp:849)
With -Xint, it finishes very quickly. But product prints this:
Default case invoked for:
opcode = 204, "LoopLimit"
-> this is a little worrying
Debugging with rr:
rr /oracle-work/jdk-fork4/build/linux-x64-slowdebug/jdk/bin/java -Xbatch -XX:+UseG1GC TestWrongPredicateOrder
We have:
235 ConI === 0 [[ 531 460 312 391 426 609 ]] #int:4
33 ConI === 0 [[ 36 609 ]] #int:max-1
22 ConI === 0 [[ 25 431 74 63 74 63 52 52 283 312 397 609 ]] #int:0
609 LoopLimit === _ 22 33 235 [[ ]] !orig=312
So in the assert:
assert((in(Init)->is_ConI() && in(Limit)->is_ConI() && in(Stride)->is_ConI()) ? final_con == (jlong)final_int : true, "final value should be integer");
in(Init)->is_ConI()
-> true, int:0
in(Limit)->is_ConI()
-> true: #int:max-1
in(Stride)->is_ConI()
-> true: #int:4
final_con = 2147483648 = 0x8000'0000 = max_int+1
But:
(jlong)final_int = -2147483648 = min_int
How do we get here?
(rr) p stride_con
$25 = 4
(rr) p init_con
$18 = 0
(rr) p limit_con
$19 = 2147483646 = 0x7fff'fffe = max_int - 1
(rr) p stride_m
$20 = 3
(rr) p trip_count
$23 = 536870912 = 0x2000'0000
Computed as:
jlong trip_count = (limit_con - init_con + stride_m)/stride_con;
= ((max_int-1) - 0 + 3) / 4
(addition here would overflow int-range!)
Then, we compute:
jlong final_con = init_con + stride_con*trip_count;
= 0 + 4 * 0x2000'000
= 0x8000'0000
(and this overflows exactly to min_int when cast to int)
Update: the final_con pattern exists at least twice in the same file. Consider if both may be impacted.
LoopLimitNode::Value tries to constant-fold when it has constant inputs. However, there can be an overflow in the int-computation, but we check for it with if (final_con == (jlong)final_int) { and do not constant fold in that case.
However, there was an assert that checked that such an overflow would never be encountered. We already had to make an exception for this assert during PhaseCCP with
Why did we not hit this assert before?
LoopLimitNode needs to have constant inputs. We used to assume that if the constants would lead to an overflow, then the loop-limit-check would also get similar constants, and detect that limit <= max_int-stride does not hold, and it would constant-fold away the loop, together with the LoopLimitNode.
But now we found a second case:
// - During PhaseIdealLoop::split_thru_phi, the LoopLimitNode floats possibly far above
// the loop and its predicates, and we might get constants on one side of the phi that
// would lead to overflows. Such a code path would never lead us to enter the loop
// because of the loop limit overflow check that happens after the LoopLimitNode
// computation with overflow, but before we enter the loop, see
In PhaseIdealLoop::split_thru_phi, we temporarily split the LoopLimitNode through the phi, generating a new LoopLimitNode for each branch of the phi. We then call Value on it to see if that leads us to constant fold one of the branches, which would be considered a "win".
In the regression test, we have this example:
We generate a temporary clone of LoopLimitNode(init=0, limit=x, stride=4) (would not constant fold because of variable x = Phi(1000, 2147483647)), which happens to be LoopLimitNode(init=0, limit=2147483647, stride=4). We evaluate Value on this temporary clone, and hit the overflow case.
Why is it ok to just remove the assert and allow LoopLimitNode to overflow?
We still have the loop limit check, which checks that limit <= max_int-stride, and this means we would never enter the loop if we took the Phi branch that led to the overflow.
I could not just remove the assert, because in LoopLimitNode::Ideal we have this (strange?) check that does not optimize the LoopLimitNode if the inputs are constants:
if (in(Init)->is_Con() && in(Limit)->is_Con())
return nullptr; // Value
The assumption seems to be that we want Value to do the constant folding here - but of course we did not constant-fold because we had detected the overflow in Value. Not optimizing further here has a unfortunate consequence: on platforms that do not have LoopLimit implemented in the backend directly, we would have "lowered" the LoopLimit further down into ConvI2L, SubL, AddL, DivL, ConvL2I ... nodes. With this check, we do never lower it, and end up with a "bad AD", i.e. compilation bailout in product.
I think this check can reasonably be removed, because Value should be called before Ideal anyway, and so if we can constand fold because of constant inputs, we would have already done so.
Note, that the lowering is delayed until post_loop_opts_phase, but we never did record_for_post_loop_opts_igvn, and so it was not guaranteed that we actually ever processed the LoopLimitNode again, which would mean we got "bad AD" again, i.e. compilation bailout in product.
------------------------------ Original Comment ----------------------------
I spent quite some time extracting and reducing the JASM, a mention in the PR would be highly appreciated ;)
Affected:
JDK24, JDK23 and JDK22, but also JDK17 and JDK11, maybe further back.
How to reproduce on debug:
emanuel@emanuel-oracle:xyz$ java -jar ~/Documents/asmtools-7.0-build/release/lib/asmtools.jar jasm X.jasm
emanuel@emanuel-oracle:xyz$ /oracle-work/jdk-21.0.3/fastdebug/bin/java -Xbatch -XX:+UseG1GC X
#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (/opt/mach5/mesos/work_dir/slaves/0db9c48f-6638-40d0-9a4b-bd9cc7533eb8-S10182/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/da698345-968e-491e-9bad-55adcd4ebc49/runs/880b530a-e09f-4167-a8c2-1b2231c43cb8/workspace/open/src/hotspot/share/opto/loopnode.cpp:2498), pid=3731998, tid=3732012
# assert((in(Init)->is_ConI() && in(Limit)->is_ConI() && in(Stride)->is_ConI()) ? final_con == (jlong)final_int : true) failed: final value should be integer
#
# JRE version: Java(TM) SE Runtime Environment (21.0.3+6) (fastdebug build 21.0.3+6-LTS-151)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 21.0.3+6-LTS-151, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V [libjvm.so+0x12844c0] LoopLimitNode::Value(PhaseGVN*) const+0x2f0
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to /oracle-work/xyz/core.3731998)
#
# An error report file with more information is saved as:
# /oracle-work/xyz/hs_err_pid3731998.log
#
# Compiler replay data is saved as:
# /oracle-work/xyz/replay_pid3731998.log
#
# If you would like to submit a bug report, please visit:
# https://bugreport.java.com/bugreport/crash.jsp
#
Current CompileTask:
C2:1558 451 b 4 X::test (69 bytes)
Stack: [0x00007f1e10f56000,0x00007f1e11057000], sp=0x00007f1e11051d40, free space=1007k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x126f07d] LoopLimitNode::Value(PhaseGVN*) const+0x30d (loopnode.cpp:2540)
V [libjvm.so+0x1294ea7] PhaseIdealLoop::split_thru_phi(Node*, Node*, int)+0x267 (loopopts.cpp:105)
V [libjvm.so+0x1299998] PhaseIdealLoop::split_if_with_blocks_pre(Node*)+0x248 (loopopts.cpp:1190)
V [libjvm.so+0x129ea97] PhaseIdealLoop::split_if_with_blocks(VectorSet&, Node_Stack&)+0x1d7 (loopopts.cpp:1939)
V [libjvm.so+0x1290a89] PhaseIdealLoop::build_and_optimize()+0xee9 (loopnode.cpp:4815)
V [libjvm.so+0x9de3a0] PhaseIdealLoop::optimize(PhaseIterGVN&, LoopOptsMode)+0x390 (loopnode.hpp:1115)
V [libjvm.so+0x9d7005] Compile::optimize_loops(PhaseIterGVN&, LoopOptsMode)+0x75 (compile.cpp:2170)
V [libjvm.so+0x9d97f1] Compile::Optimize()+0xe21 (compile.cpp:2417)
V [libjvm.so+0x9dcf53] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1b33 (compile.cpp:852)
V [libjvm.so+0x832065] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x1d5 (c2compiler.cpp:142)
V [libjvm.so+0x9e8c08] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x928 (compileBroker.cpp:2303)
V [libjvm.so+0x9e9898] CompileBroker::compiler_thread_loop()+0x478 (compileBroker.cpp:1961)
V [libjvm.so+0xe91e7c] JavaThread::thread_main_inner()+0xcc (javaThread.cpp:759)
V [libjvm.so+0x17ab9b6] Thread::call_run()+0xb6 (thread.cpp:225)
V [libjvm.so+0x1495ed7] thread_native_entry(Thread*)+0x127 (os_linux.cpp:849)
With -Xint, it finishes very quickly. But product prints this:
Default case invoked for:
opcode = 204, "LoopLimit"
-> this is a little worrying
Debugging with rr:
rr /oracle-work/jdk-fork4/build/linux-x64-slowdebug/jdk/bin/java -Xbatch -XX:+UseG1GC TestWrongPredicateOrder
We have:
235 ConI === 0 [[ 531 460 312 391 426 609 ]] #int:4
33 ConI === 0 [[ 36 609 ]] #int:max-1
22 ConI === 0 [[ 25 431 74 63 74 63 52 52 283 312 397 609 ]] #int:0
609 LoopLimit === _ 22 33 235 [[ ]] !orig=312
So in the assert:
assert((in(Init)->is_ConI() && in(Limit)->is_ConI() && in(Stride)->is_ConI()) ? final_con == (jlong)final_int : true, "final value should be integer");
in(Init)->is_ConI()
-> true, int:0
in(Limit)->is_ConI()
-> true: #int:max-1
in(Stride)->is_ConI()
-> true: #int:4
final_con = 2147483648 = 0x8000'0000 = max_int+1
But:
(jlong)final_int = -2147483648 = min_int
How do we get here?
(rr) p stride_con
$25 = 4
(rr) p init_con
$18 = 0
(rr) p limit_con
$19 = 2147483646 = 0x7fff'fffe = max_int - 1
(rr) p stride_m
$20 = 3
(rr) p trip_count
$23 = 536870912 = 0x2000'0000
Computed as:
jlong trip_count = (limit_con - init_con + stride_m)/stride_con;
= ((max_int-1) - 0 + 3) / 4
(addition here would overflow int-range!)
Then, we compute:
jlong final_con = init_con + stride_con*trip_count;
= 0 + 4 * 0x2000'000
= 0x8000'0000
(and this overflows exactly to min_int when cast to int)
Update: the final_con pattern exists at least twice in the same file. Consider if both may be impacted.
- relates to
-
JDK-8347701 C2: investigate code duplication in LoopLimitNode::Value and Ideal
-
- Open
-
- links to
-
Commit(master) openjdk/jdk/f0af830f
-
Review(master) openjdk/jdk/23024