-
Enhancement
-
Resolution: Unresolved
-
P4
-
25
I tried to add an assert in eliminate_useless_multiversion_if stating that we must always find the multiversion_if from a multiversioned main loop. However, there are cases where this lookup fails.
See PR description:
https://github.com/openjdk/jdk/pull/24183
Here is an example:
test/hotspot/jtreg/compiler/locks/TestSynchronizeWithEmptyBlock.java
With flags:
-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation
Counted Loop: N537/N176 counted [int,100),+1 (-1 iters)
Loop: N0/N0 has_sfpt
Loop: N307/N361 limit_check profile_predicated predicated sfpts={ 182 495 }
Loop: N536/N535
Loop: N537/N176 counted [int,100),+1 (-1 iters) has_sfpt strip_mined
Loop: N379/N383 limit_check profile_predicated predicated counted [int,int),+1 (4 iters) pre rc has_sfpt
Loop: N353/N357 counted [int,1000),+1 (4 iters) post rc has_sfpt
Multiversion Loop: N537/N176 counted [int,100),+1 (100 iters) has_sfpt strip_mined
PreMainPost Loop: N537/N176 counted [int,100),+1 (100 iters) multiversion_fast has_sfpt strip_mined
Unroll 2 Loop: N537/N176 counted [int,100),+1 (100 iters) main multiversion_fast has_sfpt strip_mined
Poor node estimate: 306 >> 92
Loop: N0/N0 has_sfpt
Loop: N307/N361 limit_check profile_predicated predicated sfpts={ 182 }
Loop: N556/N557 sfpts={ 559 }
Loop: N552/N554 counted [int,100),+1 (100 iters) multiversion_delayed_slow has_sfpt strip_mined
Loop: N599/N601 counted [int,int),+1 (4 iters) pre multiversion_fast has_sfpt
Loop: N536/N535 sfpts={ 538 }
Loop: N629/N176 counted [int,99),+2 (100 iters) main multiversion_fast has_sfpt strip_mined
Loop: N575/N577 counted [int,100),+1 (4 iters) post multiversion_fast has_sfpt
Loop: N379/N383 limit_check profile_predicated predicated counted [int,int),+1 (4 iters) pre rc has_sfpt
Loop: N353/N357 counted [int,1000),+1 (4 iters) post rc has_sfpt
Parallel IV: 643 Loop: N552/N554 counted [int,100),+1 (100 iters) multiversion_delayed_slow has_sfpt strip_mined
Parallel IV: 646 Loop: N599/N601 counted [int,int),+1 (4 iters) pre multiversion_fast has_sfpt
Parallel IV: 652 Loop: N629/N176 counted [int,99),+2 (100 iters) main multiversion_fast has_sfpt strip_mined
Parallel IV: 649 Loop: N575/N577 counted [int,100),+1 (4 iters) post multiversion_fast has_sfpt
Loop: N0/N0 has_sfpt
Loop: N307/N361 limit_check profile_predicated predicated sfpts={ 182 }
Loop: N556/N557 sfpts={ 559 }
Loop: N552/N554 counted [int,100),+1 (100 iters) multiversion_delayed_slow has_sfpt strip_mined
Loop: N599/N601 counted [int,int),+1 (4 iters) pre multiversion_fast has_sfpt
Loop: N536/N535 sfpts={ 538 }
Loop: N629/N176 counted [int,99),+2 (100 iters) main multiversion_fast has_sfpt strip_mined
Loop: N575/N577 counted [int,100),+1 (4 iters) post multiversion_fast has_sfpt
Loop: N379/N383 limit_check profile_predicated predicated counted [int,int),+1 (4 iters) pre rc has_sfpt
Loop: N353/N357 counted [int,1000),+1 (4 iters) post rc has_sfpt
Empty without zero trip guard Loop: N552/N554 counted [int,100),+1 (100 iters) multiversion_delayed_slow has_sfpt strip_mined
Peel Loop: N552/N554 counted [int,100),+1 (100 iters) multiversion_delayed_slow has_sfpt strip_mined
Empty without zero trip guard Loop: N599/N601 counted [int,int),+1 (4 iters) pre multiversion_fast has_sfpt
Peel Loop: N599/N601 counted [int,int),+1 (4 iters) pre multiversion_fast has_sfpt
Unroll 4 Loop: N629/N176 counted [int,99),+2 (100 iters) main multiversion_fast has_sfpt strip_mined
It seems that we are able to detect some loops as empty loops, including the pre-loop. But somehow the main-loop is not removed by "empty loop", and now this main-loop cannot traverse through the pre-loop to the multiversion_if.
- is blocked by
-
JDK-8352587 C2 SuperWord: we must avoid Multiversioning for PeelMainPost loops
-
- Resolved
-