-
Bug
-
Resolution: Unresolved
-
P4
-
25
The fix for JDK-8307513 in PR https://github.com/openjdk/jdk/pull/20098 has edge cases where performance degradation can be observed. These performance regressions can be summarised as follows:
Regression 1: Given a loop with a long min/max reduction pattern with one side of the branch taken near 100% of the time, when Superword finds the pattern not profitable, HotSpot will use scalar instructions (cmov) and performance will regress.
Regression 2: Given a loop with a long min/max reduction pattern with one side of the branch taken near 100% of the time, when the platform does not support the vector instructions needed to vectorize it (e.g. AVX-512 quadword vpmax/vpmin), HotSpot will use scalar instructions (cmov) and performance will regress.
Regression 3: Given a loop with a long min/max non-reduction pattern (e.g. longLoopMax) with one side of the branch taken near 100% of the time, when the platform does not vectorize it (either because of missing CPU instruction support, or because Superword finds it not profitable), HotSpot will use scalar instructions (cmov) and performance will regress.
What all these regressions have in common is that in these extreme scenarios the compiler emits scalar cmov instructions. So the idea for a fix would be to detect these extreme scenarios and use branching code (e.g. cmp + mov) instead.
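For illustration, the loop shapes described above might look roughly like the following at the Java source level. This is a minimal sketch, not the benchmark code from the PR: the class name MinMaxPatterns and the helpers longReductionMax, longReductionMaxBranching and increasing are hypothetical names chosen here, and longLoopMax is only assumed to have this shape. The strictly increasing input is one way to make one side of the comparison win on nearly every iteration. Which machine instructions C2 actually emits for each method (cmov, a compare and branch, or vector min/max) depends on the JIT, the profile and the CPU, so the branching variant is a source-level analogue only.

```java
final class MinMaxPatterns {
    // Reduction pattern: the running maximum is carried from one iteration to the next.
    static long longReductionMax(long[] values) {
        long result = Long.MIN_VALUE;
        for (int i = 0; i < values.length; i++) {
            result = Math.max(result, values[i]);
        }
        return result;
    }

    // Non-reduction pattern: every iteration is independent of the others.
    static void longLoopMax(long[] a, long[] b, long[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = Math.max(a[i], b[i]);
        }
    }

    // Same reduction written with an explicit comparison; a source-level
    // analogue of "branching code". The JIT is free to compile this differently.
    static long longReductionMaxBranching(long[] values) {
        long result = Long.MIN_VALUE;
        for (int i = 0; i < values.length; i++) {
            long v = values[i];
            if (v > result) {
                result = v;
            }
        }
        return result;
    }

    // Hypothetical input generator: strictly increasing values, so the new
    // element is the maximum on every iteration (one side always taken).
    static long[] increasing(int n) {
        long[] a = new long[n];
        for (int i = 0; i < n; i++) {
            a[i] = i;
        }
        return a;
    }

    public static void main(String[] args) {
        long[] a = increasing(10_000);
        long[] b = new long[a.length]; // all zeros, so a[i] >= b[i] for every i
        long[] out = new long[a.length];
        longLoopMax(a, b, out);
        System.out.println(longReductionMax(a));
        System.out.println(longReductionMaxBranching(a));
        System.out.println(out[out.length - 1]);
    }
}
```

Both reduction variants compute the same value; they differ only in how the comparison is expressed, which is the distinction the proposed fix would make inside the compiler for these skewed cases.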
- relates to
-
JDK-8307513 C2: intrinsify Math.max(long,long) and Math.min(long,long)
-
- Open
-