- Bug
- Resolution: Duplicate
- P2
- 21, 22
There are two performance regressions around float/double modulo:
1) The first one (this bug) is observed after JDK-8302191, which changed SharedRuntime::frem/drem() for non-Windows x64 systems to no longer use the fmod() C-library implementation but to use direct x86 assembly instead.
2) The second regression only affects JDK 22 and is observed after the intrinsification of float/double modulo with JDK-8308966. The details for that regression can be found separately in JDK-8312233.
The regression introduced by JDK-8302191 can be observed when running Blender2.java with a product/release VM (the machine used for the numbers below has AVX512 support, but the regression is also observed on AVX2-only machines):
Commit just before JDK-8302191, which is JDK-8304683 (https://github.com/openjdk/jdk/commit/760c0128a4ef787c8c8addb26894c072ba8b2eb1):
$ java Blender2.java
Output:
164 ms
161 ms
166 ms
164 ms
162 ms
160 ms
168 ms
163 ms
161 ms
167 ms
Average: 163 ms
Commit of JDK-8302191 (https://github.com/openjdk/jdk/commit/37774556da8a5aacf55884133ae936ed5a28eab2):
$ java Blender2.java
Output:
255 ms
260 ms
256 ms
307 ms
257 ms
258 ms
265 ms
255 ms
260 ms
255 ms
Average: 262 ms
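To put a number on the gap between the two runs, the arithmetic can be sketched as follows. The sample values are copied from the output above. The class and method names are hypothetical and not part of Blender2.java. Truncating division is an assumption that happens to reproduce the reported averages of 163 ms and 262 ms.

```java
import java.util.Arrays;
import java.util.Locale;

public class TimingSummary {
    // Timings in milliseconds, copied from the two runs listed above.
    static final long[] BEFORE = {164, 161, 166, 164, 162, 160, 168, 163, 161, 167};
    static final long[] AFTER = {255, 260, 256, 307, 257, 258, 265, 255, 260, 255};

    // Truncating average (assumption: this matches how the reported averages were rounded).
    static long average(long[] samples) {
        return Arrays.stream(samples).sum() / samples.length;
    }

    // Ratio of total time after the change to total time before it.
    static double slowdown(long[] before, long[] after) {
        return (double) Arrays.stream(after).sum() / Arrays.stream(before).sum();
    }

    public static void main(String[] args) {
        System.out.println("Average before: " + average(BEFORE) + " ms");
        System.out.println("Average after:  " + average(AFTER) + " ms");
        System.out.println(String.format(Locale.ROOT, "Slowdown factor: %.2f", slowdown(BEFORE, AFTER)));
    }
}
```

On these samples the slowdown factor comes out at roughly 1.6, meaning the run after JDK-8302191 takes about 60% longer.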
This suggests that the direct x86 assembly of SharedRuntime::frem/drem() is slower than the code executed by fmod(). We should have a closer look at the assembly produced by fmod() and improve our x86 assembly to fix the observed regressions.
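For readers without access to the attachment, a minimal sketch of a loop dominated by double modulo is given below. It is not Blender2.java. The class name, loop bounds, and divisor are made up for illustration. Whether the `%` operator here actually reaches SharedRuntime::drem() depends on the VM build, the platform, and JIT decisions, so treat any timings as indicative only.

```java
public class ModuloSketch {
    // Sums (i * 1.5) % divisor; the sum is returned so the JIT cannot discard the loop.
    static double sumOfRemainders(int count, double divisor) {
        double sum = 0.0;
        for (int i = 0; i < count; i++) {
            sum += (i * 1.5) % divisor; // Java's % on doubles truncates, like C's fmod()
        }
        return sum;
    }

    public static void main(String[] args) {
        final int count = 10_000_000; // hypothetical workload size
        double sink = 0.0;
        for (int round = 0; round < 10; round++) {
            long start = System.nanoTime();
            sink += sumOfRemainders(count, 7.25);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println(elapsedMs + " ms");
        }
        // Print the accumulated value so the work stays observable.
        System.out.println("checksum: " + sink);
    }
}
```

Running the same source with `java ModuloSketch.java` on builds before and after a given commit gives a rough comparison. A JMH benchmark would be the more reliable choice for real measurements.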
---- Original Report ----
There is potentially a significant (~25%) performance regression on a micro-ish benchmark in JDK 21 - see attachment.
The regression appears at least on Linux x86, but did not appear on macOS x86. No other platforms were tried.
With Blender.java, the second Java sample on the blog [1], there is a significant drop in performance between JDK 20.0.1 and JDK 21 using the latest binaries [2].
On the Ubuntu workstation,
JDK 20.0.1 runs Blender in 822 ms
JDK 21 runs Blender in 1125 ms
(see attachments for full source code and sample output)
[1] https://www.graalvm.org/22.1/examples/java-performance-examples/
[2] https://jdk.java.net/
- duplicates
  - JDK-8314056 Remove runtime platform check from frem/drem (Resolved)
- relates to
  - JDK-8314056 Remove runtime platform check from frem/drem (Resolved)
  - JDK-8302191 Performance degradation for float/double modulo on Linux (Closed)
  - JDK-8312233 Performance regression in SharedRuntime::frem/drem() on x86 with AVX2 after JDK-8308966 (Closed)