The following benchmark kernel shows a performance drop of around 20% with the latest JDK-25 build (25-ea+19-2255) compared to JDK-17 build 17.0.9+11-LTS-201, due to split stores.
https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/vm/compiler/VectorLoadToStoreForwarding.java#L197
Command line: perf stat -e cycles,instructions,mem_inst_retired.all_stores,mem_inst_retired.split_stores java -jar target/benchmarks.jar -f 1 -i 2 -wi 1 -w 30 org.openjdk.bench.vm.compiler.VectorLoadToStoreForwarding.VectorLoadToStoreForwardingSuperWord.benchmark_20
JDK-25 PMU events
================
92,58,13,16,800 cycles
28,45,27,41,807 instructions # 0.31 insn per cycle
9,58,42,45,086 mem_inst_retired.all_stores
4,49,51,55,071 mem_inst_retired.split_stores
32.510948769 seconds time elapsed
33.010587000 seconds user
0.194167000 seconds sys
System: Model name: Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz (CascadeLake Server)
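
To put the counters above in proportion, the sketch below recomputes two derived figures from the reported JDK-25 numbers: the share of retired stores that were split, and instructions per cycle. The class name `SplitStoreRatio` and the `ratio` helper are made up for illustration. The constants are the perf values copied from the output above with the digit grouping removed. Nearly half of all retired stores (about 47%) were split stores.

```java
import java.util.Locale;

public class SplitStoreRatio {
    // Values copied from the JDK-25 perf stat output (digit grouping removed).
    static final long CYCLES = 92_581_316_800L;
    static final long INSTRUCTIONS = 28_452_741_807L;
    static final long ALL_STORES = 9_584_245_086L;
    static final long SPLIT_STORES = 4_495_155_071L;

    // Hypothetical helper: plain floating-point quotient of two counters.
    static double ratio(long numerator, long denominator) {
        return (double) numerator / denominator;
    }

    public static void main(String[] args) {
        // Fraction of retired stores that were split, as a percentage.
        System.out.println(String.format(Locale.ROOT,
                "split stores / all stores: %.1f%%",
                100.0 * ratio(SPLIT_STORES, ALL_STORES)));
        // Should agree with the "insn per cycle" figure perf prints.
        System.out.println(String.format(Locale.ROOT,
                "instructions per cycle: %.2f",
                ratio(INSTRUCTIONS, CYCLES)));
    }
}
```

Running the same calculation on the counters from a JDK-17 run would show whether the split-store share, and not the total store count, accounts for the difference.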
Relates to:
- JDK-8334431 C2 SuperWord: fix performance regression due to store-to-load-forwarding failures (Closed)
- JDK-8325155 C2 SuperWord: remove alignment boundaries (Resolved)