- Type: Enhancement
- Resolution: Unresolved
- Priority: P4
- Affects Version/s: 26
- Component/s: hotspot
These benchmarks have been contributed by Ioannis Tsakpinis:
UnsafeSetMemoryBench.java
https://gist.github.com/Spasi/ba6542d933fa81a7b1e19da59ae1db8e
UnsafeCopyMemoryBench.java
https://gist.github.com/Spasi/3b3f24591fb5ff63907050e721924b91
Here are accompanying notes:
* For Unsafe::setMemory, the problem seems to be that the 2/4/8-byte aligned loops added in JDK 23 are simply not sophisticated enough to compete with the byte fallback. The byte fallback already handles misalignment and uses vector instructions in the aligned portions (see MacroAssembler::generate_fill in macroAssembler_x86.cpp). The 2/4-byte implementations fall behind considerably, whereas the 8-byte implementation stays close.
* For Unsafe::copyMemory, I got curious about System.arraycopy and added corresponding benchmarks with different Java array types. I think their performance is a hint: while byte[] and short[] copies are fast, int[] and long[] copies are slow. Looking into the code, it seems that the 4/8-byte aligned implementations for Unsafe::copyMemory reuse the same code as int[] and long[] copies, and that the same code is also used for Object[] (4-byte with compressed OOPs, 8-byte with uncompressed OOPs). Could it be that whatever overhead is associated with tracking reference copies affects the performance of plain data copies?
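To make the System.arraycopy comparison in the second note concrete, here is a minimal sketch that copies the same number of bytes through byte[], short[], int[] and long[]. It is not taken from the linked gists: the class name `ArrayCopySketch`, the helpers `nanosPerCopy` and `copiesMatch`, and the payload size are all hypothetical, and the wall-clock timing has none of JMH's warmup or fork control, so the linked benchmarks remain the ones to use for actual measurements.

```java
import java.util.Arrays;

public class ArrayCopySketch {
    // Average nanoseconds per System.arraycopy call over `iterations` copies.
    // Wall-clock timing only: no warmup control, no blackholes, no forks.
    static double nanosPerCopy(Object src, Object dst, int length, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            System.arraycopy(src, 0, dst, 0, length);
        }
        return (System.nanoTime() - start) / (double) iterations;
    }

    // Copies the same number of bytes through each primitive array type and
    // checks that every destination matches its source.
    static boolean copiesMatch(int bytes) {
        byte[] b = new byte[bytes];
        short[] s = new short[bytes / Short.BYTES];
        int[] i = new int[bytes / Integer.BYTES];
        long[] l = new long[bytes / Long.BYTES];
        Arrays.fill(b, (byte) 1);
        Arrays.fill(s, (short) 2);
        Arrays.fill(i, 3);
        Arrays.fill(l, 4L);

        byte[] b2 = new byte[b.length];
        short[] s2 = new short[s.length];
        int[] i2 = new int[i.length];
        long[] l2 = new long[l.length];
        System.arraycopy(b, 0, b2, 0, b.length);
        System.arraycopy(s, 0, s2, 0, s.length);
        System.arraycopy(i, 0, i2, 0, i.length);
        System.arraycopy(l, 0, l2, 0, l.length);

        return Arrays.equals(b, b2) && Arrays.equals(s, s2)
            && Arrays.equals(i, i2) && Arrays.equals(l, l2);
    }

    public static void main(String[] args) {
        int bytes = 1 << 16;      // arbitrary payload size (assumption)
        int iterations = 100_000; // arbitrary repeat count (assumption)
        System.out.println("copies match: " + copiesMatch(bytes));
        System.out.printf("byte[]  %.1f ns/copy%n",
            nanosPerCopy(new byte[bytes], new byte[bytes], bytes, iterations));
        System.out.printf("short[] %.1f ns/copy%n",
            nanosPerCopy(new short[bytes / 2], new short[bytes / 2], bytes / 2, iterations));
        System.out.printf("int[]   %.1f ns/copy%n",
            nanosPerCopy(new int[bytes / 4], new int[bytes / 4], bytes / 4, iterations));
        System.out.printf("long[]  %.1f ns/copy%n",
            nanosPerCopy(new long[bytes / 8], new long[bytes / 8], bytes / 8, iterations));
    }
}
```

Each call moves the same number of bytes, so any difference between the four lines would come from the element-width-specific copy path rather than from the amount of data. Whether the gap described above shows up will depend on the JDK build, CPU and payload size.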
- relates to
  - JDK-8333677 Improve Unsafe::setMemory intrinsics (Open)
  - JDK-8367158 C2: create better fill and copy benchmarks, taking alignment into account (Open)
  - JDK-8329331 Intrinsify Unsafe::setMemory (Resolved)