Type: Enhancement
Resolution: Unresolved
Priority: P4
Fix Version/s: tbd
Affects Version/s: 26
Component/s: hotspot
Subcomponent: compiler

In https://bugs.openjdk.org/browse/JDK-6912521, an optimization was added to inline individual loads/stores for small arraycopy invocations. These loads and stores are 1 byte each. We could use word-sized loads/stores where appropriate and increase ArrayCopyLoadStoreMaxElem commensurately.
E.g. a 16-byte (aligned) arraycopy could be done with two inlined 8-byte load/store pairs instead of a call to the runtime stub.
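As a minimal sketch in plain Java (not HotSpot code), the helper below splits a copy length in bytes into the widest moves available, largest first. The helper name `planMoves` and the 8/4/2/1-byte widths are illustrative assumptions; the sketch ignores alignment and element type, both of which a real C2 change would have to respect.

```java
import java.util.ArrayList;
import java.util.List;

public class CopyPlan {
    // Hypothetical helper: decomposes a copy of `bytes` bytes into move widths,
    // greedily using 8-byte moves first, then 4, 2 and 1 for the remainder.
    // Alignment is deliberately not modeled here.
    static List<Integer> planMoves(int bytes) {
        List<Integer> moves = new ArrayList<>();
        int remaining = bytes;
        for (int width : new int[] {8, 4, 2, 1}) {
            while (remaining >= width) {
                moves.add(width);
                remaining -= width;
            }
        }
        return moves;
    }

    public static void main(String[] args) {
        System.out.println(planMoves(16)); // [8, 8]
        System.out.println(planMoves(13)); // [8, 4, 1]
    }
}
```

For 16 bytes this yields two 8-byte moves instead of sixteen 1-byte moves. Read this way, a fixed budget of N moves covers up to 8N bytes rather than N, which is one interpretation of raising ArrayCopyLoadStoreMaxElem commensurately.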

A naive implementation gives a 5% improvement in SPECjvm crypto.signverify, which performs many 16-byte copies. With UseAVX=3 the baseline is already faster, because https://bugs.openjdk.org/browse/JDK-8252848 inlines these copies using vector masks.
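The shape of a 16-byte copy done as two 8-byte loads followed by two 8-byte stores can be mimicked at the Java source level with a `VarHandle` that views a `byte[]` as longs. This only illustrates the load/store pattern; it is not the C2 implementation, and the class and method names are hypothetical.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.util.Arrays;

public class WordCopy {
    // Views a byte[] as 8-byte longs in native byte order. Plain get/set
    // accesses through this view are permitted at unaligned offsets.
    private static final VarHandle LONG_VIEW =
        MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.nativeOrder());

    // Hypothetical helper: copies exactly 16 bytes with two 8-byte loads
    // followed by two 8-byte stores (instead of sixteen 1-byte moves).
    static void copy16(byte[] src, int srcPos, byte[] dst, int dstPos) {
        long lo = (long) LONG_VIEW.get(src, srcPos);
        long hi = (long) LONG_VIEW.get(src, srcPos + 8);
        LONG_VIEW.set(dst, dstPos, lo);
        LONG_VIEW.set(dst, dstPos + 8, hi);
    }

    public static void main(String[] args) {
        byte[] src = new byte[16];
        for (int i = 0; i < src.length; i++) {
            src[i] = (byte) i;
        }
        byte[] dst = new byte[16];
        copy16(src, 0, dst, 0);
        System.out.println(Arrays.equals(src, dst)); // true
    }
}
```

Because both loads complete before either store, the sketch also gives the same result as `System.arraycopy` when the source and destination ranges overlap within one array.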

crypto.signverify on Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz:
UseAVX=3: 209.6 op/s (uses inlined vector masks from JDK-8252848)
UseAVX=2: 198.4 op/s (calls arraycopy runtime stub)
UseAVX=2 ArrayCopyLoadStoreMaxElem=256: 203.3 op/s
UseAVX=2 (with 8-byte inlined load/stores): 209.3 op/s
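The relative gains behind the ~5% figure can be recomputed from the numbers above, taking UseAVX=2 with the runtime stub call as the baseline. The `Speedup` class is a hypothetical helper for this arithmetic only.

```java
public class Speedup {
    // Relative throughput change versus a baseline, in percent.
    static double percentGain(double baseline, double measured) {
        return (measured / baseline - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        double baseline = 198.4; // UseAVX=2, calls arraycopy runtime stub
        System.out.printf("ArrayCopyLoadStoreMaxElem=256: %.1f%%%n", percentGain(baseline, 203.3));
        System.out.printf("8-byte inlined loads/stores: %.1f%%%n", percentGain(baseline, 209.3));
        System.out.printf("UseAVX=3: %.1f%%%n", percentGain(baseline, 209.6));
    }
}
```

This gives roughly +2.5% for ArrayCopyLoadStoreMaxElem=256, +5.5% for 8-byte inlined loads/stores, and +5.6% for UseAVX=3, so the word-sized variant closes nearly all of the gap to the UseAVX=3 result.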

Relates to: JDK-6912521 System.arraycopy works slower than the simple loop for little lengths (Resolved)

Assignee: Unassigned
Reporter: Oli Gillespie
Votes: 0
Watchers: 1

Created: 2025-08-19 03:37
Updated: 2025-08-19 22:27