Type: Enhancement
Resolution: Fixed
Priority: P4
Fix Version/s: 16
Affects Version/s: 11
Component/s: hotspot
Labels:

Subcomponent: compiler
Resolved In Build: b27
CPU: aarch64

Issue	Fix Version	Assignee	Priority	Status	Resolution	Resolved In Build
JDK-8257903	11.0.11	Volker Simonis	P4	Resolved	Fixed	b01

Submitted by Evgeny Astigeevich (eastig@amazon.co.uk)

When UseSIMDForMemoryOps is enabled on Graviton2, arraycopy microbenchmarks regress by 27%-48% for copies of 70-80 bytes. Analysis shows that the problematic code is generated in StubGenerator::copy_memory:

    if (UseSIMDForMemoryOps) {
      __ ld4(v0, v1, v2, v3, __ T16B, Address(s, 0));
      __ ldpq(v4, v5, Address(send, -32));
      __ st4(v0, v1, v2, v3, __ T16B, Address(d, 0));
      __ stpq(v4, v5, Address(dend, -32));
    } else {

Replacing ld4/st4 with ldpq/stpq fixes the regressions. This matches the recommendation in the Arm optimization guides, including the one for Neoverse N1: use discrete, non-writeback forms of load and store instructions while interleaving them.
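For readers unfamiliar with the stub, the copy strategy in the snippet above can be sketched in portable C++. This is an illustration only, not HotSpot code: the function name `copy_64_to_96` and the 64-96 byte bounds are assumptions made for the sketch. It shows one 64-byte block taken from the start of the source and one 32-byte block taken from its end, with every load issued before any store, so the two stores overlap in the middle and cover the whole range. The ld4/st4 pair with T16B de-interleaves on load and re-interleaves on store, so its net effect is the plain 64-byte block copy modelled here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of HotSpot): copies n bytes, 64 <= n <= 96,
// as one 64-byte head block plus one 32-byte tail block.
void copy_64_to_96(std::uint8_t* d, const std::uint8_t* s, std::size_t n) {
    assert(n >= 64 && n <= 96);

    std::uint8_t head[64];  // stands in for v0..v3 (4 x 16 bytes)
    std::uint8_t tail[32];  // stands in for v4, v5 (2 x 16 bytes)

    // Loads: first 64 bytes, and the last 32 bytes addressed from the end.
    std::memcpy(head, s, 64);
    std::memcpy(tail, s + n - 32, 32);

    // Stores: issued only after all loads, so overlapping source and
    // destination ranges are still copied correctly.
    std::memcpy(d, head, 64);
    std::memcpy(d + n - 32, tail, 32);
}
```

Because the tail block is addressed relative to the end, no length-dependent loop is needed. Whether the 64-byte head is moved with one structured load/store or with discrete pair loads/stores does not change the result, only the instruction mix, which is what the fix adjusts.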

backported by

JDK-8257903 AArch64: Use ldpq/stpq instead of ld4/st4 for small copies in StubGenerator::copy_memory (Resolved)

relates to

JDK-8257436 AArch64: Regressions in ArrayCopyUnalignedDst.testByte/testChar for 65-78 bytes when UseSIMDForMemoryOps is on (Resolved)

JDK-8255351 Add detection for Graviton 2 CPUs (Resolved)

links to

Commit openjdk/jdk/6e006223

Review openjdk/jdk/1293

Assignee: Volker Simonis
Reporter: Volker Simonis
Votes: 0
Watchers: 4
Created: 2020-11-17 09:23
Updated: 2024-12-20 10:33
Resolved: 2020-11-26 08:11
