Type: Enhancement
Resolution: Unresolved
Priority: P4
Fix Version/s: tbd
Affects Version/s: 11, 17, 21
Component/s: hotspot
Labels: c2, performance
Subcomponent: compiler
CPU: aarch64

https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64.ad#L7795 uses "prfm $mem, PSTL1Keep" to prefetch memory for an upcoming allocation. So far, so good.

But that prefetch instruction will fetch the previous contents of the memory, which are about to be overwritten by an upcoming allocation and initialization. That memory access is a waste of memory bandwidth and adds latency. aarch64 has a DC ZVA instruction that does not drag a cache line in from memory; it just zeros the line in the cache, making it available for writing without any memory fetch. macroAssembler_aarch64 already knows about DC ZVA because it uses it to zero blocks of memory.

DC ZVA zeros a whole cache line, so maybe it can only be used with AllocatePrefetchStyle=3 to avoid zeroing previously initialized objects in the same cache line; or maybe with other AllocatePrefetchStyles if the distance ahead is more than one cache line.
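
The alignment concern above can be illustrated with a small address calculation. This is only a sketch, not HotSpot code: the function names and the idea of passing the block size as a parameter are assumptions made for illustration (on real hardware the DC ZVA block size is reported by the DCZID_EL0 register and is commonly 64 bytes). It checks only the condition described above, namely that the zeroed block does not reach back below the current allocation top.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: start of the naturally aligned block that a DC ZVA
// issued on `addr` would zero. `block_bytes` must be a power of two.
inline std::uint64_t zva_block_start(std::uint64_t addr, std::uint64_t block_bytes) {
    assert(block_bytes != 0 && (block_bytes & (block_bytes - 1)) == 0);
    return addr & ~(block_bytes - 1);
}

// Hypothetical helper: true if zeroing the block that contains
// (top + distance) cannot touch any byte below `top`, i.e. cannot clobber
// objects that were already allocated and initialized.
inline bool zva_is_safe(std::uint64_t top, std::uint64_t distance, std::uint64_t block_bytes) {
    return zva_block_start(top + distance, block_bytes) >= top;
}
```

With a 64-byte block, zva_is_safe(0x1010, 0, 64) is false because the block starting at 0x1000 overlaps memory below the allocation top, while zva_is_safe(0x1010, 64, 64) is true. In this model any distance of at least one full block is always safe, which matches the suggestion that a larger distance ahead could make DC ZVA usable.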

Please consider using DC ZVA instead of prefetch in prefetchalloc(memory8).

Assignee: Unassigned
Reporter: Peter Kessler
Votes: 0
Watchers: 4
Created: 2023-06-26 12:46
Updated: 2023-06-26 23:39
