-
Enhancement
-
Resolution: Unresolved
-
P4
-
11, 17, 21
-
aarch64
https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64.ad#L7795 uses a "prfm $mem, PSTL1Keep", which does a prefetch. So far, so good.
But that prefetch instruction fetches the previous contents of the memory, which is about to be overwritten by an upcoming allocation and initialization. That memory access is a waste of memory bandwidth and latency. aarch64 has a DC ZVA instruction that does not drag a cache line in from memory; it just zeros the line in the cache, making it available for writing without any memory fetch. macroAssembler_aarch64 already knows about DC ZVA because it uses it to zero blocks of memory.
DC ZVA zeros a whole cache line, so maybe it can only be used with AllocatePrefetchStyle=3 to avoid zeroing previously initialized objects in the same cache line; or maybe with other AllocatePrefetchStyles if the distance ahead is more than one cache line.
Please consider using DC ZVA instead of prefetch in prefetchalloc(memory8).
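The concern about zeroing previously initialized objects can be sketched as an address check. The following is only an illustration, not HotSpot code: the helper names (zva_block_bytes, zva_permitted, zva_block_start, zva_ahead_is_safe) are hypothetical, and "top" stands for the current allocation pointer. It relies on two architectural facts: DC ZVA zeros the naturally aligned block containing the target address, and the block size is reported by DCZID_EL0 (BS in bits [3:0] as log2 of the size in 4-byte words, DZP in bit 4 meaning the instruction is prohibited). The check covers only the lower bound; it does not verify that the zeroed block stays inside the allocation buffer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: size in bytes of the block that DC ZVA zeros,
// decoded from a DCZID_EL0 value. BS (bits [3:0]) is log2 of the block
// size in 4-byte words.
constexpr uint64_t zva_block_bytes(uint64_t dczid) {
  return uint64_t{4} << (dczid & 0xF);
}

// Hypothetical helper: DZP (bit 4) set means DC ZVA is prohibited.
constexpr bool zva_permitted(uint64_t dczid) {
  return (dczid & 0x10) == 0;
}

// Start of the naturally aligned block that DC ZVA would zero for addr.
// block_bytes must be a power of two.
inline uint64_t zva_block_start(uint64_t addr, uint64_t block_bytes) {
  assert(block_bytes != 0 && (block_bytes & (block_bytes - 1)) == 0);
  return addr & ~(block_bytes - 1);
}

// True if zeroing the block that contains (top + distance) cannot touch
// any byte below top, i.e. cannot clobber already-initialized objects.
inline bool zva_ahead_is_safe(uint64_t top, uint64_t distance,
                              uint64_t block_bytes) {
  return zva_block_start(top + distance, block_bytes) >= top;
}
```

With a 64-byte block, a distance of at least 63 bytes ahead of the allocation pointer always passes this check, whatever the alignment of the pointer; shorter distances pass only when the pointer happens to be suitably aligned.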