-
Enhancement
-
Resolution: Unresolved
-
P4
-
8
-
Linux, x86_64
See the original discussion at:
http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2013-September/008196.html
In short, when laying out an Object[], ParallelGC pushes the array elements onto a stack during the traversal, and pops them back off as it actually processes them. This results in a reversed memory layout for the referenced elements.
This issue should be fixed because:
a) Depending on the hardware, walking memory backwards may or may not perform the same as walking it forwards; in particular, think about non-x86 embedded scenarios where you don't have the luxury of advanced memory prefetchers;
b) Even if you *do* have a good memory prefetcher at your disposal, accessing the first element will entail two memory accesses, because the first element is rather far from the array base; keeping the first element closer to the base may put it right there on the same cache line;
c) The Parallel GC layout is inconsistent with the layouts other GCs produce, which can cause surprising performance differences versus other collectors; I don't like surprising behaviors, and think we should minimize them where possible.
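
The reversal described above can be illustrated with a toy model. This is only a sketch of the push-then-pop effect, not HotSpot code: the class CopyOrderSketch and both methods are hypothetical names, and the FIFO variant is included purely for contrast, not as a claim about how any other collector is implemented.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model (hypothetical, not HotSpot code): shows how draining a LIFO
// stack visits array element indices in reverse order.
public class CopyOrderSketch {

    // Push every element index, then pop until empty. The returned list is
    // the order in which the elements would be processed (and thus laid out).
    public static List<Integer> lifoCopyOrder(int length) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (int i = 0; i < length; i++) {
            stack.push(i);
        }
        List<Integer> order = new ArrayList<>();
        while (!stack.isEmpty()) {
            order.add(stack.pop());
        }
        return order;
    }

    // Same traversal, but drained first-in-first-out, for contrast.
    public static List<Integer> fifoCopyOrder(int length) {
        Deque<Integer> queue = new ArrayDeque<>();
        for (int i = 0; i < length; i++) {
            queue.addLast(i);
        }
        List<Integer> order = new ArrayList<>();
        while (!queue.isEmpty()) {
            order.add(queue.pollFirst());
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println("LIFO: " + lifoCopyOrder(4)); // LIFO: [3, 2, 1, 0]
        System.out.println("FIFO: " + fifoCopyOrder(4)); // FIFO: [0, 1, 2, 3]
    }
}
```

In the LIFO case the element at index 0 is processed last, so in this model it ends up farthest from the array itself, which is the situation point b) is concerned with.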
- relates to
-
JDK-6459077 Revisit object array scanning in the parallel scavenge GC
-
- Open
-