In Leyden performance investigations, we found that ArchiveRelocationMode=0 saves 5..7 ms on HelloWorld startup. Mainline defaults to ARM=1, _losing_ as much. The default was switched from ARM=0 to ARM=1 by JDK-8294323, which was delivered to JDK 17+ in Apr 2023.
Profiling shows we spend the time taking page faults on the memory while loading the RO/RW regions, about 15 MB total. 15 MB in 5 ms amounts to about 3 GB/sec, close to the single-threaded limits. I suspect the impact is larger if we relocate a larger Metaspace, e.g. after dumping a CDS archive from a large application.
There is little we can do to make the actual relocation part faster: the overwhelming majority of samples are on the kernel side. There are two opportunities here: a) parallel pretouch for mmap-ed regions, which wires up memory upfront, handling page faults in parallel; b) parallel relocation, which spreads the storm of memory writes for the relocations themselves across threads.
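To make option (a) concrete, here is a minimal sketch of parallel pretouch, assuming a POSIX system. It is not the code from the draft: `parallel_pretouch` is a hypothetical helper, and the choice to touch each page with a write (so that a private mapping takes its copy-on-write fault upfront) is an assumption made for illustration.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not a HotSpot API): touch one byte in every page of
// [base, base + size) from `nthreads` threads, so that page faults are taken
// in parallel instead of one by one on the startup thread.
// Returns the number of pages covered by the range.
size_t parallel_pretouch(volatile char* base, size_t size, unsigned nthreads) {
  assert(nthreads > 0);
  const size_t page = static_cast<size_t>(sysconf(_SC_PAGESIZE));
  const size_t pages = (size + page - 1) / page;
  const size_t per_thread = (pages + nthreads - 1) / nthreads;

  std::vector<std::thread> workers;
  for (unsigned t = 0; t < nthreads; t++) {
    // Each worker owns a disjoint, contiguous range of pages.
    const size_t first = std::min(pages, t * per_thread);
    const size_t last = std::min(pages, first + per_thread);
    if (first == last) break;  // nothing left to hand out
    workers.emplace_back([=] {
      for (size_t p = first; p < last; p++) {
        // Assumption: write the byte back unchanged. The store forces a
        // write fault without altering the contents of the page.
        volatile char* addr = base + p * page;
        *addr = *addr;
      }
    });
  }
  for (auto& w : workers) w.join();
  return pages;
}
```

The sketch assumes nothing else writes to the range while the workers run, which would hold for a region that has just been mapped and not yet published to other threads.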
My early experiments show we can recover almost all of the cost with just a handful of threads: pretouch alone cuts 5..7 ms to just 0.5 ms, and we go under the noise floor with both pretouch and parallel relocation.
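The parallel relocation part can be pictured with a simplified model, sketched below. It is an illustration only, not the draft's code: the region is modelled as a vector of pointer-sized slots, `is_pointer` is a hypothetical stand-in for a pointer bitmap, and `parallel_relocate` is a made-up name. Each worker patches a disjoint chunk, so no synchronization is needed beyond the final join.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical model: slots[i] is one pointer-sized word of the mapped
// region; is_pointer[i] says whether that word holds an address that must be
// shifted by `delta`. Returns the number of slots that were patched.
size_t parallel_relocate(std::vector<uintptr_t>& slots,
                         const std::vector<bool>& is_pointer,
                         intptr_t delta, unsigned nthreads) {
  assert(nthreads > 0 && slots.size() == is_pointer.size());
  const size_t n = slots.size();
  const size_t chunk = (n + nthreads - 1) / nthreads;

  std::vector<size_t> patched(nthreads, 0);  // one counter per worker
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < nthreads; t++) {
    // Disjoint slot ranges: workers never write to the same slot.
    const size_t first = std::min(n, t * chunk);
    const size_t last = std::min(n, first + chunk);
    workers.emplace_back([&, t, first, last] {
      size_t count = 0;
      for (size_t i = first; i < last; i++) {
        if (is_pointer[i]) {
          // Unsigned wraparound makes negative deltas work as well.
          slots[i] += static_cast<uintptr_t>(delta);
          count++;
        }
      }
      patched[t] = count;
    });
  }
  for (auto& w : workers) w.join();

  size_t total = 0;
  for (size_t c : patched) total += c;
  return total;
}
```

Splitting by contiguous chunks keeps each worker's writes local to its own pages, which is also what lets the page faults and memory writes spread across cores.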
Draft: https://github.com/openjdk/jdk/compare/master...shipilev:jdk:JDK-8341334-cds-parallel-relocation
- relates to
  - JDK-8340474 [premain] Revert ArchiveRelocationMode back to 1 (Open)
  - JDK-8344583 Make ArchiveWorkers lifecycle robust (New)
- links to
  - Commit(master) openjdk/jdk/76a55c3c
  - Review(master) openjdk/jdk/21302