When a larger number of classes is included in a CDS archive, the archive dumping process becomes longer and more expensive. If CDS is integrated into the development build process and archive dumping is part of it, the dump operation adds wait time for developers in interactive builds. Longer dump times can discourage the use of CDS.
I investigated using parallel operations for the CDS dump process. On JDK 11, I measured the dump time when archiving ~70K classes on linux-x86 cloud build systems:
- Before: 70.95s
- With parallel operation enhancement: 58.53s (12.42s less, about a 17.5% reduction)
Proposal: Improve CDS creation using parallel operations.
Optimizations for faster CDS creation:
- Separate the CDS archive creation into a parallel phase and a non-parallel phase. Introduce a DumpWithParallelism VM flag to specify the number of Java threads for the parallel phase. If DumpWithParallelism is 1, all CDS archiving operations are done using one thread, as before.
Parallel phase
==============
The static CDSParallelPreProcessor.preLoadAndProcess() method is the entry point of the parallel phase.
During the parallel phase, the classlist is split into a number of sublists based on the DumpWithParallelism value, and the sublists are processed in parallel in different threads. Classes on the list are loaded but not explicitly initialized. Loaded classes are linked and verified (when required).
CDSParallelPreProcessor.preLoadAndProcess() waits until all parallel tasks have completed, then transfers control back to the VM, which enters the non-parallel phase.
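The split-and-wait pattern described above can be illustrated with a minimal Java sketch. This is not the prototype's code: the class name ParallelPreloadSketch and the helpers split() and preLoad() are hypothetical, the parallelism argument merely stands in for the DumpWithParallelism value, and class names are given in the dotted form that Class.forName expects. The sketch only loads classes without initializing them; the linking and verification that the VM performs during the dump are not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; not the CDSParallelPreProcessor implementation.
public class ParallelPreloadSketch {

    // Split the class list into at most `parallelism` contiguous sublists.
    static List<List<String>> split(List<String> classList, int parallelism) {
        List<List<String>> subLists = new ArrayList<>();
        int chunk = (classList.size() + parallelism - 1) / parallelism; // ceiling division
        for (int start = 0; start < classList.size(); start += chunk) {
            int end = Math.min(start + chunk, classList.size());
            subLists.add(classList.subList(start, end));
        }
        return subLists;
    }

    // Load every listed class without initializing it, one task per sublist.
    // Returns the number of classes that were loaded successfully.
    static int preLoad(List<String> classList, int parallelism) throws Exception {
        ClassLoader loader = ClassLoader.getSystemClassLoader();
        AtomicInteger loaded = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<?>> tasks = new ArrayList<>();
            for (List<String> subList : split(classList, parallelism)) {
                tasks.add(pool.submit(() -> {
                    for (String name : subList) {
                        try {
                            // initialize=false: load the class, do not run static initializers
                            Class.forName(name, false, loader);
                            loaded.incrementAndGet();
                        } catch (ClassNotFoundException | LinkageError e) {
                            // This sketch simply skips classes that cannot be loaded.
                        }
                    }
                }));
            }
            // Wait for all parallel tasks before handing control back to the caller.
            for (Future<?> task : tasks) {
                task.get();
            }
        } finally {
            pool.shutdown();
        }
        return loaded.get();
    }

    public static void main(String[] args) throws Exception {
        List<String> classList = List.of(
                "java.util.ArrayList", "java.util.HashMap",
                "java.lang.StringBuilder", "no.such.Class");
        System.out.println("loaded " + preLoad(classList, 2) + " of " + classList.size());
    }
}
```

With a parallelism of 1 the sketch degenerates to a single sublist processed by one worker thread, which mirrors the single-threaded behaviour described for DumpWithParallelism=1.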
Non-parallel phase
==================
* Initialize classes with archived static fields.
* Iterate over the ClassLoaderDataGraph and link/verify any classes that are not yet linked. Verification may cause more classes to be loaded.
* Collect archivable classes.
* Resolve constants.
* Rewrite nofast bytecode.
* Remove unsharable data.
* Copy class metadata.
* Copy Java heap objects.
* Write archive file.
Prototype:
* https://github.com/adoptium/jdk11u-fast-startup-incubator/pull/29
* https://github.com/adoptium/jdk11u-fast-startup-incubator/pull/32 (Implement 16-bit atomic cmpxchg (CAS) for linux-aarch64 port)