- Enhancement
- Resolution: Fixed
- P4
- 17, 21, 23, 24
- b09
Issue | Fix Version | Assignee | Priority | Status | Resolution | Resolved In Build |
---|---|---|---|---|---|---|
JDK-8340549 | 23.0.2 | William Kemper | P4 | Resolved | Fixed | b01 |
JDK-8344591 | 21.0.6 | William Kemper | P4 | Resolved | Fixed | b05 |
https://github.com/openjdk/jdk/blob/d41d2a7a82cb6eff17396717e2e14139ad8179ba/src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp#L555-L559
It goes for the parallel walk when the number of regions exceeds 1024 (see ShenandoahParallelRegionStride), which is below the usual Shenandoah target of 2048 regions. This means we are likely always taking that path.
It might cause some trouble if the number of parallel GC workers is high: we wake up lots of GC threads without having most of them do any useful work:
[info ][gc,start ] GC(163) Pause Init Mark (unload classes)
[info ][gc,task ] GC(163) Using 16 of 16 workers for init marking
[info ][gc ] GC(163) Pause Init Mark (unload classes) 0.116ms
[info ][safepoint ] Safepoint "ShenandoahInitMark", Time since last: 10717617218 ns, Reaching safepoint: 157434 ns, Cleanup: 27282 ns, At safepoint: 202251 ns, Total: 386967 ns
We need to see: a) whether this is actually a problem; b) whether the default ShenandoahParallelRegionStride is too low; c) whether we should limit the number of active workers around that block to `num_regions() / stride`; d) whether we should just ditch this code and always do a single-threaded walk.
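Option c) can be sketched as below. This is a minimal illustration, not HotSpot code: `capped_worker_count` and its parameters are hypothetical names, and the lower bound of one worker is an assumption added so the result is always usable.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of HotSpot): limit the number of active
// workers to num_regions / stride, so that every woken worker has at
// least one full stride of regions to process.
std::size_t capped_worker_count(std::size_t num_regions,
                                std::size_t stride,
                                std::size_t max_workers) {
  assert(stride > 0 && max_workers > 0);
  // Number of full strides available in the heap.
  std::size_t strides = num_regions / stride;
  // Never exceed the configured worker count, never go below one worker.
  return std::max<std::size_t>(1, std::min(strides, max_workers));
}
```

With the numbers from this report (2048 regions, a stride of 1024, 16 workers), the helper returns 2, so only two threads would be woken instead of 16.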
This is not limited to init mark: parallel_heap_region_iterate is used by 4 other GC phases to apply lightweight operations on heap regions. If possible and needed, we should optimize parallel_heap_region_iterate itself, which would benefit all 5 places that use it to walk heap regions and apply an operation to them.
Assume the overhead of orchestrating worker threads for a parallel iteration is `n`, and the cost of processing 1024 heap regions is `m` (assuming the total cost is linear in the number of regions for a single thread). We could measure `n` and `m`, then calculate a threshold: below the threshold simply use a single thread, otherwise use the parallel walk. The threshold should be roughly `(n/m + 1) * 1024`.
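The threshold rule can be written out as follows. This is only a sketch of the formula above: the function names are hypothetical, `n` and `m` are assumed to be measured in the same time unit, and no measured values are implied.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper: region-count threshold from the formula
// (n/m + 1) * stride, where
//   n = overhead of orchestrating the worker threads,
//   m = single-threaded cost of processing `stride` regions,
// both in the same time unit.
std::size_t parallel_walk_threshold(double n, double m, std::size_t stride) {
  assert(n >= 0.0 && m > 0.0 && stride > 0);
  return static_cast<std::size_t>(
      std::ceil((n / m + 1.0) * static_cast<double>(stride)));
}

// Below the threshold use the single-threaded walk, otherwise go parallel.
bool use_parallel_walk(std::size_t num_regions, std::size_t threshold) {
  return num_regions >= threshold;
}
```

For example, if the orchestration overhead were three times the cost of walking 1024 regions (`n = 3`, `m = 1`), the threshold would be 4096 regions, and a 2048-region heap would stay on the single-threaded path.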
- backported by
  - JDK-8340549 Shenandoah: Parallel worker use in parallel_heap_region_iterate (Resolved)
  - JDK-8344591 Shenandoah: Parallel worker use in parallel_heap_region_iterate (Resolved)
- links to
  - Commit(master) openjdk/jdk21u-dev/fd7b6e45
  - Commit(master) openjdk/jdk23u/ac8e8da5
  - Commit(master) openjdk/jdk/e74edbae
  - Commit(master) openjdk/shenandoah-jdk21u/d7815100
  - Review(master) openjdk/jdk21u-dev/973
  - Review(master) openjdk/jdk23u/101
  - Review(master) openjdk/jdk/20305
  - Review(master) openjdk/shenandoah-jdk21u/108