Type: Enhancement
Resolution: Fixed
Priority: P4
Fix Version/s: 24
Affects Version/s: 17, 21, 23, 24
Component/s: hotspot
Labels:

Subcomponent: gc
Resolved In Build: b09

Issue	Fix Version	Assignee	Priority	Status	Resolution	Resolved In Build
JDK-8340549	23.0.2	William Kemper	P4	Resolved	Fixed	b01
JDK-8344591	21.0.6	William Kemper	P4	Resolved	Fixed	b05

Shenandoah init mark is supposed to be very fast, on the order of a few hundred microseconds. We do most of the work right in the VM thread that executes the safepoint. Yet we have a block here that involves workers:
https://github.com/openjdk/jdk/blob/d41d2a7a82cb6eff17396717e2e14139ad8179ba/src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp#L555-L559

It goes for the parallel walk at 1024 regions (see ShenandoahParallelRegionStride), which is below the usual Shenandoah target of 2048 regions. This means we are likely always taking that path.

This might cause some trouble if the number of parallel GC workers is high: we wake up lots of GC threads without having most of them do any useful work:

[info ][gc,start ] GC(163) Pause Init Mark (unload classes)
[info ][gc,task ] GC(163) Using 16 of 16 workers for init marking
[info ][gc ] GC(163) Pause Init Mark (unload classes) 0.116ms
[info ][safepoint ] Safepoint "ShenandoahInitMark", Time since last: 10717617218 ns, Reaching safepoint: 157434 ns, Cleanup: 27282 ns, At safepoint: 202251 ns, Total: 386967 ns

We need to see: a) whether this is actually a problem; b) whether the default ShenandoahParallelRegionStride is too low; c) whether we should limit the number of active workers around that block to `num_regions() / stride`; d) whether we should just ditch this code and always do a single-threaded walk.
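Option (c) could be sketched roughly as follows. This is a standalone illustration, not HotSpot code: `capped_worker_count` and its parameters are hypothetical names, and the only idea taken from the description above is the `num_regions() / stride` cap.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper (not part of HotSpot): cap the number of active workers
// so that every worker we wake up has at least one full stride of regions.
inline std::size_t capped_worker_count(std::size_t num_regions,
                                       std::size_t stride,
                                       std::size_t max_workers) {
  // Degenerate inputs: fall back to a single thread.
  if (stride == 0 || max_workers == 0) {
    return 1;
  }
  // Number of workers that would each get a whole stride of work.
  std::size_t useful = num_regions / stride;
  // Never go below 1 worker or above the configured maximum.
  return std::clamp<std::size_t>(useful, 1, max_workers);
}
```

Under these assumptions, a heap with 2048 regions and a stride of 1024 would wake 2 workers instead of all 16 seen in the log above.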

This is not limited to init mark: parallel_heap_region_iterate is used by 4 other GC phases to apply lightweight operations to heap regions. If possible and needed, we should optimize parallel_heap_region_iterate itself, which would benefit all 5 places that use it to walk heap regions.

Assume the overhead of orchestrating worker threads for a parallel iteration is `n`, and the cost of processing 1024 heap regions in a single thread is `m` (assuming the total cost is linear in the number of regions). We could measure `n` and `m`, then calculate the threshold: below the threshold simply use a single thread, otherwise use the parallel walk. The threshold should be roughly `(n/m + 1) * 1024`.
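One way to read that formula: a single-threaded walk of R regions costs about `m * R / 1024`, while a parallel walk costs about `n + m` if each worker handles one stride, so the two break even at `R = (n/m + 1) * 1024`. A minimal sketch of the resulting check, with hypothetical function names and `n` and `m` supplied by the caller from measurements:

```cpp
#include <cstddef>

// Hypothetical helper (not part of HotSpot).
// n: measured overhead of orchestrating the worker threads.
// m: measured single-threaded cost of processing one stride of regions.
// Both must be in the same time unit. Returns the break-even region count
// (n/m + 1) * stride from the description above.
inline std::size_t parallel_walk_threshold(double n, double m,
                                           std::size_t stride = 1024) {
  if (m <= 0.0) {
    return stride;  // no usable measurement: fall back to the plain stride
  }
  return static_cast<std::size_t>((n / m + 1.0) * static_cast<double>(stride));
}

// Use the parallel walk only above the threshold; otherwise walk in one thread.
inline bool should_walk_in_parallel(std::size_t num_regions,
                                    std::size_t threshold) {
  return num_regions > threshold;
}
```

For example, with made-up inputs `n = 30000` and `m = 10000` (same unit), the threshold comes out at 4096 regions, so a 2048-region heap would stay single-threaded. Real values would have to come from measurement.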

backported by

JDK-8340549 Shenandoah: Parallel worker use in parallel_heap_region_iterate (Resolved)

JDK-8344591 Shenandoah: Parallel worker use in parallel_heap_region_iterate (Resolved)

links to

Commit(master) openjdk/jdk21u-dev/fd7b6e45

Commit(master) openjdk/jdk23u/ac8e8da5

Commit(master) openjdk/jdk/e74edbae

Commit(master) openjdk/shenandoah-jdk21u/d7815100

Review(master) openjdk/jdk21u-dev/973

Review(master) openjdk/jdk23u/101

Review(master) openjdk/jdk/20305

Review(master) openjdk/shenandoah-jdk21u/108

Assignee: Xiaolong Peng
Reporter: Aleksey Shipilev
Votes: 0
Watchers: 5
Created: 2024-07-17 03:16
Updated: 2024-11-19 15:30
Resolved: 2024-07-25 09:07
