-
Enhancement
-
Resolution: Unresolved
-
P4
-
None
-
25, 26, repo-shenandoah-21
The GenShen heuristics probe available() when testing whether it is time to trigger the next young-gen GC. We recently uncovered a scenario in which the ShenandoahRegulatorThread invokes should_start_gc(), which in turn invokes ShenandoahFreeSet::available(), during an old-generation final mark safepoint. This safepoint includes code that rebuilds the free set as part of adjusting the OldCollector reserves, to ensure room for a subsequent mixed-evacuation cycle.
We need to prevent the regulator thread from probing available memory while the free set is being rebuilt, because the variables used to represent available memory may be in an incomplete and incoherent state during execution of this safepoint code.
Here is the relevant GC log excerpt:
```
[2025-09-25T11:51:08.924+0000][116.699s][99296 ][info ][gc,ergo ] GC(21) At end of Concurrent Bootstrap GC: GCU: 20.1%, MU: 37.4% during period of 0.770s
[2025-09-25T11:51:08.924+0000][116.699s][99296 ][info ][gc,ergo ] GC(21) At end of Concurrent Bootstrap GC: Young generation used: 6645M, used regions: 6688M, humongous waste: 0B, soft capacity: 65536M, max capacity: 51584M, available: 42346M
[2025-09-25T11:51:08.924+0000][116.699s][99296 ][info ][gc,ergo ] GC(21) At end of Concurrent Bootstrap GC: Old generation used: 13854M, used regions: 13856M, humongous waste: 0B, soft capacity: 65536M, max capacity: 13952M, available: 99827K
;; Note that young has 42_346M available here
...
[2025-09-25T11:51:08.925+0000][116.700s][99296 ][debug][gc,thread ] Old generation transition from Bootstrapping to Marking
[2025-09-25T11:51:08.925+0000][116.700s][99296 ][info ][gc,start ] GC(21) Concurrent marking (Old)
[2025-09-25T11:51:08.925+0000][116.700s][99296 ][info ][gc,task ] GC(21) Using 32 of 64 workers for concurrent marking
[2025-09-25T11:51:09.924+0000][117.700s][99300 ][info ][safepoint,cleanup] updating inline caches, 0.0000104 secs
[2025-09-25T11:51:09.924+0000][117.700s][99300 ][info ][safepoint,cleanup] safepoint cleanup tasks, 0.0000386 secs
[2025-09-25T11:51:09.924+0000][117.700s][99300 ][info ][safepoint,stats ] Cleanup [ 235 6 ][ 337129 41952 133406 512487 ] 49
[2025-09-25T11:51:09.924+0000][117.700s][99300 ][info ][safepoint ] Safepoint "Cleanup", Time since last: 1000086107 ns, Reaching safepoint: 337129 ns, Cleanup: 41952 ns, At safepoint: 1534 ns, Leaving safepoint: 131872 ns, Total: 512487 ns
[2025-09-25T11:51:10.139+0000][117.914s][99300 ][info ][handshake ] Handshake "Shenandoah Flush SATB", Targeted threads: 235, Executed by requesting thread: 156, Total completion time: 2900218 ns
[2025-09-25T11:51:10.140+0000][117.915s][99300 ][info ][handshake ] Handshake "Shenandoah Flush SATB", Targeted threads: 235, Executed by requesting thread: 165, Total completion time: 157204 ns
[2025-09-25T11:51:10.140+0000][117.915s][99296 ][info ][gc ] GC(21) Concurrent marking (Old) 1215.063ms
[2025-09-25T11:51:10.140+0000][117.915s][99300 ][info ][safepoint,cleanup] updating inline caches, 0.0000001 secs
[2025-09-25T11:51:10.140+0000][117.915s][99300 ][info ][safepoint,cleanup] safepoint cleanup tasks, 0.0000072 secs
;; This is where we enter into the final mark (old) safepoint
[2025-09-25T11:51:10.140+0000][117.915s][99300 ][info ][gc,start ] GC(21) Pause Final Mark (Old)
[2025-09-25T11:51:10.140+0000][117.915s][99300 ][info ][gc,task ] GC(21) Using 64 of 64 workers for final marking
[2025-09-25T11:51:10.140+0000][117.916s][99300 ][info ][gc,ergo ] GC(21) Old-Gen Collectable Garbage: 498M consolidated with free: 0B, over 33 regions
[2025-09-25T11:51:10.140+0000][117.916s][99300 ][info ][gc,ergo ] GC(21) Old-Gen Immediate Garbage: 0B over 0 regions
[2025-09-25T11:51:10.140+0000][117.916s][99300 ][info ][gc,ergo ] GC(21) Old regions selected for defragmentation: 5
[2025-09-25T11:51:10.140+0000][117.916s][99300 ][info ][gc,ergo ] GC(21) Old regions not selected: 400
[2025-09-25T11:51:10.140+0000][117.916s][99300 ][debug][gc,thread ] Old generation transition from Marking to Evacuating
;; This is where the regulator thread invokes should_start_gc() and mistakenly concludes that available is 0, so triggers an immediate GC
[2025-09-25T11:51:10.141+0000][117.916s][99297 ][info ][gc ] Trigger (Young): Free (0B) is below minimum threshold (6553M)
[2025-09-25T11:51:10.141+0000][117.916s][99297 ][debug][gc,thread ] Cannot start young, old collection is not preemptible
;; The above log message confirms that we are in an old-gen safepoint
[2025-09-25T11:51:10.141+0000][117.916s][99300 ][info ][gc,ergo ] GC(21) Transfer 1 region(s) from Young to Old, yielding increased size: 13984M
[2025-09-25T11:51:10.141+0000][117.916s][99300 ][info ][gc,free ] Free: 31328M, Max: 32768K regular, 31328M humongous, Frag: 0% external, 0% internal; Used: 0B, Mutator Free: 979 Collector Reserve: 2579M, Max: 32768K; Used: 45456K Old Collector Reserve: 129M, Max: 32768K; Used: 31245K
;; Above, we have finished rebuilding the freeset and report that there is 31_328M available in young
[2025-09-25T11:51:10.141+0000][117.916s][99300 ][info ][gc ] GC(21) Pause Final Mark (Old) 0.668ms
```
This bug does not affect traditional single-generation Shenandoah.
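One possible shape for the exclusion described above, as a minimal standalone sketch rather than the actual HotSpot change: the rebuild and the probe take the same lock, so a reader can never observe the transient zeroed state. `FreeSetSketch`, its `_lock` member, and the per-region vector argument are hypothetical names introduced for illustration. The sketch also ignores how a blocking lock taken by the regulator thread would interact with safepoint synchronization in the real VM.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for ShenandoahFreeSet (illustration only).
class FreeSetSketch {
public:
  // Rebuild path, as run by the GC inside the safepoint. The lock is held for
  // the entire rebuild, including the window where _available is zeroed.
  void rebuild(const std::vector<std::size_t>& free_bytes_per_region) {
    std::lock_guard<std::mutex> guard(_lock);
    _available = 0;  // incoherent until the loop below completes
    for (std::size_t bytes : free_bytes_per_region) {
      _available += bytes;
    }
  }

  // Probe path, as used by the regulator's should_start_gc() check. It blocks
  // while a rebuild is in progress instead of reading a partial value.
  std::size_t available() const {
    std::lock_guard<std::mutex> guard(_lock);
    return _available;
  }

private:
  mutable std::mutex _lock;    // assumed dedicated rebuild lock
  std::size_t _available = 0;  // bytes available to mutators
};
```

With this arrangement, a probe that races with the rebuild waits and then sees the rebuilt total (31328M in the log above) rather than 0B, so the spurious "Free (0B) is below minimum threshold" trigger would not fire.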