-
Bug
-
Resolution: Unresolved
-
P5
-
None
-
repo-shenandoah
-
generic
-
generic
The attached log file (auto-tenure.out) and summary (gc-log.notes) reveal several issues in the existing GenShen implementation.
1. The log message emitted following a concurrent young GC cycle does not properly identify the cycle as a bootstrap cycle. See lines 4438-4440, reporting on GC(26). These should say "At end of Concurrent Bootstrap GC".
2. The log message emitted following a degenerated cycle sometimes incorrectly identifies the degenerated cycle as being a bootstrap cycle. See lines 6110-6112. These should say "At end of Degenerated GC".
3. It appears that we do not promptly trigger a GC when instantaneous allocation rate so motivates. See line 3390, Trigger for YOUNG GC(21) that occurs 33.777 seconds following completion of GC(20). During this time, the simulation workload has been idle. The simulation abruptly begins at AbsoluteTime: 1m 1s 2ms 4us 23ns, which occurs sometimes after timestamp 103.718s. The two clocks are not calibrated, so we do not know exactly when the simulation begins. What we know is that trigger for GC(21) arrives too late and we degenerate. We also know that the simulation workload is allocating approximately 2.1 GB/s. It may be that ShenandoahRegulatorThread::regulator_sleep() is taking too long. (The default ShenandoahControlIntervalMax is 10ms, so we should be ok, but something went wrong...)
4. We are sometimes very slow to respond to GC triggers. See line 6158, which triggers YOUNG GC, during concurrent execution of OLD GC(43). OLD GC(43) begins at time 179.370s (line 6156). The preempting YOUNG trigger occurs at time 179.372s (line 6158). It is not until time 180.866s (line 8018, 1.496s after the first trigger) that we announce our intent to cancel GC. Then, we we report (line 8020) that cancellation to 0.266s at time 181.132s. First problem is that it took 1.496s to recognize the trigger. A second problem is that it took 266 ms to cancel the OLD GC. Two things wrong here.
5. After we do a Full GC, we should reset the counter on consecutive degenerated GCs to zero. See lines 21_345 to 21_347. Full GC(235) ends with 28_584M available in YOUNG, 402M available in OLD. Concurrent YOUNG GC(236) starts just 8ms after Full GC ends so there has not been very much new garbage created (and there is no floating garbage from preceding STW Full GC). After 3.25s of concurrent execution, GC(236) is working on update-refs when it fails to allocate a TLAB (allocation rate is higher than normal because we are catching up with pent up demand following 7.2 second pause from just completed FULL GC). At this point, we should degenerate to finish update refs (in less than 2 s). This is highly preferred over immediately performing another FULL GC (in more than 7 s).
6. After we complete a FULL GC, we should reset our knowledge of "live at previous OLD marking". See Trigger (OLD) at line 22_964. This believes live at end of previous OLD marking is 1_439M. This is the same value this variable had at time 131.343, when we did our previous old-marking bootstrap for which old marking never finished. At time 731.094 (line 21_347), we finished a Full GC and found that live OLD memory is 16_675M. We should have updated "live at previous OLD marking" to this value here.
A separate effort will work to improve work balance between young GC and old GC efforts so that concurrent OLD marking can complete more quickly.
Addressing the 6 issues above will allow this workload to run much better. In particular, during the 20 minute simulation run, the attached log shows that we experienced 68 degenerated GCs and 8 Full GCs. Interestingly, all of this was caused by our very late trigger for GC(22). If we had avoided this very late trigger and the late trigger for GC(418), we would likely have experienced 0 degenerated GCs and 0 Full GCs.
As evidence, observe that the last Full GC(249) ends at time 801.76s and the last Degen GC(419) ends at time 911.066s. The simulation with the same allocate rate and live memory utilization runs until time 1300s (another 389s) without any further degenerations or full GCs.
To test this hypothesis, we ran the same load again with a GuaranteedYoungGCInterval of 5000 ms, and it ran without degen or full GC. That log is also attached.
1. The log message emitted following a concurrent young GC cycle does not properly identify the cycle as a bootstrap cycle. See lines 4438-4440, reporting on GC(26). These should say "At end of Concurrent Bootstrap GC".
2. The log message emitted following a degenerated cycle sometimes incorrectly identifies the degenerated cycle as being a bootstrap cycle. See lines 6110-6112. These should say "At end of Degenerated GC".
3. It appears that we do not promptly trigger a GC when instantaneous allocation rate so motivates. See line 3390, Trigger for YOUNG GC(21) that occurs 33.777 seconds following completion of GC(20). During this time, the simulation workload has been idle. The simulation abruptly begins at AbsoluteTime: 1m 1s 2ms 4us 23ns, which occurs sometimes after timestamp 103.718s. The two clocks are not calibrated, so we do not know exactly when the simulation begins. What we know is that trigger for GC(21) arrives too late and we degenerate. We also know that the simulation workload is allocating approximately 2.1 GB/s. It may be that ShenandoahRegulatorThread::regulator_sleep() is taking too long. (The default ShenandoahControlIntervalMax is 10ms, so we should be ok, but something went wrong...)
4. We are sometimes very slow to respond to GC triggers. See line 6158, which triggers YOUNG GC, during concurrent execution of OLD GC(43). OLD GC(43) begins at time 179.370s (line 6156). The preempting YOUNG trigger occurs at time 179.372s (line 6158). It is not until time 180.866s (line 8018, 1.496s after the first trigger) that we announce our intent to cancel GC. Then, we we report (line 8020) that cancellation to 0.266s at time 181.132s. First problem is that it took 1.496s to recognize the trigger. A second problem is that it took 266 ms to cancel the OLD GC. Two things wrong here.
5. After we do a Full GC, we should reset the counter on consecutive degenerated GCs to zero. See lines 21_345 to 21_347. Full GC(235) ends with 28_584M available in YOUNG, 402M available in OLD. Concurrent YOUNG GC(236) starts just 8ms after Full GC ends so there has not been very much new garbage created (and there is no floating garbage from preceding STW Full GC). After 3.25s of concurrent execution, GC(236) is working on update-refs when it fails to allocate a TLAB (allocation rate is higher than normal because we are catching up with pent up demand following 7.2 second pause from just completed FULL GC). At this point, we should degenerate to finish update refs (in less than 2 s). This is highly preferred over immediately performing another FULL GC (in more than 7 s).
6. After we complete a FULL GC, we should reset our knowledge of "live at previous OLD marking". See Trigger (OLD) at line 22_964. This believes live at end of previous OLD marking is 1_439M. This is the same value this variable had at time 131.343, when we did our previous old-marking bootstrap for which old marking never finished. At time 731.094 (line 21_347), we finished a Full GC and found that live OLD memory is 16_675M. We should have updated "live at previous OLD marking" to this value here.
A separate effort will work to improve work balance between young GC and old GC efforts so that concurrent OLD marking can complete more quickly.
Addressing the 6 issues above will allow this workload to run much better. In particular, during the 20 minute simulation run, the attached log shows that we experienced 68 degenerated GCs and 8 Full GCs. Interestingly, all of this was caused by our very late trigger for GC(22). If we had avoided this very late trigger and the late trigger for GC(418), we would likely have experienced 0 degenerated GCs and 0 Full GCs.
As evidence, observe that the last Full GC(249) ends at time 801.76s and the last Degen GC(419) ends at time 911.066s. The simulation with the same allocate rate and live memory utilization runs until time 1300s (another 389s) without any further degenerations or full GCs.
To test this hypothesis, we ran the same load again with a GuaranteedYoungGCInterval of 5000 ms, and it ran without degen or full GC. That log is also attached.