- Type: Bug
- Resolution: Unresolved
- Priority: P4
- None
- Affects Version/s: 17, 21, 25, 26
- Component/s: hotspot
- Fix Understood
Shenandoah uses the max heap size (Xmx) to calculate the reserved evac size[1], and the soft tail[2] that we calculate for the mutator available size includes the size reserved for evac. The current logic to decide whether to trigger GC[3] is:
```
available = (Xmx * (100 - ShenandoahEvacReserve) / 100) - used
soft_tail = Xmx - soft_max
if (available - soft_tail < soft_max * ShenandoahMinFreeThreshold / 100) // trigger gc
```
The expression `available - soft_tail` reduces to `-(ShenandoahEvacReserve/100) * Xmx - used + soft_max`. For a fixed soft max heap size, the larger Xmx is, the less free space the app appears to have and the more often GC is triggered, which does not make sense.
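The reduction can be written out step by step. The shorthand here is introduced only for this derivation and is not from the source code: $X$ is Xmx, $S$ is soft_max, $U$ is used, and $R$ is ShenandoahEvacReserve as a percentage.

```latex
\begin{aligned}
\text{available} - \text{soft\_tail}
  &= \left(X \cdot \frac{100 - R}{100} - U\right) - (X - S) \\
  &= X - \frac{R}{100}\,X - U - X + S \\
  &= -\frac{R}{100}\,X - U + S
\end{aligned}
```

The $X$ terms cancel except for the evac reserve share, so every extra byte of Xmx removes $R/100$ bytes from the value compared against the threshold.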
An internal customer reported high GC activity under a close-to-idle workload when the soft max heap size was set far below Xmx. We identified the logic above as the root cause. I reproduced it with StableLiveSet.java in the attachment (a Java app that creates objects but keeps the heap size at ~300M; credits to ChatGPT). With Xmx31g, `java '-Xlog:gc*=info' -XX:+UseShenandoahGC -XX:SoftMaxHeapSize=2g -Xmx31g StableLiveSet`, GC ran ~2217 times in 10 sec. After lowering Xmx to 3g, GC ran only 4 times in 10 sec, just for the initial learning cycles.
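To see why the two Xmx settings behave so differently, the pseudocode check can be evaluated with the values from the repro. This is a rough sketch, not HotSpot code: the class and method names are hypothetical, `used` is taken as the ~300M figure above, and the percentages (ShenandoahEvacReserve = 5, ShenandoahMinFreeThreshold = 10) are assumed defaults that the report does not state. It also uses signed arithmetic, which may differ from the real implementation.

```java
// Sketch of the current trigger check from the pseudocode (not HotSpot code).
final class CurrentTriggerSketch {
    static final long MB = 1024L * 1024L;
    static final long GB = 1024L * MB;

    // Assumed values, both percentages; the report does not state them.
    static final long EVAC_RESERVE_PERCENT = 5;        // ShenandoahEvacReserve
    static final long MIN_FREE_THRESHOLD_PERCENT = 10; // ShenandoahMinFreeThreshold

    /** Value of (available - soft_tail) in bytes; may be negative in this sketch. */
    static long headroom(long xmx, long softMax, long used) {
        long available = xmx * (100 - EVAC_RESERVE_PERCENT) / 100 - used;
        long softTail = xmx - softMax;
        return available - softTail;
    }

    /** Threshold the headroom is compared against, as a share of soft_max. */
    static long minFreeThreshold(long softMax) {
        return softMax * MIN_FREE_THRESHOLD_PERCENT / 100;
    }

    static boolean shouldTrigger(long xmx, long softMax, long used) {
        return headroom(xmx, softMax, used) < minFreeThreshold(softMax);
    }

    public static void main(String[] args) {
        long softMax = 2 * GB;   // -XX:SoftMaxHeapSize=2g
        long used = 300 * MB;    // approximate steady-state usage from the repro
        for (long xmxGb : new long[] {3, 31}) {
            long xmx = xmxGb * GB;
            System.out.printf("Xmx=%dg headroom=%dM threshold=%dM trigger=%b%n",
                    xmxGb,
                    headroom(xmx, softMax, used) / MB,
                    minFreeThreshold(softMax) / MB,
                    shouldTrigger(xmx, softMax, used));
        }
    }
}
```

Under these assumptions the headroom at Xmx31g is roughly 160M, below the roughly 205M threshold, so the check fires on every evaluation. At Xmx3g the headroom is about 1.5G and the check stays quiet.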
Edit: The available size calculation in generational Shenandoah's should_start_gc() is also inaccurate, but it is much less impacted. GenShen uses (soft_available() - used) as the available size. Although it should exclude the space reserved for evac to avoid overestimating the available size, it does not suffer from busy GC under minimal load.
~~Generational shenandoah doesn't seem to be impacted. Traditional only.~~
---------
Suggested fix: when deciding whether to trigger GC[3], use logic similar to the following:
```
mutator_soft_capacity = soft_max * (100 - ShenandoahEvacReserve) / 100;
available = mutator_soft_capacity - used;
if (available < soft_max * ShenandoahMinFreeThreshold / 100) // trigger gc; threshold as in the current check
```
In the above logic, `available` reduces to `soft_max * (100 - ShenandoahEvacReserve) / 100 - used`, which does not depend on Xmx and is positively correlated with soft_max.
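A minimal sketch of the suggested calculation shows that Xmx no longer affects the decision. The names are hypothetical and the percentages (5 and 10) are assumed values. Comparing against a ShenandoahMinFreeThreshold share of soft_max is carried over from the current check and is an assumption here, not something the actual fix is confirmed to do.

```java
// Sketch of the suggested trigger check (not HotSpot code).
final class ProposedTriggerSketch {
    static final long MB = 1024L * 1024L;
    static final long GB = 1024L * MB;

    // Assumed values, both percentages.
    static final long EVAC_RESERVE_PERCENT = 5;        // ShenandoahEvacReserve
    static final long MIN_FREE_THRESHOLD_PERCENT = 10; // ShenandoahMinFreeThreshold

    /** Mutator-available bytes derived from soft_max only. */
    static long available(long softMax, long used) {
        long mutatorSoftCapacity = softMax * (100 - EVAC_RESERVE_PERCENT) / 100;
        return mutatorSoftCapacity - used;
    }

    /** xmx is accepted only to make explicit that it plays no part in the result. */
    static boolean shouldTrigger(long xmx, long softMax, long used) {
        long threshold = softMax * MIN_FREE_THRESHOLD_PERCENT / 100; // assumed threshold
        return available(softMax, used) < threshold;
    }

    public static void main(String[] args) {
        long softMax = 2 * GB;
        long used = 300 * MB;
        for (long xmxGb : new long[] {3, 31}) {
            System.out.printf("Xmx=%dg available=%dM trigger=%b%n",
                    xmxGb,
                    available(softMax, used) / MB,
                    shouldTrigger(xmxGb * GB, softMax, used));
        }
    }
}
```

With the repro values, both Xmx settings give the same available size and the same decision, and raising soft_max raises the available size.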
[1]: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp#L2629
[2]: https://github.com/openjdk/jdk/blob/0a3809f0be94c92c2c46f00fe5ff981afdd55cf0/src/hotspot/share/gc/shenandoah/shenandoahGlobalGeneration.cpp#L86
[3]: https://github.com/openjdk/jdk/blob/0a3809f0be94c92c2c46f00fe5ff981afdd55cf0/src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp#L235
- links to: Review(master) openjdk/jdk/28622