A DESCRIPTION OF THE PROBLEM :
Context and Motivation
In multi-tenant environments, such as Kubernetes clusters in the cloud, there is a strong incentive to use as little memory as possible. Lower memory usage means more processes can be packed onto a single VM, which translates directly into lower cloud cost.
Configuring the G1 heap size in this setup is currently challenging. On the one hand, we would like to set the max heap size to a high value so that the application doesn’t fail with a heap OOME when faced with unexpectedly high load or organic growth. On the other hand, we need to set the max heap size to as small a value as possible because G1 is very eager to expand the heap, even when tuned to collect garbage aggressively.
Ideally, we would like to:
Set the initial heap size to a small value.
Set the max heap size to a value larger than expected usage so that the application can handle unexpected load and organic growth.
Configure G1 not to expand the heap aggressively. This is currently not possible.
We propose two new G1 JVM flags that would give more control over how aggressively G1 expands the heap and could yield significant cost savings in multi-tenant environments.
At the same time, we don’t want to change existing G1 behavior: with the default values of the new flags, the current G1 behavior would be maintained.
Analysis
Currently, even with a very aggressive G1 configuration such as:
-XX:-G1UseAdaptiveIHOP -XX:InitiatingHeapOccupancyPercent=20 -XX:GCTimeRatio=4 -XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=60
the heap is fairly eagerly expanded.
We found two culprits responsible for this in the G1HeapSizingPolicy::young_collection_expansion_amount() function.
First, the scale_with_heap() function lowers pause_time_threshold when the current heap size is less than half of the max heap size. While this is likely desirable in many situations, it also causes memory usage spikes when the max heap size is much larger than the current heap size.
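The effect of this scaling can be illustrated with a small model. The sketch below is not the HotSpot source: it assumes that the base threshold is the GC time fraction implied by GCTimeRatio, 1 / (1 + GCTimeRatio), that the threshold is scaled linearly by the ratio of current capacity to half of max capacity, and that the result is floored at 1%. The class and method names are hypothetical.

```java
// Simplified model of the threshold scaling described above; NOT the HotSpot source.
public final class PauseThresholdModel {
    // Assumed lower bound (1%) for the scaled threshold; an assumption of this sketch.
    static final double MIN_THRESHOLD = 0.01;

    // Assumed base threshold: the GC time fraction implied by GCTimeRatio.
    public static double baseThreshold(int gcTimeRatio) {
        return 1.0 / (1.0 + gcTimeRatio);
    }

    // Returns the threshold used for expansion decisions. With scaling disabled
    // (modelling -XX:-G1ScaleWithHeapPauseTimeThreshold) the threshold is unchanged.
    public static double scaleWithHeap(double threshold, long capacity, long maxCapacity,
                                       boolean scalingEnabled) {
        long half = maxCapacity / 2;
        if (!scalingEnabled || capacity >= half) {
            return threshold;
        }
        // Heap is below half of its maximum: shrink the threshold proportionally.
        double scaled = threshold * ((double) capacity / (double) half);
        return Math.max(scaled, MIN_THRESHOLD);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        double base = baseThreshold(9);
        for (long capacityMb : new long[] {256, 2048, 6144}) {
            System.out.printf("capacity=%dMB scaled=%.4f unscaled=%.4f%n", capacityMb,
                    scaleWithHeap(base, capacityMb * mb, 8192 * mb, true),
                    scaleWithHeap(base, capacityMb * mb, 8192 * mb, false));
        }
    }
}
```

In this model, a small heap with a large max heap ends up with a threshold far below the one implied by GCTimeRatio, so almost any pause counts as being over the threshold.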
Second, the MinOverThresholdForGrowth constant, currently equal to 4, is an arbitrary value that hardcodes how aggressively the heap is expanded. We observed that short_term_pause_time_ratio can exceed pause_time_threshold and trigger heap expansion too eagerly in many situations, especially when the allocation rate is spiky.
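To make the role of this count concrete, here is a minimal sketch of an expansion trigger driven by the number of pauses over the threshold. It is a simplified model, not the logic in young_collection_expansion_amount(): it assumes a history window of 10 pauses, a second trigger that fires when the window is full and the long-term ratio is over the threshold, and a reset of the counters at the end of each window. All class, method, and parameter names are hypothetical.

```java
// Simplified model of a count-based expansion trigger; NOT the HotSpot source.
public final class ExpansionTriggerModel {
    private final int minPausesOverThreshold; // models G1MinPausesOverThresholdForGrowth
    private final int historyLength;          // assumed length of the pause history window
    private int overThresholdCount;
    private int pausesSinceStart;

    public ExpansionTriggerModel(int minPausesOverThreshold, int historyLength) {
        this.minPausesOverThreshold = minPausesOverThreshold;
        this.historyLength = historyLength;
    }

    // Records one young pause; returns true if this model would request expansion.
    public boolean recordPause(double shortTermRatio, double longTermRatio, double threshold) {
        pausesSinceStart++;
        if (shortTermRatio > threshold) {
            overThresholdCount++;
        }
        boolean historyFull = pausesSinceStart >= historyLength;
        boolean expand = overThresholdCount >= minPausesOverThreshold
                || (historyFull && longTermRatio > threshold);
        if (expand || historyFull) {
            // Start a fresh observation window.
            overThresholdCount = 0;
            pausesSinceStart = 0;
        }
        return expand;
    }

    // Returns the 1-based index of the first pause that triggers expansion, or -1 if none does.
    public static int firstExpansion(int minPauses, double[] shortTermRatios,
                                     double longTermRatio, double threshold) {
        ExpansionTriggerModel model = new ExpansionTriggerModel(minPauses, 10);
        for (int i = 0; i < shortTermRatios.length; i++) {
            if (model.recordPause(shortTermRatios[i], longTermRatio, threshold)) {
                return i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Spiky workload: every other pause is over a 10% threshold, the average is not.
        double[] spiky = {0.15, 0.02, 0.15, 0.02, 0.15, 0.02, 0.15, 0.02, 0.02, 0.02};
        System.out.println("min=4:  " + firstExpansion(4, spiky, 0.072, 0.10));
        System.out.println("min=10: " + firstExpansion(10, spiky, 0.072, 0.10));
    }
}
```

With the spiky input above, the model with a minimum of 4 requests expansion within the window, while a minimum of 10 leaves the decision to the long-term ratio, which stays under the threshold.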
Proposal
We would like to introduce two new experimental flags:
G1ScaleWithHeapPauseTimeThreshold: a boolean flag that would allow disabling scale_with_heap().
G1MinPausesOverThresholdForGrowth: a value between 1 and 10 that serves as a configurable replacement for the MinOverThresholdForGrowth constant.
We don’t want to change the default behavior of G1. Default values for these flags (G1ScaleWithHeapPauseTimeThreshold=true, G1MinPausesOverThresholdForGrowth=4) would maintain the existing behavior.
Alternatives
There is currently no good alternative. We could potentially configure G1 aggressively to trigger GC very frequently, e.g.: -XX:-G1UseAdaptiveIHOP -XX:InitiatingHeapOccupancyPercent=20 -XX:GCTimeRatio=4 -XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=60
Even with this configuration, we see occasional large memory spikes where the heap is quickly expanded. Even though the expanded heap eventually contracts, this poses a significant problem: in practice we don’t know whether such a spike could have been avoided, so it is not obvious how much memory the application really needs. Such a configuration also consumes more CPU.
Experimental results
With the new flags we can use a far less aggressive -XX:GCTimeRatio=9 together with -XX:-G1ScaleWithHeapPauseTimeThreshold and -XX:G1MinPausesOverThresholdForGrowth=10 (this effectively disables heap expansion based on the short-term pause time ratio, so that expansion depends only on the long-term pause time ratio).
Compared to the more aggressive G1 configuration mentioned above, we see lower CPU usage and 30%-60% lower max memory usage.
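The report does not say how max memory usage was measured. As one possible approach, the sketch below samples the committed heap size from inside the JVM through the standard java.lang.management API and keeps the peak. The class name, method name, and sampling interval are hypothetical, and committed heap is only one component of a process's total memory footprint.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Sketch: track the peak committed heap size over a number of samples.
public final class CommittedHeapSampler {
    // Samples committed heap 'samples' times, sleeping 'intervalMillis' between samples,
    // and returns the largest value seen (0 if no sample was taken).
    public static long peakCommittedBytes(int samples, long intervalMillis) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        long peak = 0;
        for (int i = 0; i < samples; i++) {
            peak = Math.max(peak, memory.getHeapMemoryUsage().getCommitted());
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt status and stop sampling early.
                Thread.currentThread().interrupt();
                break;
            }
        }
        return peak;
    }

    public static void main(String[] args) {
        long peak = peakCommittedBytes(10, 100);
        System.out.println("peak committed heap (MB): " + peak / (1024 * 1024));
    }
}
```

Running the same workload under both configurations and comparing the reported peaks would give a comparison of the kind described above, under the assumption that committed heap dominates the memory usage of interest.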
Implementation
https://github.com/openjdk/jdk/pull/23534
Relates to:
- JDK-8238687 Investigate memory uncommit during young collections in G1 (Open)
- JDK-8349978 G1: Reconsider G1 GCTimeRatio boosting during heap expansion for small heaps (Open)