Type: Enhancement
Resolution: Unresolved
Priority: P4
Arrange for the adaptive heuristic to be more conservative in triggering the start of GC:
1. As currently implemented, the heuristic divides the available memory by the average allocation rate and compares this to the average time required to complete GC. An allocation spike factor is accounted for.
2. Adjust the anticipated time required to complete GC to account for trends in live memory. If live memory is increasing from one GC to the next, then the time required to complete GC needs to increase proportionally.
3. Subtract from the time available to complete GC the sum of the maximum time required to perform the next poll (e.g. 17 ms) and the maximum time required to reach a safepoint so GC can start (e.g. 211 ms).
4. Rather than using the average allocation rate computed over all time, consider using the maximum allocation rate “recently” observed between the end of one GC pass and the start of the following GC pass. (A sketch combining these adjustments follows this list.)
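To make the combined effect of these adjustments concrete, the following is a rough C++ sketch of the proposed trigger predicate. It is illustrative only: the struct, field names, and values wired into it are assumptions made for this write-up, not the existing should_start_gc() implementation.

  // Hypothetical sketch of the proposed trigger condition; all names and
  // parameters here are illustrative, not the actual adaptive-heuristic code.
  #include <algorithm>

  struct GcTriggerInputs {
    double available_bytes;        // free memory remaining in the allocation pool
    double recent_max_alloc_rate;  // max bytes/s observed since the last GC finished (item 4)
    double avg_gc_seconds;         // average duration of recent GC passes
    double live_growth_ratio;      // live(n) / live(n-1); > 1.0 when live memory is trending up (item 2)
    double spike_factor;           // existing allocation-spike margin (item 1)
    double max_poll_seconds;       // worst observed gap between polls of this predicate, e.g. 0.017 (item 3)
    double max_safepoint_seconds;  // worst observed time to reach the GC safepoint, e.g. 0.211 (item 3)
  };

  // Return true when GC must start now in order to finish before the
  // allocation pool is exhausted.
  static bool should_start_gc(const GcTriggerInputs& in) {
    // Item 2: if live memory is growing, assume the next GC pass takes
    // proportionally longer than the historical average.
    double expected_gc_seconds = in.avg_gc_seconds * std::max(1.0, in.live_growth_ratio);

    // Items 1 and 4: budget against the worst recent allocation rate rather
    // than the all-time average, scaled by the existing spike factor.
    double alloc_rate = in.recent_max_alloc_rate * in.spike_factor;
    if (alloc_rate <= 0.0) {
      return false;  // nothing is being allocated, so no urgency to collect
    }

    // Time until free memory runs out at that allocation rate.
    double seconds_until_exhausted = in.available_bytes / alloc_rate;

    // Item 3: part of that budget is consumed before GC can actually begin
    // (waiting for the next poll of this predicate, then reaching a safepoint).
    double usable_budget = seconds_until_exhausted
                           - (in.max_poll_seconds + in.max_safepoint_seconds);

    return usable_budget <= expected_gc_seconds;
  }

With the values cited in the observations below (a 17 ms poll gap, 211 ms to reach the safepoint, and an allocation rate of 8 GBytes per second), the subtraction in item 3 alone removes roughly 228 ms, or about 1.8 GBytes of headroom, from the budget.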
Motivation is based on the following observations:
1. Even though polling of should_start_gc() is configured to occur every 10 ms, the poll may happen less frequently if the system is heavily loaded. We observed, for example, that the first poll following completion of GC occurred 17 ms after GC finished.
2. Once should_start_gc() returns true, the GC does not actually start until the requested safepoint is reached. In one case, this was observed to occur 211 ms after the safepoint was requested.
3. At a high allocation rate of 8 GBytes per second, these delays (17 ms + 211 ms ≈ 228 ms) represent allocation of roughly 2 GBytes (8 GBytes/s × 0.228 s ≈ 1.8 GBytes). Worse yet, the allocation rate during this brief intermission between GC passes is often much higher than the “average” allocation rate for the service, because the many allocation requests that were stalled during the previous GC pass will now run unimpeded.
4. Whenever the heuristic fails to start GC early enough for it to complete before the allocation pool has exhausted available memory, we experience long stop-the-world pauses associated with degenerated or full GC.