- CSR
- Resolution: Approved
- P4
- None
- behavioral
- minimal
- Other
Summary
The "G1 Old Gen" MemoryPoolMXBean is added to the "G1 Young Collection" GarbageCollectorMXBean memory pool list. Code that iterates over the list will see the addition. "G1 Old Gen" CollectionUsage.used is updated after a mixed collection, whereas currently it is updated only after a full-heap stop-the-world GC. The "G1 Old Gen" CollectionUsage.used value series changes accordingly.
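To illustrate what "code that iterates over the list" means, here is a minimal sketch that prints the memory pool names each collector reports on, using only the standard java.lang.management API. The class and method names (CollectorPools, poolsByCollector) are hypothetical and chosen for this example. The collectors and pools printed depend on the JVM version and on which collector is selected (for G1, run with -XX:+UseG1GC).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, not part of the JDK.
public class CollectorPools {

    /** Maps each collector's name to the names of the memory pools it manages. */
    static Map<String, List<String>> poolsByCollector() {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getMemoryPoolNames() is inherited from MemoryManagerMXBean.
            result.put(gc.getName(), Arrays.asList(gc.getMemoryPoolNames()));
        }
        return result;
    }

    public static void main(String[] args) {
        // With G1, the entry of interest is the one keyed "G1 Young Collection".
        poolsByCollector().forEach((gc, pools) -> System.out.println(gc + " -> " + pools));
    }
}
```

Code that assumes a fixed number or order of pools per collector is the kind of code that would notice the change described here.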
Problem
At Amazon, we want to measure long-term heap occupancy in order to detect long-term memory leaks and unexpected steady load increases. For this, we find CollectionUsage to be a more useful metric than Usage. The latter measures instantaneous heap occupancy, which includes an unknown amount of yet-to-be-collected garbage, so we can't determine where to set an alarm threshold on it. Set it too low and we get false positives from garbage that will shortly be collected. Set it too high and we don't detect a failing JVM in time to do anything about it. CollectionUsage, on the other hand, is a good proxy for the amount of long-term live data and can usefully be alarmed on. At present, however, it's useless for G1 because it doesn't reflect the effect of mixed collections on the old gen.
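The kind of alarm described above could be sketched as follows. This is an illustration under stated assumptions, not the monitoring code used at Amazon: the class name OldGenLiveDataCheck, the helper methods, and the 0.80 threshold are all hypothetical. It relies only on MemoryPoolMXBean.getCollectionUsage(), which returns the pool's usage as of the most recent collection and may return null if the JVM does not support it for that pool.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical monitoring helper, not part of the JDK.
public class OldGenLiveDataCheck {

    /** True when used/max exceeds the given fraction; false when max is undefined (-1) or zero. */
    static boolean exceeds(long usedBytes, long maxBytes, double thresholdFraction) {
        if (maxBytes <= 0) {
            return false; // cannot compute a ratio without a defined maximum
        }
        return (double) usedBytes / maxBytes > thresholdFraction;
    }

    /** Returns the post-collection usage of the named pool, or null if unavailable. */
    static MemoryUsage collectionUsage(String poolName) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getName().equals(poolName)) {
                return pool.getCollectionUsage(); // may itself be null
            }
        }
        return null; // no pool with that name in this JVM
    }

    public static void main(String[] args) {
        MemoryUsage afterGc = collectionUsage("G1 Old Gen");
        if (afterGc == null) {
            System.out.println("No CollectionUsage available for G1 Old Gen");
            return;
        }
        // 0.80 is an arbitrary example threshold, not a recommendation.
        boolean alarm = exceeds(afterGc.getUsed(), afterGc.getMax(), 0.80);
        System.out.println("used after GC = " + afterGc.getUsed() + " bytes, alarm = " + alarm);
    }
}
```

The same check against getUsage() instead of getCollectionUsage() would include not-yet-collected garbage, which is the thresholding problem the paragraph above describes.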
Solution
The root cause is that the "G1 Young Collection" GarbageCollectorMXBean records the result of a collection only on the "G1 Eden Space" and "G1 Survivor Space" memory pools, even in the case of a mixed collection. It is in fact misnamed: "G1 Incremental Collection" would be a better description of what it actually does.
Adding the "G1 Old Gen" memory pool to the "G1 Young Collection" memory pool list doesn't violate the existing spec, because the spec doesn't define the pools for a collector. This CSR argues that recording the effect of mixed collections on the "G1 Old Gen" memory pool is correct behavior, that correct behavior is more important than legacy behavior, and that the current "G1 Old Gen" CollectionUsage value is of such limited use that changing it to something usable is low risk.
G1 full GCs happen rarely and only under severe pressure, so when they do, external reaction is pretty much limited to reducing load so the JVM can get back to a usable steady state, or just restarting the JVM. Neither action depends on the value of CollectionUsage. G1 old gen CollectionUsage is zero until a full GC occurs, after which its value includes both long-lived objects and any transient data that was in eden and the survivor space. That value doesn't tell you anything about long-term old gen or survivor size because it lumps them together. So it isn't a useful metric, nor will it be after any subsequent full GCs. The only information it provides is the first zero to non-zero transition, which just tells you that the JVM is, or was, in trouble. Further, the run-up to a full GC causes SLA violations, which are noticed before the full GC happens, so detecting the first full GC is unneeded confirmation, not prediction.
Specification
There are no specification changes. Behavioral changes are discussed above.
- csr of
-
JDK-8215934 G1 Old Gen MemoryPool CollectionUsage.used values don't reflect mixed GC results
- Resolved
- relates to
-
JDK-8196719 G1 Old Gen MemoryPool CollectionUsage.used values don't reflect mixed GC results
- Closed