Develop additional white-box tests for the Humongous Objects feature of the G1 Garbage Collector.
We will not develop tests for G1 Eager Reclamation.
Garbage First (G1) is a generational garbage collector which divides the heap into equal-sized regions. It is multi-threaded and has a concurrent collection phase, which runs alongside the application.
G1 treats objects bigger than one-half of a memory region, called humongous objects, differently than other objects:
Humongous objects always take up a number of regions. If a humongous object is smaller than one region then it takes up the whole region. If a humongous object is larger than N regions and smaller than (N+1) regions then it takes up (N+1) regions. No allocations are allowed in the free space, if any, of the last region.
They can only be collected at the end of a concurrent marking cycle, during a full GC, or during a young GC in the case of G1 Eager Reclamation.
They can never be moved from one region to another.
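The sizing rules above reduce to a threshold test and a ceiling division. The following is a minimal sketch of that arithmetic only; the class and method names are hypothetical and are not part of G1 or of the WhiteBox API, and the 1 MiB region size is an assumed example value.

```java
public final class HumongousRegionMath {
    private HumongousRegionMath() {}

    /** An object is humongous if it is bigger than one-half of a region. */
    public static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    /** Whole regions taken up by the object (ceiling division). */
    public static long regionsOccupied(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes;
    }

    /** Free space left in the last region, where no allocations are allowed. */
    public static long unusableBytes(long objectBytes, long regionBytes) {
        return regionsOccupied(objectBytes, regionBytes) * regionBytes - objectBytes;
    }

    public static void main(String[] args) {
        long region = 1L << 20;          // assumed region size: 1 MiB
        long size = 2 * region + 1;      // one byte larger than two regions
        System.out.println(isHumongous(size, region));      // true
        System.out.println(regionsOccupied(size, region));  // 3
        System.out.println(unusableBytes(size, region));    // 1048575
    }
}
```

A test could compare values computed this way against what G1 reports for a freshly allocated object of known size.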
Because G1 is concurrent and multi-threaded, black-box testing of it is very difficult. Several ways to collect dead objects, a few concurrent threads, the ability to run in parallel with the application, and generally complex algorithms make it nearly impossible to infer G1's internal state from the outside. To address these issues we will extend the WhiteBox API and implement Java tests that use this API to check G1's internal state. We will also be able to reuse these newly developed WhiteBox API methods in stress tests.
To test that the code which handles humongous objects works as expected, we need G1 to provide more details about the internal representation of humongous objects on the heap. We will add debug methods to G1 which will allow us to get information from its internal data structures and to control the initiation of garbage collection. The latter is important because there are three code paths which can collect unreachable humongous objects: a full garbage collection, concurrent marking, and a young GC in the case of G1 Eager Reclamation. To test each path we need to avoid the others.
To help with this we will extend the WhiteBox API with:
Methods to block and initiate concurrent marking and full GCs.
Methods to enumerate G1's regions and access region attributes (e.g., free/occupied/humongous).
Methods to access internal G1 variables such as free memory, region size, and the number of free regions.
Methods to locate regions in the heap, to check that no allocations happen in regions that belong to humongous objects. (This could be a first step toward a "heap walker" API that allows us to fully iterate over the Java heap.)
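As an illustration of how region enumeration could support the "no allocations in humongous regions" check, here is a rough sketch. Everything in it is an assumption made for the example: RegionInspector is a hypothetical stand-in for the proposed methods, not the actual WhiteBox API, and FakeHeap is a toy in-memory model that exists only so the sketch can run without a JVM built with these extensions. The check also assumes the humongous object is kept alive between the two observations.

```java
public class HumongousRegionCheck {
    public enum Kind { FREE, OCCUPIED, HUMONGOUS }

    /** Hypothetical stand-in for the proposed region-inspection methods. */
    public interface RegionInspector {
        int regionCount();
        Kind kind(int index);
        long usedBytes(int index);
    }

    /** Records used bytes for every humongous region; other regions get -1. */
    public static long[] snapshotHumongous(RegionInspector heap) {
        long[] used = new long[heap.regionCount()];
        for (int i = 0; i < used.length; i++) {
            used[i] = heap.kind(i) == Kind.HUMONGOUS ? heap.usedBytes(i) : -1;
        }
        return used;
    }

    /** True if every region that was humongous in the snapshot is unchanged. */
    public static boolean humongousRegionsUntouched(long[] before, RegionInspector heap) {
        for (int i = 0; i < before.length; i++) {
            if (before[i] < 0) {
                continue; // not humongous at snapshot time, so not checked
            }
            if (heap.kind(i) != Kind.HUMONGOUS || heap.usedBytes(i) != before[i]) {
                return false;
            }
        }
        return true;
    }

    /** Toy heap model used only to make the sketch runnable. */
    public static final class FakeHeap implements RegionInspector {
        public final Kind[] kinds;
        public final long[] used;
        public FakeHeap(Kind[] kinds, long[] used) { this.kinds = kinds; this.used = used; }
        @Override public int regionCount() { return kinds.length; }
        @Override public Kind kind(int index) { return kinds[index]; }
        @Override public long usedBytes(int index) { return used[index]; }
    }

    public static void main(String[] args) {
        FakeHeap heap = new FakeHeap(
            new Kind[] { Kind.OCCUPIED, Kind.HUMONGOUS, Kind.HUMONGOUS, Kind.FREE },
            new long[] { 100, 1024, 300, 0 });
        long[] before = snapshotHumongous(heap);
        heap.used[0] += 50; // ordinary allocation in a non-humongous region
        System.out.println(humongousRegionsUntouched(before, heap));
        heap.used[2] += 8;  // simulated illegal allocation in the last humongous region
        System.out.println(humongousRegionsUntouched(before, heap));
    }
}
```

In a real test the snapshot would be taken after allocating a humongous object, and the second observation after a burst of ordinary allocations, with the inspector backed by the extended WhiteBox API rather than a fake.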
Possible alternatives are:
Native built-in JVM tests. Such tests could be started with a JVM flag. They are not suitable since test failures would likely lead to crashes. The JVM should be able to continue to work after a failing test, which is not guaranteed with this approach.
Native tests. This would require adding debug methods to G1 code and, in effect, developing a native WhiteBox API. There are certain drawbacks with this approach: we would not be able to use these debug methods for stress tests and, more importantly, there is still no native testing framework.
Risks and Assumptions
New tests may require changes in G1. This could impact the performance and stability of G1, though we think that is unlikely. If G1 is negatively affected then we could build product binaries without debug methods.