-
Enhancement
-
Resolution: Fixed
-
P2
-
hs24, hs25
-
b03
Over time many problems with performance and in particular memory usage have been observed:
* adding elements to the lowest-tier data structure takes a global lock per remembered set. Measurements have shown that applications can spend thousands of seconds waiting to acquire these locks. In most cases the affected threads are refinement threads, so the application is not delayed directly, but the contention can still keep G1 from meeting goals it needs for keeping pause times (i.e. the amount of cards from the refinement buffers that has to be merged into the card table and then scanned during GC).
* there is substantial memory overhead for managing the data structures. Examples are:
  * separate (hash) tables are used for the three different types of card containers
  * there is significant unnecessary preallocation of memory for some of the card set containers
  * containers store redundant information
* inflexibility when reusing memory: in the current implementation the different containers use different approaches to manage memory. Most use the C heap directly, some use the C heap with an internal global memory pool. In practice this makes it very difficult to implement anything other than giving back memory in the collection pause, and the corresponding "Free Collection Set" pause can take a significant amount of time because of that.
Also, memory reuse is limited and preallocation of arenas is limited (or would have to be reimplemented multiple times), which stresses the C heap allocator.
* inability to support additional use cases: over time, interesting ideas (e.g. JDK-8058803) came up for improving the performance of remembered set management. Mostly because of redundant information everywhere and completely different handling of various aspects in the containers, it is in practice impossible to implement these.
* (partial) inability to give back memory to the OS. While some of the containers use the C heap allocator, and so give back memory in some way, the implementation and handling are different for every container.
* the existing granularity of containers is unbalanced: currently there are three tiers, "sparse", "fine" and "full". "Sparse" is an array of cards holding maybe a few hundred entries, "fine" is a bitmap covering a whole region, and "full" is a bit indicating that the region should be scanned completely during GC.
The problems are that there is nothing between "no card at all" and "sparse", and in particular that the gap in capacity between "sparse" and "fine" is large. I.e. the difference in memory usage is significant when going from a "sparse" array (holding 128 entries at 32M regions, taking ~256 bytes) to a "fine" bitmap that is able to hold 65k entries using 8kB.
For these reasons there is even a dedicated option to stop allocating more "fine" containers and just give up and use "full" instead to avoid excessive memory usage, with extremely bad consequences for pause times.
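The figures quoted for the "sparse" and "fine" tiers can be reproduced with a little arithmetic. The sketch below is only a back-of-the-envelope model: the 512-byte card size and the 2-byte card index are assumptions chosen to match the numbers above, and `container_bytes` is a hypothetical helper, not a HotSpot function.

```python
# Rough model of the old "sparse" and "fine" container sizes.
# CARD_SIZE and CARD_INDEX_BYTES are assumptions that match the quoted figures.

REGION_SIZE = 32 * 1024 * 1024   # 32M region, as in the text
CARD_SIZE = 512                  # assumed card size in bytes
CARD_INDEX_BYTES = 2             # assumed size of one card index in the array
SPARSE_ENTRIES = 128             # "sparse" capacity quoted in the text

cards_per_region = REGION_SIZE // CARD_SIZE        # cards a "fine" bitmap covers
sparse_bytes = SPARSE_ENTRIES * CARD_INDEX_BYTES   # size of one "sparse" array
fine_bytes = cards_per_region // 8                 # one bit per card


def container_bytes(num_cards):
    """Memory used to remember num_cards distinct cards of one region."""
    if num_cards == 0:
        return 0
    if num_cards <= SPARSE_ENTRIES:
        return sparse_bytes
    return fine_bytes   # the only step available beyond "sparse"


if __name__ == "__main__":
    print("cards per region:", cards_per_region)
    print("sparse bytes:", sparse_bytes, "fine bytes:", fine_bytes)
    print("growth for card 129:", container_bytes(129) // container_bytes(128))
```

Under these assumptions, remembering a single card beyond the "sparse" capacity multiplies the memory used for that region's entry by the ratio printed on the last line, which is the imbalance the item above describes.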
Over time some of these issues have been fixed or in many cases band-aided, and some of these fixes and ideas were the result of working on this change (e.g.
This change is effectively a rewrite of the Java heap card based part of a region's remembered set.
This initial fully working change can be roughly described with the following properties:
* use a single ConcurrentHashTable (CHT) for the card containers of a given region. The container in use is replaced (coarsened) on the fly within the CHT node, completely lock-free. This implements
* memory for all containers (and the CHT nodes) of a given region's remembered set is backed by arena-style bump-pointer allocation buffers, kept per container type and per remembered set. With this change, memory is only given back to free lists during the pause; the implementation gives memory back to the OS concurrently to the application. Memory is still managed using the C heap memory manager, but this is abstracted away and could be replaced by manual page memory management.
* there are now four different container types and one meta-container type. These four actual containers are:
* inline pointer: the change stores a few (3-5) cards in the CHT node directly and uses no extra memory.
* array of cards: similar to the "sparse" container, an array of cards with a configurable amount of entries. However bulk allocation of memory is now managed at a lower level so there is much less waste.
* bitmap: similar to "fine", a bitmap spanning a (sub-)range of memory
* full: same as the old "full", indicating for a (sub-)range of memory that all cards are to be looked at during scan. Similar to inline pointers, this uses no extra memory.
* howl: the Howl container subdivides a given memory range into subranges, each of which may hold any of the other containers to describe that sub-range of the heap. This is somewhat similar to the idea suggested in
* care has been taken to minimize container memory usage, e.g. by not adding redundant information and in general by specifying the containers carefully. They have been designed with future enhancements in mind.
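The container tiers listed above can be pictured with a small model. This is an illustrative sketch, not the HotSpot implementation: the class names `Container` and `Howl`, the capacities, and the 90% cutoff for switching to "full" are all assumptions, and the lock-free replacement of a container inside the CHT node is not modeled at all.

```python
# Illustrative model only: names, limits and thresholds are assumptions,
# and the lock-free replacement inside the CHT node is not modeled.

class Container:
    """Card container for one range of cards that coarsens as it fills up."""

    def __init__(self, num_cards, inline_max=4, array_max=16):
        self.num_cards = num_cards
        self.inline_max = inline_max                # text: 3-5 cards fit inline
        self.array_max = array_max                  # assumed array capacity
        self.full_threshold = num_cards * 9 // 10   # assumed cutoff for "full"
        self.kind = "inline"
        self.cards = []   # storage for "inline" and "array"
        self.bits = 0     # storage for "bitmap" (a Python int used as bit set)

    def add(self, card):
        if not 0 <= card < self.num_cards:
            raise ValueError("card outside of the covered range")
        if self.kind == "full":
            return                                  # every card is already implied
        if self.kind == "bitmap":
            self.bits |= 1 << card
            if bin(self.bits).count("1") >= self.full_threshold:
                self.kind, self.bits = "full", 0    # coarsen bitmap -> full
            return
        if card in self.cards:
            return
        limit = self.inline_max if self.kind == "inline" else self.array_max
        if len(self.cards) < limit:
            self.cards.append(card)
            return
        # The current container is at capacity: coarsen, then retry the add.
        if self.kind == "inline":
            self.kind = "array"
        else:
            self.kind = "bitmap"
            for c in self.cards:
                self.bits |= 1 << c
            self.cards = []
        self.add(card)

    def contains(self, card):
        if self.kind == "full":
            return True
        if self.kind == "bitmap":
            return bool((self.bits >> card) & 1)
        return card in self.cards


class Howl:
    """Meta-container: splits a region's cards into equally sized buckets."""

    def __init__(self, cards_per_region, num_buckets):
        self.bucket_size = cards_per_region // num_buckets
        self.buckets = [None] * num_buckets         # None means "no card at all"

    def add(self, card):
        idx, offset = divmod(card, self.bucket_size)
        if self.buckets[idx] is None:
            self.buckets[idx] = Container(self.bucket_size)
        self.buckets[idx].add(offset)

    def contains(self, card):
        idx, offset = divmod(card, self.bucket_size)
        bucket = self.buckets[idx]
        return bucket is not None and bucket.contains(offset)


if __name__ == "__main__":
    howl = Howl(cards_per_region=65536, num_buckets=64)
    for card in range(5):
        howl.add(card)                              # bucket 0 outgrows "inline"
    howl.add(3 * howl.bucket_size + 7)              # bucket 3 stays "inline"
    print([b.kind if b else None for b in howl.buckets[:4]])
```

The point of the sketch is the granularity: each sub-range picks the smallest container that still fits its cards, so a few dense sub-ranges do not force a bitmap over the whole region.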
In some benchmarks (where there is significant remembered set memory usage) we are seeing remembered set memory usage reduced to 25% of JDK 16 levels with this change. Garbage collection times are no longer than before, and often shorter; most changes affecting them have been extracted earlier. Individual affected phases are generally shorter.
- blocks
-
JDK-8267830 Investigate G1CardSet ConcurrentHashTable sizing heuristics
- Open
-
JDK-8267831 Improve G1CardSetAllocator sizing heuristics
- Open
-
JDK-8267833 Improve G1CardSetInlinePtr::add()
- Resolved
-
JDK-8267834 Refactor G1CardSetAllocator and BufferNode::Allocator to use a common base class
- Resolved
- csr for
-
JDK-8266721 G1: Refactor remembered sets
- Closed
- duplicates
-
JDK-8034873 Concurrent collection set freeing
- Closed
-
JDK-8224840 Optimize G1CardTable::mark_region_table()
- Closed
-
JDK-8227665 Clearing collection set candidates takes a significant amount of time
- Closed
-
JDK-8233012 Improve G1 ergonomics for G1RSetRegionEntries(Base)
- Closed
- relates to
-
JDK-8266637 CHT: Add insert_and_get method
- Resolved
-
JDK-8077144 Concurrent mark initialization takes too long
- Closed
-
JDK-8269120 Build failure with GCC 6.3.0 after JDK-8017163
- Closed
-
JDK-8280088 NMT: Make mtGCCardSet the subcategory of mtGC
- Open
-
JDK-8048504 G1: Investigate replacing the coarse and fine grained data structures in the remembered sets
- Resolved
-
JDK-8151386 Extract card live data out of G1ConcurrentMark
- Resolved
-
JDK-8145672 Remove dependency of G1FromCardCache to HeapRegionRemSet
- Resolved
-
JDK-8145673 G1RemSetSummary.hpp uses FREE_C_HEAP_ARRAY
- Resolved
-
JDK-8145674 Fix includes and forward declarations in g1Remset files
- Resolved
-
JDK-8145774 Move scrubbing setup code away out of ConcurrentMark
- Resolved
-
JDK-8213108 Improve work distribution during remembered set scan
- Resolved
-
JDK-8213996 Remove one of the SparsePRT entry tables
- Resolved
-
JDK-8213997 Remove G1HRRSUseSparseTable flag
- Resolved
-
JDK-8233919 Incrementally calculate the occupied cards in a heap region remembered set
- Resolved
-
JDK-8266821 G1: Prefetch cards during merge heap roots phase
- Resolved
-
JDK-8273144 Remove unused top level "Sample Collection Set Candidates" logging
- Resolved
-
JDK-8274430 Remove some debug error printing code added in JDK-8017163
- Resolved
-
JDK-8287024 G1: Improve the API boundary between HeapRegionRemSet and G1CardSet
- Resolved
-
JDK-8345397 Remove <cstdio> from g1HeapRegionRemSet.cpp
- Resolved
-
JDK-8151846 Record the number of live cards per region while creating live data
- Closed
-
JDK-8134048 Clear remembered set while shrinking the heap
- Closed
-
JDK-8273941 G1 GC tuning guide updates for JDK18
- Resolved
-
JDK-8242032 G1 region remembered sets may contain non-coarse level PRTs for already coarsened regions
- Resolved
-
JDK-8276540 Howl Full CardSet container iteration marks too many cards
- Closed
-
JDK-8048075 Adding JFR events to track G1 Remembered set size
- Open
-
JDK-8058803 Allow one remembered set to be used for multiple regions
- Open
-
JDK-6949259 G1: Merge sparse and fine remembered set hash tables
- Resolved
-
JDK-8145667 Move FromCardCache into separate files
- Resolved
-
JDK-8145671 Rename FromCardCache to G1FromCardCache
- Resolved
-
JDK-8153503 Move remset scan iteration claim to remset local data structure
- Resolved
-
JDK-8153507 Improve Card Table Clear Task
- Resolved
-
JDK-8162928 Micro-optimizations in scanning the remembered sets
- Resolved
-
JDK-8180415 Rebuild remembered sets during the concurrent cycle
- Resolved
-
JDK-8262185 G1: Prune collection set candidates early
- Resolved
-
JDK-8269134 Remove sparsePRT.inline.hpp after JDK-8017163
- Resolved
-
JDK-8275056 Virtualize G1CardSet containers over heap region
- Resolved
-
JDK-8273186 Remove leftover comment about sparse remembered set in G1 HeapRegionRemSet
- Resolved
-
JDK-8016505 G1: Revert back to use HeapBaseMinAddress=256m on Solaris x86
- Closed
-
JDK-8229049 JEP 363: Remove the Concurrent Mark Sweep (CMS) Garbage Collector
- Closed
-
JDK-7187490 G1: Limit the amount of remembered set scrubbing
- Closed
-
JDK-8043574 Investigate decreasing the RS scrubbing work in the GC cleanup pause
- Closed
-
JDK-8153505 Split up G1RemSet::oops_into_collection_set_do into parts
- Closed