Type: Enhancement
Resolution: Won't Fix
Priority: P4
Fix Version/s: tbd
Affects Version/s: 9, 10, 11, 12, 13
Component/s: hotspot
Labels:

Subcomponent: gc

While migrating our production services from CMS to G1, we found that G1’s complicated write post-barrier incurs considerable CPU cost. Currently the post-barrier is the following for a write “p.f = q”:

if ((p xor q) >> LOG_REGION_BITS != 0) { // if the write crosses region boundary
  if (q != null) {
    card_address = &card_table[addr_to_index(p)];
    if (*card_address != YOUNG) {
      store_load_fence;
      if (*card_address != DIRTY) {
        *card_address = DIRTY;
        T.dirtyCardQueue.enqueue(card_address);
      }
    }
  }
}
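
For readers who want to step through the filters, here is a minimal Java sketch that models the barrier above on a toy heap. It is not HotSpot code: the class name, the 1 MiB region size, the 512-byte card size, the card value encodings, and the use of plain `long` addresses (with 0 standing in for null) are all assumptions made for illustration, and a single `ArrayDeque` stands in for the per-thread dirty card queue.

```java
import java.lang.invoke.VarHandle;
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of the G1 post-write barrier described above; not HotSpot code. */
class G1PostBarrierSketch {
    // Assumed constants for illustration only.
    static final int LOG_REGION_BITS = 20; // 1 MiB regions
    static final int LOG_CARD_BITS = 9;    // 512-byte cards
    static final byte CLEAN = 0, DIRTY = 1, YOUNG = 2; // hypothetical encodings

    final byte[] cardTable;
    // Stands in for the per-Java-thread dirty card queue.
    final Deque<Integer> dirtyCardQueue = new ArrayDeque<>();

    G1PostBarrierSketch(long heapBytes) {
        cardTable = new byte[(int) (heapBytes >>> LOG_CARD_BITS)];
    }

    static int addrToIndex(long addr) {
        return (int) (addr >>> LOG_CARD_BITS);
    }

    static boolean crossesRegion(long p, long q) {
        return ((p ^ q) >>> LOG_REGION_BITS) != 0;
    }

    /** Models the barrier for "p.f = q"; q == 0 plays the role of null. */
    void postBarrier(long p, long q) {
        if (!crossesRegion(p, q)) return;      // same-region store: nothing to record
        if (q == 0) return;                    // null store: nothing to record
        int card = addrToIndex(p);
        if (cardTable[card] == YOUNG) return;  // young cards are never refined
        VarHandle.fullFence();                 // models store_load_fence
        if (cardTable[card] != DIRTY) {
            cardTable[card] = DIRTY;
            dirtyCardQueue.add(card);          // hand the card to refinement
        }
    }

    public static void main(String[] args) {
        G1PostBarrierSketch heap = new G1PostBarrierSketch(4L << 20); // 4 MiB toy heap
        long p = (1L << 20) + 1024;     // an address in region 1
        heap.postBarrier(p, p + 64);    // same region: filtered
        heap.postBarrier(p, 0);         // null: filtered
        heap.postBarrier(p, 3L << 20);  // region 1 -> region 3: dirtied and enqueued
        heap.postBarrier(p, 2L << 20);  // card already dirty: not enqueued again
        System.out.println(heap.dirtyCardQueue);
    }
}
```

In this model only the third store reaches the queue, which shows why each reference store pays for up to four checks and a fence before any card is recorded.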

And for CMS the write barrier is only:
card_address = &card_table[addr_to_index(p)];
*card_address = DIRTY;

The complexity of G1’s write post-barrier is due to the need to support concurrent refinement threads. However, even if the user has set -XX:G1ConcRefinementThreads=0, the write post-barrier remains the same. Ideally, the write post-barrier could be much simpler when there is no concurrent refinement.

This RFE proposes to add a mode to G1 that uses a simplified write post-barrier:
if ((p xor q) >> LOG_REGION_BITS != 0) {
  if (q != null) {
    card_address = &card_table[addr_to_index(p)];
    *card_address = DIRTY;
  }
}

In this mode, G1 would disable concurrent refinement and the per-Java-thread dirty card queues. G1 would need to process all dirty cards during a collection pause. Thus pause times could become longer, but as long as MaxGCPauseMillis is reasonably large relative to the heap size, G1’s adaptive heuristics should still be able to adjust the young-gen size to meet the pause time goal.
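
To make the trade-off concrete, the following Java sketch models the simplified barrier together with one possible pause-time step. It is a toy model, not the attached prototype: the class and method names, the region and card sizes, and in particular the linear scan over the whole card table are assumptions for illustration, since the description above does not say how the pause would locate dirty cards.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the proposed simplified barrier; not the actual prototype. */
class G1SimplifiedBarrierSketch {
    // Assumed constants for illustration only.
    static final int LOG_REGION_BITS = 20; // 1 MiB regions
    static final int LOG_CARD_BITS = 9;    // 512-byte cards
    static final byte CLEAN = 0, DIRTY = 1; // hypothetical encodings

    final byte[] cardTable;

    G1SimplifiedBarrierSketch(long heapBytes) {
        cardTable = new byte[(int) (heapBytes >>> LOG_CARD_BITS)];
    }

    /** Models the barrier for "p.f = q"; q == 0 plays the role of null. */
    void postBarrier(long p, long q) {
        if (((p ^ q) >>> LOG_REGION_BITS) != 0 && q != 0) {
            // No young check, no fence, no queue: just mark the card.
            cardTable[(int) (p >>> LOG_CARD_BITS)] = DIRTY;
        }
    }

    /**
     * Assumed pause-time step: with no queue, dirty cards are found by
     * scanning the table, then reset to CLEAN once collected.
     */
    List<Integer> collectDirtyCardsAtPause() {
        List<Integer> dirty = new ArrayList<>();
        for (int i = 0; i < cardTable.length; i++) {
            if (cardTable[i] == DIRTY) {
                dirty.add(i);
                cardTable[i] = CLEAN;
            }
        }
        return dirty;
    }

    public static void main(String[] args) {
        G1SimplifiedBarrierSketch heap = new G1SimplifiedBarrierSketch(4L << 20);
        long p1 = (1L << 20) + 1024;  // an address in region 1
        long p2 = (2L << 20) + 4096;  // an address in region 2
        heap.postBarrier(p1, 3L << 20); // cross-region: card marked
        heap.postBarrier(p1, 2L << 20); // same card marked again, no extra work
        heap.postBarrier(p2, 1L << 20); // cross-region: card marked
        heap.postBarrier(p2, p2 + 8);   // same region: filtered
        System.out.println(heap.collectDirtyCardsAtPause());
    }
}
```

The mutator-side work shrinks to one filter and one store, while the cost of finding and processing dirty cards moves entirely into the pause, which is the longer-pause risk noted above.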

This new mode would reduce G1’s CPU usage considerably. It will be particularly helpful for certain types of workloads, e.g.:
- Workloads heavily tuned for CMS to minimize old-gen collections, and sensitive to CPU usage;
- Workloads that mainly care about throughput and CPU usage.

I have implemented a prototype for this mode, and attached some preliminary results.


Attachments:
20190612-jdkHeadG1FastWB-bigramtester-pause200-nonorm.html (17 kB, 2019-06-14 18:08)
20190612-jdkHeadG1FastWB-bigramtester-pause3000-nonorm.html (17 kB, 2019-06-14 18:08)
20190612-jdkHeadG1FastWB-dacapoLarge-stress.html (27 kB, 2019-06-14 18:08)
20190612-jdkHeadG1FastWB-dacapoLarge4G.html (25 kB, 2019-06-14 18:08)
single-check-no-young-cards.diff (11 kB, 2019-06-26 14:41)

relates to
JDK-8253230 G1 20% slower than Parallel in JRuby rubykon benchmark (Open)
JDK-8229049 JEP 363: Remove the Concurrent Mark Sweep (CMS) Garbage Collector (Closed)
JDK-8226731 Remove StoreLoad in G1 post barrier (Closed)
JDK-8230187 Throughput post-write barrier for G1 (Closed)
JDK-8340827 G1: Improve Application Throughput with a More Efficient Write-Barrier (Submitted)

Assignee: Man Cao
Reporter: Man Cao
Votes: 0
Watchers: 6
Created: 2019-06-14 18:05
Updated: 2025-04-23 15:52
Resolved: 2025-04-23 15:52
