
Type: Enhancement
Resolution: Won't Fix
Priority: P4
Fix Version/s: tbd
Affects Version/s: 14
Component/s: hotspot
Labels:
Subcomponent: gc

In 8u20(?) we had to introduce a StoreLoad memory barrier instruction into the G1 post write barrier to make sure that the refinement thread sees the reference store from the mutator (JDK-8014555).

This made the G1 post write barrier quite large, mostly because of the extra filtering added to mitigate the performance impact of the StoreLoad.

Given

x.a = q

and

p = @x.a (the address of the field x.a)

the barrier currently looks as follows:

if (p and q in same region or q NULL) -> exit
if (card(p) == Young) -> exit (*)
StoreLoad; (*)
if (card(p) == Old) -> exit
card(p) = Dirty;
enqueue(card(p))

(The lines marked with (*) were added in JDK-8014555.)
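To make the filter order concrete, here is a minimal C++ sketch of the current barrier over a toy card table. Everything in it is an assumption for illustration only: the names `CardTable` and `post_barrier_current`, the card and region sizes, the card values, and the reading of the pseudocode's `Old` test as "the card is already dirty". The StoreLoad is modeled with `std::atomic_thread_fence(std::memory_order_seq_cst)`, and the dirty card queue is a plain vector.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical card values and sizes, for illustration only.
constexpr std::uint8_t kClean = 0, kDirty = 1, kYoung = 2;
constexpr unsigned kCardShift = 9;     // assumed 512-byte cards
constexpr unsigned kRegionShift = 20;  // assumed 1 MiB regions

struct CardTable {
  std::vector<std::atomic<std::uint8_t>> cards;
  std::vector<std::size_t> queue;  // stand-in for the thread's dirty card queue

  explicit CardTable(std::size_t n) : cards(n) {
    for (auto& c : cards) c.store(kClean, std::memory_order_relaxed);
  }
  std::atomic<std::uint8_t>& card_for(std::uintptr_t addr) {
    return cards[addr >> kCardShift];
  }
};

// Post barrier for "x.a = q" with p = @x.a. The reference store itself is
// assumed to have been performed just before this call.
void post_barrier_current(CardTable& ct, std::uintptr_t p, std::uintptr_t q) {
  // Only non-null, cross-region stores need remembered-set tracking.
  if (q == 0 || (p >> kRegionShift) == (q >> kRegionShift)) return;
  std::atomic<std::uint8_t>& card = ct.card_for(p);
  // (*) Mitigation: keep stores into young regions off the fence path.
  if (card.load(std::memory_order_relaxed) == kYoung) return;
  // (*) StoreLoad: without it the card load below could be satisfied before
  // the reference store is visible. We might then see a still-dirty card and
  // skip, while a refinement thread cleans and scans it without our store.
  std::atomic_thread_fence(std::memory_order_seq_cst);
  // Assumed meaning of the "Old" test: the card is already dirty and queued.
  if (card.load(std::memory_order_relaxed) == kDirty) return;
  card.store(kDirty, std::memory_order_relaxed);
  ct.queue.push_back(p >> kCardShift);
}
```

Calling it twice for the same cross-region store enqueues the card only once, because the second call stops at the dirty check.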

However, the StoreLoad in the write barrier (and the mitigation around it) is not necessary: in May 2015, in a discussion about another memory barrier missing from the CMS write barrier, [~eosterlund] proposed a technique [1] that relies on regular global synchronization instead.

The same technique could be applied to G1, and there is even a prototype for this [2]. The prototype uses a system-wide memory synchronization call (in this case, the equivalent of sys_membarrier() on Linux).
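On Linux, this kind of system-wide synchronization is available through the membarrier(2) system call. The sketch below is an illustration, not the prototype's code: it registers for and then issues the private expedited command, with the command values copied from `<linux/membarrier.h>` into local constants, and falls back to a local fence when the call is unavailable. The function name `process_wide_barrier` is hypothetical. The fallback does not synchronize other threads, so a real caller would need another mechanism in that case.

```cpp
#include <atomic>
#include <cassert>
#if defined(__linux__)
#include <sys/syscall.h>
#include <unistd.h>
#endif

// Command values from <linux/membarrier.h>, copied so the sketch does not
// depend on kernel headers being installed.
constexpr int kCmdPrivateExpedited = 1 << 3;
constexpr int kCmdRegisterPrivateExpedited = 1 << 4;

// Returns true if membarrier(2) executed a memory barrier on every running
// thread of this process; returns false after only a local fence.
bool process_wide_barrier() {
#if defined(__linux__) && defined(SYS_membarrier)
  // The process must register once before using the expedited command.
  static const bool registered =
      syscall(SYS_membarrier, kCmdRegisterPrivateExpedited, 0, 0) == 0;
  if (registered && syscall(SYS_membarrier, kCmdPrivateExpedited, 0, 0) == 0) {
    return true;
  }
#endif
  std::atomic_thread_fence(std::memory_order_seq_cst);  // local only
  return false;
}
```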

A less invasive way to implement this is to attach an epoch counter to each refinement buffer, and to have each thread record an epoch indicating the "time" at which it last performed memory synchronization. If a buffer's epoch is less than the minimum of these thread epochs, every thread has synchronized since the buffer's epoch, so that buffer can be safely refined.
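A minimal sketch of the epoch check, assuming a single global epoch that is advanced whenever a buffer is stamped, and per-thread epochs that record the global epoch a thread saw when it last synchronized. `RefinementBuffer`, `stamp_buffer`, `thread_synchronized` and `can_refine` are hypothetical names, not HotSpot code. The exact point at which a buffer is stamped is also left as an assumption here.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical global epoch; starts at 1 so that 0 means "never synchronized".
std::atomic<std::uint64_t> g_epoch{1};

struct RefinementBuffer {
  std::uint64_t epoch = 0;         // epoch stamped on the buffer
  std::vector<std::size_t> cards;  // card indices waiting to be refined
};

// Stamp the buffer with the current epoch and advance the global epoch, so
// any thread that synchronizes afterwards records a strictly larger value.
void stamp_buffer(RefinementBuffer& buf) { buf.epoch = g_epoch.fetch_add(1); }

// Called by a thread right after it executed a full memory barrier.
void thread_synchronized(std::uint64_t& thread_epoch) {
  thread_epoch = g_epoch.load();
}

// Safe to refine once every thread synchronized after the buffer was stamped.
bool can_refine(const RefinementBuffer& buf,
                const std::vector<std::uint64_t>& thread_epochs) {
  std::uint64_t min_epoch = std::numeric_limits<std::uint64_t>::max();
  for (std::uint64_t e : thread_epochs) min_epoch = std::min(min_epoch, e);
  return buf.epoch < min_epoch;
}
```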

These epoch updates can be piggy-backed on existing VM transitions and handshakes, which already perform a memory barrier. (Previously a global UseMemBarrier flag controlled this; it was defaulted to true and has since been removed.)

If a thread does not appear to make progress, an epoch update can be forced on it with a handshake.

However, according to [~eosterlund], relying on existing thread transitions alone already goes a long way, so threads rarely need to be forced. Existing forced background activity, such as biased locking revocation or other global handshakes, helps as well.
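The thread side might look roughly like this. In this sketch (all names hypothetical), an existing VM transition publishes the global epoch after the fence it already executes, and a refinement thread that finds a thread lagging behind forces the same update. A real handshake has the operation executed for the target thread at a safe point; here `force_handshake` simply calls the same hook directly, which models only the bookkeeping. It also forces immediately instead of first waiting for the thread to progress on its own.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical global epoch; 0 in a thread means "not yet synchronized".
std::atomic<std::uint64_t> g_epoch{1};

struct MutatorThread {
  std::atomic<std::uint64_t> local_epoch{0};
};

// Hook for an existing VM transition or handshake that already has a fence.
void on_vm_transition(MutatorThread& t) {
  std::atomic_thread_fence(std::memory_order_seq_cst);  // the existing barrier
  t.local_epoch.store(g_epoch.load(std::memory_order_acquire),
                      std::memory_order_release);
}

// Stand-in for a handshake operation executed for the target thread.
void force_handshake(MutatorThread& t) { on_vm_transition(t); }

// Ensure every thread has synchronized after `buffer_epoch`, forcing only
// the threads whose own transitions have not done so yet.
// Returns the number of forced handshakes.
int synchronize_after(std::uint64_t buffer_epoch,
                      const std::vector<MutatorThread*>& threads) {
  int forced = 0;
  for (MutatorThread* t : threads) {
    if (t->local_epoch.load(std::memory_order_acquire) <= buffer_epoch) {
      force_handshake(*t);
      ++forced;
    }
  }
  return forced;
}
```

The return value makes the point above observable: the more threads pass through transitions on their own, the fewer handshakes have to be forced.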

Quiescent threads (blocked or in native) can be ignored for this calculation: they are not going to refine anything anyway, and they will automatically update to the latest epoch on their next VM transition.

(sys_membarrier() may still be used in "urgent" cases.)
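Ignoring quiescent threads only changes which threads take part in the minimum. A sketch, assuming a simplified thread-state enum that is hypothetical and much coarser than HotSpot's own thread states:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

enum class ThreadState { InJava, InVM, InNative, Blocked };

struct ThreadInfo {
  ThreadState state;
  std::uint64_t local_epoch;  // epoch of the thread's last synchronization
};

// Blocked and in-native threads are skipped: per the description above, they
// pick up the latest epoch on their next VM transition anyway.
bool is_quiescent(ThreadState s) {
  return s == ThreadState::InNative || s == ThreadState::Blocked;
}

// Minimum epoch over non-quiescent threads only (max value if there are none).
std::uint64_t min_active_epoch(const std::vector<ThreadInfo>& threads) {
  std::uint64_t min_epoch = std::numeric_limits<std::uint64_t>::max();
  for (const ThreadInfo& t : threads) {
    if (!is_quiescent(t.state)) min_epoch = std::min(min_epoch, t.local_epoch);
  }
  return min_epoch;
}

bool can_refine(std::uint64_t buffer_epoch, const std::vector<ThreadInfo>& threads) {
  return buffer_epoch < min_active_epoch(threads);
}
```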

With this technique, the barrier can be reduced to:

if (p and q in same region or q NULL) -> exit
if (card(p) == Old) -> exit
card(p) = Dirty;
enqueue(card(p))

Such a barrier is known to improve performance considerably.
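For comparison with the current barrier, a sketch of the reduced barrier over the same kind of toy card table, again with hypothetical names, sizes and card values, and again reading `Old` as "the card is already dirty". The young-card test and the fence are gone; ordering against refinement is assumed to come from the epoch protocol described above rather than from the barrier itself.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical card values and sizes, for illustration only.
constexpr std::uint8_t kClean = 0, kDirty = 1;
constexpr unsigned kCardShift = 9;     // assumed 512-byte cards
constexpr unsigned kRegionShift = 20;  // assumed 1 MiB regions

struct CardTable {
  std::vector<std::atomic<std::uint8_t>> cards;
  std::vector<std::size_t> queue;  // stand-in for the dirty card queue

  explicit CardTable(std::size_t n) : cards(n) {
    for (auto& c : cards) c.store(kClean, std::memory_order_relaxed);
  }
  std::atomic<std::uint8_t>& card_for(std::uintptr_t addr) {
    return cards[addr >> kCardShift];
  }
};

// Reduced post barrier for "x.a = q" with p = @x.a: no young filter and no
// StoreLoad fence on the mutator's path.
void post_barrier_reduced(CardTable& ct, std::uintptr_t p, std::uintptr_t q) {
  if (q == 0 || (p >> kRegionShift) == (q >> kRegionShift)) return;
  std::atomic<std::uint8_t>& card = ct.card_for(p);
  // Assumed meaning of the "Old" test: the card is already dirty and queued.
  if (card.load(std::memory_order_relaxed) == kDirty) return;
  card.store(kDirty, std::memory_order_relaxed);
  ct.queue.push_back(p >> kCardShift);
}
```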

[1] http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2015-May/013264.html
[2] http://cr.openjdk.java.net/~eosterlund/g1_experiments/lazy_sync/webrev.06/ (Note that this webrev also contains changes for JDK-8087198.)

blocks

JDK-8233438 Use zero_filled optimization for young gen regions when committing young regions (Open)

relates to

JDK-8236485 Epoch synchronization protocol for G1 concurrent refinement (Closed)

JDK-8220049 Obsolete ThreadLocalHandshakes (Resolved)

JDK-8226197 Reduce G1’s CPU cost with simplified write post-barrier and disabling concurrent refinement (Closed)

JDK-8226738 Fold out-of-region and NULL checks in G1 post barrier (Closed)

JDK-8253230 G1 20% slower than Parallel in JRuby rubykon benchmark (Open)

JDK-8227174 Lazily set card table of allocated regions to correct values in G1 (Open)

JDK-8087198 G1 card refinement: batching, sorting (Resolved)

JDK-8229049 JEP 363: Remove the Concurrent Mark Sweep (CMS) Garbage Collector (Closed)

JDK-8340827 G1: Improve Application Throughput with a More Efficient Write-Barrier (Submitted)


Sub-task: Epoch synchronization protocol for G1 concurrent refinement (Closed, Man Cao)

Assignee: Man Cao

Reporter: Thomas Schatzl

Votes: 0

Watchers: 6

Created: 2019-06-25 02:30

Updated: 2025-04-23 15:51

Resolved: 2025-04-23 15:51
