- Enhancement
- Resolution: Unresolved
- P4
- 14
The StoreLoad added in JDK-8014555, together with the mitigations for its performance impact, made the G1 post-write barrier pretty big. That is, given
x.a = q
and
p = @x.a (the address of the field x.a)
the barrier currently looks as follows:
if (p and q in same region or q == NULL) -> exit
if (card(p) == Young) -> exit (*)
StoreLoad; (*)
if (card(p) == Dirty) -> exit
card(p) = Dirty;
enqueue(card(p))
(The lines with the (*) have been added in JDK-8014555)
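The control flow of this barrier can be sketched as a small Python model. This is only a toy illustration, not HotSpot code: the class `BarrierModel`, the 1 MiB region size, the 512-byte card size and the card value names are all assumptions made for the example. The StoreLoad is represented by a counter, and the last filter is modeled as a check for an already-dirty card, which is an assumption about what that filter is meant to do.

```python
# Toy model of the post-write barrier; all names and constants are illustrative.
REGION_SHIFT = 20   # assumed 1 MiB regions
CARD_SHIFT = 9      # assumed 512-byte cards

CLEAN, YOUNG, DIRTY = "clean", "young", "dirty"


class BarrierModel:
    def __init__(self):
        self.cards = {}    # card index -> card value (missing means CLEAN)
        self.queue = []    # card indices handed over to refinement
        self.fences = 0    # number of StoreLoad fences "executed"

    def card_index(self, addr):
        return addr >> CARD_SHIFT

    def post_write_barrier(self, p, q):
        """p: address of the written field, q: stored reference (0 = NULL)."""
        if q == 0 or (p ^ q) >> REGION_SHIFT == 0:
            return "filtered: same region or NULL"
        idx = self.card_index(p)
        if self.cards.get(idx, CLEAN) == YOUNG:
            return "filtered: young card"      # (*)
        self.fences += 1                       # stands in for StoreLoad (*)
        if self.cards.get(idx, CLEAN) == DIRTY:
            return "filtered: already dirty"
        self.cards[idx] = DIRTY
        self.queue.append(idx)
        return "enqueued"
```

In this model every store that survives the first two filters pays for a fence, even when the card later turns out to be dirty already. That is the cost the issue wants to remove.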
However, the StoreLoad in the write barrier (and the mitigation) is not necessary: in May 2015, in a discussion about another memory barrier missing from the CMS write barrier, [~eosterlund] proposed a technique [1] that uses regular global synchronization instead.
The same technique could be applied to G1, and there is even a prototype for this [2]. That code uses a system-wide memory synchronization call (in this case the Linux equivalent, sys_membarrier()).
A less invasive way to implement this is to add an epoch counter to the refinement buffers. Each thread also carries an epoch counter indicating the "time" at which it last performed memory synchronization; if a buffer's epoch counter is less than the minimum of these thread epoch counters, that buffer can be safely refined.
The updates to the thread epoch counters can be piggy-backed on existing VM transitions/handshakes, which already perform a memory barrier (earlier there was a global UseMemBarrier flag that controlled this, but it has since been removed, with the behavior defaulting to true).
If a thread does not seem to progress, an epoch update can be forced with a handshake.
However, according to [~eosterlund], using existing thread transitions already gets you very far without needing to force any threads (also because of existing forced background activity such as biased locking revocation or other global handshakes).
Quiescent threads (blocked, in native) can be ignored for this calculation: they are not going to refine anything anyway, and at their next VM transition they will automatically update to the latest epoch.
(sys_membarrier may still be used in "urgent" cases.)
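The epoch check described above can be sketched as follows. This is a minimal Python model under stated assumptions, not the proposed HotSpot implementation: the names `MutatorThread`, `EpochSync`, `stamp_buffer`, `on_transition`, `can_refine` and `laggards` are hypothetical. It is also an assumption of this sketch that the global epoch advances each time a buffer is stamped.

```python
from dataclasses import dataclass, field


@dataclass
class MutatorThread:
    name: str
    epoch: int = 0           # global epoch seen at this thread's last memory sync
    quiescent: bool = False  # blocked or in native: ignored for the minimum


@dataclass
class EpochSync:
    global_epoch: int = 0
    threads: list = field(default_factory=list)

    def stamp_buffer(self):
        """Hand over a refinement buffer: tag it, then advance the global epoch."""
        stamp = self.global_epoch
        self.global_epoch += 1
        return stamp

    def on_transition(self, thread):
        """Piggy-back on a transition/handshake that already issues a fence."""
        thread.epoch = self.global_epoch

    def min_thread_epoch(self):
        active = [t.epoch for t in self.threads if not t.quiescent]
        # With no active threads, nothing can hold the buffer back.
        return min(active) if active else self.global_epoch

    def can_refine(self, buffer_epoch):
        """A buffer is safe once every active thread has synchronized after it."""
        return buffer_epoch < self.min_thread_epoch()

    def laggards(self, buffer_epoch):
        """Active threads that would need a forced handshake before refinement."""
        return [t.name for t in self.threads
                if not t.quiescent and t.epoch <= buffer_epoch]
```

A refinement thread would call `can_refine` before processing a buffer and, if the buffer stays blocked for too long, force a handshake on the threads returned by `laggards`.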
With this technique you can reduce the barrier to
if (p and q in same region or q == NULL) -> exit
if (card(p) == Dirty) -> exit
card(p) = Dirty;
enqueue(card(p))
This reduced barrier is known to improve performance quite a bit.
[1] http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2015-May/013264.html
[2] http://cr.openjdk.java.net/~eosterlund/g1_experiments/lazy_sync/webrev.06/ (Note that this webrev also contains changes for
blocks
- JDK-8233438 Use zero_filled optimization for young gen regions when committing young regions (Open)
relates to
- JDK-8236485 Epoch synchronization protocol for G1 concurrent refinement (Open)
- JDK-8226197 Reduce G1’s CPU cost with simplified write post-barrier and disabling concurrent refinement (Open)
- JDK-8220049 Obsolete ThreadLocalHandshakes (Resolved)
- JDK-8226738 Fold out-of-region and NULL checks in G1 post barrier (Closed)
- JDK-8253230 G1 20% slower than Parallel in JRuby rubykon benchmark (Open)
- JDK-8227174 Lazily set card table of allocated regions to correct values in G1 (Open)
- JDK-8087198 G1 card refinement: batching, sorting (Resolved)
- JDK-8229049 JEP 363: Remove the Concurrent Mark Sweep (CMS) Garbage Collector (Closed)
- JDK-8340827 G1: Improve Application Throughput with a More Efficient Write-Barrier (Submitted)