JDK-8226731

Remove StoreLoad in G1 post barrier


Details

    • gc

    Description

      In 8u20(?) we had to introduce a StoreLoad memory barrier instruction into the G1 post write barrier to make sure that the refinement thread sees the reference store from the mutator (JDK-8014555).

      Mitigating the performance impact of that instruction made the G1 post write barrier fairly large, i.e.

      given

      x.a = q

      and

      p = @x.a

      the barrier currently looks as follows:

      if (p and q in same region or q == NULL) -> exit
      if (card(p) == Young) -> exit (*)
      StoreLoad; (*)
      if (card(p) == Dirty) -> exit
      card(p) = Dirty;
      enqueue(card(p))

      (The lines with the (*) have been added in JDK-8014555)
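      The filtering order above can be played through with a toy model. This is only a sketch in Python, not HotSpot code: the class ToyHeap, the card-state names, the region and card sizes, and the fence counter that stands in for the StoreLoad instruction are all assumptions made for illustration.

```python
from enum import Enum

REGION_SHIFT = 20   # assumed 1 MiB regions (illustrative value)
CARD_SHIFT = 9      # assumed 512-byte cards (illustrative value)


class Card(Enum):
    CLEAN = 0
    DIRTY = 1
    YOUNG = 2


class ToyHeap:
    """Hypothetical model of the card table and the refinement queue."""

    def __init__(self):
        self.cards = {}    # card index -> Card; a missing entry means CLEAN
        self.queue = []    # card indices waiting for refinement
        self.fences = 0    # counts how often the StoreLoad step is reached

    def card(self, addr):
        return self.cards.get(addr >> CARD_SHIFT, Card.CLEAN)

    def post_barrier(self, p, q):
        """p: address of the written field, q: stored reference (0 = NULL)."""
        if q == 0 or (p >> REGION_SHIFT) == (q >> REGION_SHIFT):
            return "same-region-or-null"
        if self.card(p) is Card.YOUNG:      # (*) filter added as mitigation
            return "young"
        self.fences += 1                    # (*) stands in for StoreLoad
        if self.card(p) is Card.DIRTY:
            return "already-marked"
        self.cards[p >> CARD_SHIFT] = Card.DIRTY
        self.queue.append(p >> CARD_SHIFT)
        return "enqueued"
```

      In this model every cross-region store into a non-young card pays for the fence, even when the card turns out to be marked already; the young-card filter exists only to keep the common case away from it.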

      However, the StoreLoad in the write barrier (and the mitigation) is not necessary: in May 2015, [~eosterlund] proposed a technique [1] that uses regular global synchronization instead, in a discussion about another memory barrier missing from the CMS write barrier.

      The same technique could be applied to G1, and there is even a prototype for this [2]. That code uses a system-wide memory synchronization call (on Linux, a sys_membarrier() equivalent).

      A less invasive way to implement this is to add an epoch counter to the refinement buffers: if a buffer's epoch counter is less than the minimum of the epoch counters attached to the threads, which indicate the "time" at which each thread last performed memory synchronization, that buffer can be safely refined.
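      The epoch comparison can be written down in a few lines. The following is a minimal sketch, not the prototype's code: RefinementBuffer, can_refine, and the treatment of an empty thread list are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RefinementBuffer:
    """Hypothetical buffer of dirty cards, stamped with an epoch."""
    epoch: int
    cards: list = field(default_factory=list)


def can_refine(buffer, thread_sync_epochs):
    """Return True if the buffer's epoch is older than every thread's
    last memory-synchronization epoch.

    thread_sync_epochs: one integer per thread, the epoch at which that
    thread last performed memory synchronization.
    """
    if not thread_sync_epochs:
        # Assumption: with no threads to wait for, nothing blocks refinement.
        return True
    return buffer.epoch < min(thread_sync_epochs)
```

      For example, a buffer stamped with epoch 5 is refinable when the threads last synchronized at epochs 6, 7 and 9, but not while any thread is still at epoch 5 or earlier.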

      These updates to the epoch counters can be piggy-backed on existing VM transitions/handshakes, which already perform a memory barrier (there used to be a global UseMemBarrier flag that controlled this, but the flag has been removed and the behavior now defaults to true).

      If a thread does not seem to progress, force such an epoch update by a handshake.

      However, according to [~eosterlund], using existing thread transitions already gets you very far without needing to force any threads (also because of existing forced background activity such as biased locking revocation or other global handshakes).

      Quiescent threads (blocked, in native) can be ignored for this calculation: they are not going to refine anything anyway, and at their next VM transition they will automatically update to the latest epoch.

      (sys_membarrier may still be used in "urgent" cases)
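      How quiescent threads drop out of the calculation, and which threads would need a forced handshake, can be sketched as follows. The thread states, MutatorThread, lagging_threads, and buffer_is_safe are hypothetical names for this illustration and do not correspond to HotSpot types.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    IN_JAVA = 1
    IN_VM = 2
    BLOCKED = 3
    IN_NATIVE = 4


# Quiescent threads are skipped; they catch up at their next transition.
QUIESCENT = {State.BLOCKED, State.IN_NATIVE}


@dataclass
class MutatorThread:
    name: str
    state: State
    sync_epoch: int   # epoch of this thread's last memory synchronization


def lagging_threads(threads, buffer_epoch):
    """Active threads whose last synchronization does not cover the buffer.
    These are the candidates for a forced epoch update via handshake."""
    return [t for t in threads
            if t.state not in QUIESCENT and t.sync_epoch <= buffer_epoch]


def buffer_is_safe(threads, buffer_epoch):
    """The buffer can be refined once no active thread is lagging."""
    return not lagging_threads(threads, buffer_epoch)
```

      A refinement thread could call buffer_is_safe first and only fall back to handshaking the threads returned by lagging_threads if the buffer stays blocked for too long; what counts as "too long" is left open here, as it is in the description.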

      With this technique you can reduce the barrier to

      if (p and q in same region or q == NULL) -> exit
      if (card(p) == Dirty) -> exit
      card(p) = Dirty;
      enqueue(card(p))

      This is known to improve performance quite a bit.
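      For comparison, the reduced barrier fits in a single short function. Again this is only a toy model under assumed region and card sizes; reduced_post_barrier and its dictionary-based card table are hypothetical, and the memory ordering that the removed StoreLoad used to provide is assumed to be handled on the refinement side.

```python
REGION_SHIFT = 20    # assumed 1 MiB regions (illustrative value)
CARD_SHIFT = 9       # assumed 512-byte cards (illustrative value)
CLEAN, DIRTY = 0, 1  # model card states; a missing entry means CLEAN


def reduced_post_barrier(cards, queue, p, q):
    """Model of the reduced barrier: no young-card filter, no fence.

    cards: dict mapping card index -> state
    queue: list of enqueued card indices
    p: address of the written field, q: stored reference (0 = NULL)
    Returns True if the card was newly marked and enqueued.
    """
    if q == 0 or (p >> REGION_SHIFT) == (q >> REGION_SHIFT):
        return False
    card = p >> CARD_SHIFT
    if cards.get(card, CLEAN) == DIRTY:
        return False
    cards[card] = DIRTY
    queue.append(card)
    return True
```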

      [1] http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2015-May/013264.html
      [2] http://cr.openjdk.java.net/~eosterlund/g1_experiments/lazy_sync/webrev.06/ (Note that this webrev also contains changes for JDK-8087198)

            People

              manc Man Cao
              tschatzl Thomas Schatzl