ZGC: Backoff in ZLiveMap::reset spin-loop

    • gc
    • b15
    • 25
    • master
    • aarch64

      In JDK-8351167, ZGC switched from allocating memory for the livemap when a page is allocated to when the livemap is about to be marked for the first time. This was done to make page allocation latencies for mutators lower in preparation for JDK-8350441, and instead move the latency hit of (m)allocating the livemap to GC threads exclusively.

      The memory for the livemap is (m)allocated inside an existing critical section, where only one thread resets the livemap, which includes (m)allocating memory for it if not already allocated. The critical section (and progress) is guarded by two mechanisms:
        1. A spin-loop which checks if the sequence number for the livemap is correct
        2. A compare-exchange so that only one thread enters the critical section

      When the thread that successfully performs the compare-exchange has reset the livemap, it updates the sequence number to notify other threads that the livemap has been reset and that they can stop spinning. Before JDK-8351167, it was unlikely that multiple threads would try to reset the livemap simultaneously, since the critical section was quite short. Now, however, the critical section takes considerably longer when it (m)allocates, so there is a greater chance that other threads will also try to reset the livemap, causing contention. When a thread is waiting for another thread to reset the livemap, it starts spinning (busy-waiting) on an Atomic::load_acquire until the resetting thread is done. On most platforms this does not seem to affect performance negatively, but on (Linux) aarch64 the performance is noticeably worse due to load_acquire being slow.
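
      The two guarding mechanisms can be sketched roughly as follows. This is a simplified illustration using std::atomic, not the HotSpot code: the LiveMapSketch type, the kResetting sentinel, and the reset_count field are hypothetical names introduced here, and the sketch assumes that a caller never passes kResetting as the current sequence number.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a livemap; this is not the HotSpot ZLiveMap class.
struct LiveMapSketch {
  // Assumed sentinel meaning "some thread is inside the critical section".
  static constexpr uint32_t kResetting = UINT32_MAX;

  std::atomic<uint32_t> seqnum{0};
  int reset_count = 0;  // written only inside the critical section

  void reset(uint32_t current) {
    uint32_t seen = seqnum.load(std::memory_order_acquire);
    // Mechanism 1: loop until the sequence number is correct.
    while (seen != current) {
      // Mechanism 2: compare-exchange so that only one thread enters.
      if (seen != kResetting &&
          seqnum.compare_exchange_strong(seen, kResetting,
                                         std::memory_order_acq_rel,
                                         std::memory_order_acquire)) {
        assert(seqnum.load(std::memory_order_relaxed) == kResetting);
        ++reset_count;  // the real code would allocate and clear the livemap here
        // Publish the new sequence number so spinning threads can stop.
        seqnum.store(current, std::memory_order_release);
        return;
      }
      // Lost the race or a reset is in progress: busy-wait on the acquire load.
      seen = seqnum.load(std::memory_order_acquire);
    }
  }
};
```

      In this sketch, the longer the winning thread stays between the compare-exchange and the final store, the more acquire loads every other thread issues in the meantime.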

      A simple fix, which doesn't add significant complexity, is to add a wait/yield to the spin-loop. The yield gives the resetting thread some time to finish before the waiting thread checks again whether it can continue, so the waiting thread does not spin excessively on Atomic::load_acquire.
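
      A minimal sketch of such a backoff, under stated assumptions: std::this_thread::yield stands in for whatever wait/yield primitive the actual fix uses, and BackoffLiveMapSketch, kResetting, and reset_count are hypothetical names, not HotSpot identifiers. The returned yield count exists only to make the behavior observable.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical stand-in for a livemap; this is not the HotSpot ZLiveMap class.
struct BackoffLiveMapSketch {
  // Assumed sentinel meaning "some thread is inside the critical section".
  static constexpr uint32_t kResetting = UINT32_MAX;

  std::atomic<uint32_t> seqnum{0};
  int reset_count = 0;  // written only inside the critical section

  // Returns how many times the calling thread yielded while waiting.
  unsigned reset(uint32_t current) {
    unsigned yields = 0;
    uint32_t seen = seqnum.load(std::memory_order_acquire);
    while (seen != current) {
      if (seen != kResetting &&
          seqnum.compare_exchange_strong(seen, kResetting,
                                         std::memory_order_acq_rel,
                                         std::memory_order_acquire)) {
        assert(seqnum.load(std::memory_order_relaxed) == kResetting);
        ++reset_count;  // the real code would allocate and clear the livemap here
        seqnum.store(current, std::memory_order_release);
        break;
      }
      if (seen == kResetting) {
        // Another thread is resetting: give it time to finish instead of
        // immediately issuing the next acquire load.
        std::this_thread::yield();
        ++yields;
      }
      seen = seqnum.load(std::memory_order_acquire);
    }
    return yields;
  }
};
```

      A thread that finds the livemap already reset, or that wins the compare-exchange itself, never yields; only threads that observe a reset in progress back off.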

      A more complex solution is to (m)allocate memory for the livemap outside the critical section and only install it when entering it. Making this efficient requires more structure and utility code, and it gives little benefit, since low latency is not a strict requirement in the marking phase (which resets the livemap).
      Another solution is to replace the spin-loop with a wait/notify mechanism. In the performance results, it does not differ much from adding the yield, which is a much simpler fix for this problem.
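
      For comparison, the first alternative (allocating outside the critical section and only installing inside it) might look roughly like the sketch below. It is an assumption-laden illustration, not a proposed patch: PreallocLiveMapSketch, kResetting, and the byte-array representation of the livemap are all hypothetical. Note that every thread that loses the race has allocated a buffer for nothing, which hints at the extra structure needed to make this approach efficient.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <utility>

// Hypothetical stand-in for a livemap; this is not the HotSpot ZLiveMap class.
struct PreallocLiveMapSketch {
  // Assumed sentinel meaning "some thread is inside the critical section".
  static constexpr uint32_t kResetting = UINT32_MAX;

  std::atomic<uint32_t> seqnum{0};
  std::unique_ptr<unsigned char[]> bits;  // touched only inside the critical section

  // Returns true if this call performed the reset.
  bool reset(uint32_t current, std::size_t nbytes) {
    uint32_t seen = seqnum.load(std::memory_order_acquire);
    if (seen == current) {
      return false;  // already reset, nothing to allocate
    }
    // Allocate (zero-initialized) before competing for the critical section.
    // A thread that loses the race frees this buffer when it returns.
    std::unique_ptr<unsigned char[]> fresh(new unsigned char[nbytes]());
    while (seen != current) {
      if (seen != kResetting &&
          seqnum.compare_exchange_strong(seen, kResetting,
                                         std::memory_order_acq_rel,
                                         std::memory_order_acquire)) {
        if (bits == nullptr) {
          bits = std::move(fresh);  // short critical section: install only
        } else {
          std::memset(bits.get(), 0, nbytes);  // already allocated: just clear
        }
        seqnum.store(current, std::memory_order_release);
        return true;
      }
      seen = seqnum.load(std::memory_order_acquire);
    }
    return false;
  }
};
```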

      The performance regression is shown in SPECjvm2008-LU.small-ZGC after JDK-8351167:
      -2.6% SPECjvm2008-LU.small-ZGC on Linux aarch64
      The regression was isolated to jdk-25+15-1606 (which contains the JDK-8351167 changes)

            jsikstro Joel Sikstrom
            resii Robert Strout