In JDK-8351167, ZGC switched from allocating memory for the livemap when a page is allocated to when the livemap is about to be marked for the first time. This was done to make page allocation latencies for mutators lower in preparation for JDK-8350441, and instead move the latency hit of (m)allocating the livemap to GC threads exclusively.
The memory for the livemap is (m)allocated inside an existing critical section, where only one thread resets the livemap, which includes (m)allocating memory for it if not already allocated. The critical section (and progress) is guarded by two mechanisms:
1. A spin-loop which checks if the sequence number for the livemap is correct
2. A compare-exchange so that only one thread enters the critical section
When the thread that successfully performs the compare-exchange has reset the livemap, it updates the sequence number to notify other threads that the livemap has been reset and that they can stop spinning. Before JDK-8351167, it was unlikely that multiple threads would try to reset the livemap simultaneously, since the critical section was quite short. Now, however, the critical section is considerably longer (in duration) because it (m)allocates, and there is a greater chance that other threads will also try to reset the livemap, causing contention. When a thread is waiting on another thread to reset the livemap, it starts spinning (busy-waiting) on an Atomic::load_acquire until the resetting thread is done. On most platforms this does not seem to affect performance negatively, but on (Linux) aarch64 the performance is noticeably worse due to load_acquire being slow.
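The two guarding mechanisms can be illustrated with a minimal standalone sketch. It uses std::atomic instead of HotSpot's Atomic API, and the class name LiveMapSketch, the kResetting marker value, and the counters are hypothetical names introduced only for illustration; it is not the actual ZLiveMap code.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a per-page livemap; this is not HotSpot's ZLiveMap.
class LiveMapSketch {
public:
  // Reserved sequence number meaning "a reset is in progress" (assumption).
  static constexpr uint32_t kResetting = UINT32_MAX;

  // Make the livemap valid for marking cycle `current`.
  void reset(uint32_t current) {
    assert(current != kResetting);
    // Mechanism 1: spin until the sequence number is the current one.
    for (uint32_t seen = _seqnum.load(std::memory_order_acquire);
         seen != current;
         seen = _seqnum.load(std::memory_order_acquire)) {
      // Mechanism 2: only the thread that wins the compare-exchange enters.
      if (seen != kResetting &&
          _seqnum.compare_exchange_strong(seen, kResetting,
                                          std::memory_order_acquire)) {
        if (_bits.empty()) {
          _bits.resize(kWords);     // lazy allocation inside the critical section
          _allocations++;
        } else {
          _bits.assign(kWords, 0);  // storage already exists, only clear it
        }
        _resets++;
        // Publish the reset; waiting threads stop spinning once they load this.
        _seqnum.store(current, std::memory_order_release);
        return;
      }
      // Otherwise another thread is resetting: busy-wait by loading again.
    }
  }

  int resets() const { return _resets; }
  int allocations() const { return _allocations; }

private:
  static constexpr std::size_t kWords = 64;  // arbitrary size for the sketch
  std::atomic<uint32_t> _seqnum{0};
  std::vector<uint64_t> _bits;               // empty until the first reset
  int _resets = 0;
  int _allocations = 0;
};
```

In this sketch, every thread that loses the compare-exchange goes straight back to the acquire load, which is the busy-wait that becomes costly once the critical section includes an allocation.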
A simple fix, which doesn't add significant complexity, is adding a wait/yield. The yield gives the resetting thread some time to finish before the waiting thread checks again whether it can continue, so the waiting thread does not spin excessively on Atomic::load_acquire.
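As a rough illustration of the idea, the waiting side could look like the following sketch. The function names are hypothetical, and std::this_thread::yield() merely stands in for whichever wait/yield primitive the actual fix uses.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Spin until `seqnum` equals `current`, calling `pause()` between loads.
// Returns how many times the caller had to pause.
// Hypothetical helper for illustration; not a HotSpot function.
template <typename Pause>
int wait_until_published(const std::atomic<uint32_t>& seqnum,
                         uint32_t current, Pause pause) {
  int pauses = 0;
  while (seqnum.load(std::memory_order_acquire) != current) {
    pause();  // give the resetting thread time to finish before re-checking
    pauses++;
  }
  return pauses;
}

// The variant discussed here: give up the CPU between checks.
inline int wait_with_yield(const std::atomic<uint32_t>& seqnum,
                           uint32_t current) {
  return wait_until_published(seqnum, current,
                              [] { std::this_thread::yield(); });
}
```

Passing the pause action as a parameter is only a convenience of the sketch: it makes it easy to compare a yield against other back-off strategies without changing the loop.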
A more complex solution is to (m)allocate memory for the livemap outside the critical section and only install it when entering it. Making this efficient requires additional structure and utility code, which gives little benefit, since low latency is not a strict requirement in the marking phase (which resets the livemap).
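A naive version of this alternative might look like the sketch below. All names are hypothetical, and the _installed hint flag is an assumption added so that threads can skip the allocation once storage exists. Note that a thread losing the compare-exchange throws its prepared storage away, which hints at the extra structure a real implementation would need.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical sketch; names and layout are assumptions, not HotSpot code.
class PreallocLiveMapSketch {
public:
  static constexpr uint32_t kResetting = UINT32_MAX;

  void reset(uint32_t current) {
    assert(current != kResetting);
    std::vector<uint64_t> spare;  // storage prepared outside the critical section
    for (uint32_t seen = _seqnum.load(std::memory_order_acquire);
         seen != current;
         seen = _seqnum.load(std::memory_order_acquire)) {
      if (seen == kResetting) {
        continue;  // another thread is resetting, check again
      }
      if (spare.empty() && !_installed.load(std::memory_order_acquire)) {
        spare.resize(kWords);  // the slow allocation, done before entering
      }
      if (_seqnum.compare_exchange_strong(seen, kResetting,
                                          std::memory_order_acquire)) {
        if (_bits.empty()) {
          assert(!spare.empty());
          _bits = std::move(spare);  // critical section only installs
          _installed.store(true, std::memory_order_release);
        } else {
          _bits.assign(kWords, 0);   // storage already installed, clear it
        }
        _resets++;
        _seqnum.store(current, std::memory_order_release);
        return;
      }
      // Lost the race: `spare` may end up unused and is freed on return.
    }
  }

  bool has_storage() const { return _installed.load(std::memory_order_acquire); }
  int resets() const { return _resets; }

private:
  static constexpr std::size_t kWords = 64;  // arbitrary size for the sketch
  std::atomic<uint32_t> _seqnum{0};
  std::atomic<bool> _installed{false};       // hint: storage is already installed
  std::vector<uint64_t> _bits;
  int _resets = 0;
};
```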
Another solution is to replace the spin-loop with a wait/notify mechanism. In the performance results, it does not differ much from adding the yield, which is a much simpler fix for this problem.
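For comparison, a wait/notify variant could be sketched with a mutex and a condition variable. The class NotifyingSeqnum and its member names are hypothetical, and the standard library primitives stand in for whatever locking utilities the JVM would use.

```cpp
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical sketch of a sequence number that waiters block on instead of
// spinning; not HotSpot code.
class NotifyingSeqnum {
public:
  // Called by the thread that has finished resetting the livemap.
  void publish(uint32_t current) {
    {
      std::lock_guard<std::mutex> lock(_mutex);
      _seqnum = current;
    }
    _published.notify_all();  // wake every thread blocked in await()
  }

  // Block (without spinning) until the sequence number equals `current`.
  void await(uint32_t current) {
    std::unique_lock<std::mutex> lock(_mutex);
    _published.wait(lock, [&] { return _seqnum == current; });
  }

  uint32_t value() {
    std::lock_guard<std::mutex> lock(_mutex);
    return _seqnum;
  }

private:
  std::mutex _mutex;
  std::condition_variable _published;
  uint32_t _seqnum = 0;
};
```

Compared with a yield in the spin-loop, this needs a lock and a condition variable per sequence number (or a shared one), which is part of why the simpler fix is attractive.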
The performance regression is shown in SPECjvm2008-LU.small-ZGC after JDK-8351167:
-2.6% SPECjvm2008-LU.small-ZGC on Linux aarch64
The regression was isolated to jdk-25+15-1606 (which contains the JDK-8351167 changes).
caused by: JDK-8351167 ZGC: Lazily initialize livemap (Resolved)
links to: Commit(master) openjdk/jdk/78a392aa
links to: Review(master) openjdk/jdk/25580