JDK-8242115

C2 SATB barriers are not safepoint-safe



    • b18



        In ZGC we are very sensitive to accesses and their GC barriers being separated by safepoints. We never managed to tame the sea of nodes to ensure this, and hence elected to go with expanding the barriers as late as humanly possible, right before assembly.

        It has long been assumed that, although this is a huge problem for ZGC, it is not a problem at all for SATB collectors such as G1 and Shenandoah. But they face the same issue. Consider the Reference.get() intrinsic. It loads the referent and enqueues it into the thread-local SATB buffer. The load of the referent and the store that publishes it to the thread-local SATB buffer may *not* be separated by any safepoint.
        At safepoint polls, we can also deoptimize. Most of the time G1 is fine, because concurrent marking terminates in a remark safepoint: if remark finds something on the stack that is not yet marked, it marks it. That is probably why we do not see any crashes. However, consider a Java call placed between the referent load and the SATB barrier. The nmethod may then get deoptimized when the callee returns. In that scenario nobody has marked the referent, and once we roll into the interpreter, it may store the referent into some field and return. Now the object graph has been corrupted with an object that will never get marked.
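
        The scenario above can be sketched as a toy model. This is only an illustration of the ordering problem, not HotSpot code: SatbRaceSketch, Obj and referentMarked are hypothetical names, marking is reduced to a set of marked objects, and the model assumes the frame has already returned when remark runs, so the stack scan cannot rescue the referent.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Toy model of the race; none of these names exist in HotSpot.
final class SatbRaceSketch {
    static final class Obj {
        Obj field; // a single strong reference field
    }

    // Returns true if the referent ends up marked once the cycle finishes.
    static boolean referentMarked(boolean enqueueBeforeSafepoint) {
        Obj holder = new Obj();   // strongly reachable when marking starts
        Obj referent = new Obj(); // only weakly reachable when marking starts
        Set<Obj> marked = new HashSet<>();
        Deque<Obj> satbBuffer = new ArrayDeque<>();

        // Concurrent marking has already visited the snapshot: holder only.
        marked.add(holder);

        // Compiled code: the Reference.get() intrinsic loads the referent.
        Obj loaded = referent;
        if (enqueueBeforeSafepoint) {
            satbBuffer.add(loaded); // SATB barrier runs right after the load
        }
        // Otherwise a Java call sits here, the nmethod is deoptimized when
        // the callee returns, and the compiled SATB barrier never executes.

        // Interpreter: store into an already-marked object, then return.
        // The pre-write barrier only records the overwritten value.
        Obj old = holder.field;
        if (old != null) {
            satbBuffer.add(old);
        }
        holder.field = loaded;

        // Remark: drain the SATB buffer. The frame has returned, so the
        // stack scan is assumed not to find the referent either.
        while (!satbBuffer.isEmpty()) {
            marked.add(satbBuffer.poll());
        }
        return marked.contains(holder.field);
    }

    public static void main(String[] args) {
        System.out.println("barrier before safepoint: " + referentMarked(true));
        System.out.println("safepoint before barrier: " + referentMarked(false));
    }
}
```

        In this model the referent is marked only when the enqueue happens before the safepoint. In the other ordering it is reachable through holder.field but absent from the marked set.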

        There are similar issues involving stack walkers that can capture the local holding the referent before it has been SATB-enqueued, after which the nmethod deoptimizes.

        Not your everyday race, but it is possible in theory, and it is quite annoying.

        The same applies to the "is marking active" load expanded in the pre-write store barriers for G1.
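
        One reading of how the "is marking active" load can go wrong is sketched below. It is an assumption-laden toy, not G1 code: PreWriteBarrierSketch and its members are hypothetical, and the safepoint is modelled as a callback that may switch marking on between the flag load and the store.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model only; the names are hypothetical, not G1 code.
final class PreWriteBarrierSketch {
    boolean markingActive;
    Object field;
    final Deque<Object> satbBuffer = new ArrayDeque<>();

    // A store with an SATB-style pre-write barrier, with a modelled
    // safepoint between the flag load and the rest of the barrier.
    void store(Object newValue, Runnable safepoint) {
        boolean active = markingActive; // the "is marking active" load
        safepoint.run();                // marking may start in here
        if (active) {                   // decision uses the stale flag
            Object old = field;
            if (old != null) {
                satbBuffer.add(old);    // record the overwritten value
            }
        }
        field = newValue;
    }

    public static void main(String[] args) {
        PreWriteBarrierSketch s = new PreWriteBarrierSketch();
        s.field = "old";
        s.store("new", () -> s.markingActive = true);
        System.out.println("marking active: " + s.markingActive
                + ", recorded old values: " + s.satbBuffer.size());
    }
}
```

        Here marking becomes active during the modelled safepoint, yet the overwritten value is never recorded because the barrier acted on the flag value loaded earlier.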

        I have written some verification code for G1 (only works without compressed oops for now):
        This patch tags the referent loads and the SATB store with the same unique number. Before machine code is generated, it checks the mach nodes: it matches each load with its store, then traverses the store's block and its dominators up to the load, asserting that there are no safepoints in these blocks. With SPECjbb2015 the assertion fails after a few seconds, which hints that this scary stuff is probably happening for real, and has been broken since forever.
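
        The check described above can be sketched roughly as follows. This is not the patch itself: Block, Node, Kind and noSafepointBetween are hypothetical stand-ins for C2's block and mach-node structures, and the walk follows the description literally by rejecting any safepoint in a visited block, even one that is not strictly between the load and the store.

```java
import java.util.ArrayList;
import java.util.List;

// Toy CFG; Block, Node and Kind are hypothetical stand-ins for C2 structures.
final class BarrierVerifierSketch {
    enum Kind { REFERENT_LOAD, SATB_STORE, SAFEPOINT, OTHER }

    static final class Node {
        final Kind kind;
        final int tag; // unique number shared by a load and its SATB store
        Node(Kind kind, int tag) { this.kind = kind; this.tag = tag; }
    }

    static final class Block {
        final Block idom; // immediate dominator, null for the entry block
        final List<Node> nodes = new ArrayList<>();
        Block(Block idom) { this.idom = idom; }

        Block add(Kind kind, int tag) {
            nodes.add(new Node(kind, tag));
            return this;
        }

        boolean has(Kind kind, int tag) {
            for (Node n : nodes) {
                if (n.kind == kind && n.tag == tag) return true;
            }
            return false;
        }

        boolean hasSafepoint() {
            for (Node n : nodes) {
                if (n.kind == Kind.SAFEPOINT) return true;
            }
            return false;
        }
    }

    // Walks from the store's block up the dominator chain to the block
    // holding the matching load; false if any visited block has a safepoint.
    static boolean noSafepointBetween(Block storeBlock, int tag) {
        if (!storeBlock.has(Kind.SATB_STORE, tag)) {
            throw new IllegalArgumentException("block has no SATB store with tag " + tag);
        }
        for (Block b = storeBlock; b != null; b = b.idom) {
            if (b.hasSafepoint()) return false;
            if (b.has(Kind.REFERENT_LOAD, tag)) return true;
        }
        throw new IllegalStateException("no dominating load with tag " + tag);
    }

    public static void main(String[] args) {
        Block entry = new Block(null).add(Kind.REFERENT_LOAD, 1);
        Block call = new Block(entry).add(Kind.SAFEPOINT, 0);
        Block exit = new Block(call).add(Kind.SATB_STORE, 1);
        System.out.println("safe: " + noSafepointBetween(exit, 1));
    }
}
```

        A block-level walk like this is conservative. A finer check would look only at the nodes after the load in its block and before the store in its block.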





                iveresov Igor Veresov
                eosterlund Erik Österlund