Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-8262877

PPC sequential consistency problem: volatile stores are too weak

XMLWordPrintable

      (Credit: this inconsistency was originally found by Shuyang Liu, UCLA)

      Consider this test:

      volatile int x, y;

      Thread 1:
       x = 2
       r1 = y

      Thread 2:
        y = 1

      Thread 3:
        r2 = y
        x = 1

      Thread 4:
        r3 = x
        r4 = x

      The state (r1,r2,r3,r4) = (0,1,1,2) is forbidden, as it violates sequential consistency. (You can show it by constructing the syncronization order that leads to this result and observing it is cyclic).

      Current PPC code is one (the only?) platform that runs into the SC violation with current barrier placement. Current placement seems to be:

      C2:

      volatile load:
        MemBarVolatile // lowered to "sync"
        load
        MemBarAcquire // lowered to "lwsync"

      volatile store:
        MemBarRelease // lowered to "lwsync"
        store

      C1:

      volatile load:
        MacroAssembler::membar() // lowers to "sync"
        load
        MacroAssembler::membar_acquire() // lowers to "lwsync"

      volatile store:
        MacroAssembler::membar_release() // lowers to "lwsync"
        store

      Note that this placement means that back-to-back volatile load and volatile store only have lwsync between them. This kind of back-to-back volatile load->store is what is happening in Thread 3 in the example above. But, lwsync is insufficient for enforcing sequential consistency (it only orders in message-passing-like acq/rel scenarios, AFAIU).

      herd7 test cases show that lwsync or isync+lwsync are indeed insufficient:
        https://cr.openjdk.java.net/~shade/8262877/full-barriers.litmus
        https://cr.openjdk.java.net/~shade/8262877/relaxed-lwsync.litmus
        https://cr.openjdk.java.net/~shade/8262877/relaxed-lwsync-isync.litmus

      The test with full barriers everywhere has the outcome forbidden:

      $ herd7 -model ppc.cat full-barriers.litmus
      Positive: 0 Negative: 42
      Condition exists (0:r4=0 /\ 2:r4=1 /\ 3:r3=1 /\ 3:r4=2)
      Observation C1 Never 0 42

      ...and a single relaxation to lwsync between load and store in P2 breaks it:

      $ herd7 -model ppc.cat relaxed-lwsync.litmus
      Positive: 1 Negative: 46
      Condition exists (0:r4=0 /\ 2:r4=1 /\ 3:r3=1 /\ 3:r4=2)
      Observation C1 Sometimes 1 46

      ...adding isync does not help either:

      $ herd7 -model ppc.cat relaxed-lwsync-isync.litmus
      Positive: 1 Negative: 46
      Condition exists (0:r4=0 /\ 2:r4=1 /\ 3:r3=1 /\ 3:r4=2)
      Observation C1 Sometimes 1 46

      Note that C++ Atomic mappings (https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html) say that:
       Load SeqCst: hwsync; ld; cmp; bc; isync
       Store SeqCst: hwsync; st

      Comparing it with current Hotspot's placement, the barrier leading to a store needs to be "sync", not "lwsync".

      This is the C1/C2-like translation of the example to PPC in its current form, and with stronger sync stores:
       https://cr.openjdk.java.net/~shade/8262877/stores-lwsync.litmus
       https://cr.openjdk.java.net/~shade/8262877/stores-sync.litmus

      $ herd7 -model ppc.cat stores-lwsync.litmus
      Positive: 1 Negative: 46
      Condition exists (0:r4=0 /\ 2:r4=1 /\ 3:r3=1 /\ 3:r4=2)
      Observation StoreLwsync Sometimes 1 46

      $ herd7 -model ppc.cat stores-sync.litmus
      Positive: 0 Negative: 42
      Condition exists (0:r4=0 /\ 2:r4=1 /\ 3:r3=1 /\ 3:r4=2)
      Observation StoresSync Never 0 42

      Despite my best attempts, I still fail to reproduce this in jcstress:
       https://github.com/openjdk/jcstress/blob/master/tests-custom/src/main/java/org/openjdk/jcstress/tests/volatiles/PowerSCViolation.java

      C2 dump for that actor3 in that test on ppc64le is consistent with the barrier placement discussed above:

        0x00003fff89045e38: sync // leading load barrier
        0x00003fff89045e3c: lwz r15,16(r3) // volatile load
        0x00003fff89045e40: twi 0,r15,0 // trailing load barrier
        0x00003fff89045e44: isync
        0x00003fff89045e48: stw r15,16(r4) // record result
        0x00003fff89045e4c: lwsync // leading store barrier
        0x00003fff89045e50: li r14,1
        0x00003fff89045e54: stw r14,12(r3) // volatile store

            Unassigned Unassigned
            shade Aleksey Shipilev
            Votes:
            0 Vote for this issue
            Watchers:
            10 Start watching this issue

              Created:
              Updated: