Type: Bug
Resolution: Unresolved
Priority: P3
Affects Version/s: 11, 16, 17
CPU: ppc
(Credit: this inconsistency was originally found by Shuyang Liu, UCLA)
Consider this test:
volatile int x, y;
Thread 1:
x = 2
r1 = y
Thread 2:
y = 1
Thread 3:
r2 = y
x = 1
Thread 4:
r3 = x
r4 = x
The state (r1,r2,r3,r4) = (0,1,1,2) is forbidden, as it violates sequential consistency. (You can show this by constructing the synchronization order that would lead to this result and observing that it is cyclic.)
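For concreteness, the test above can be written out in plain Java as follows (a minimal sketch with a hypothetical class name; a single racy run like this cannot hunt for the forbidden outcome — that needs a harness like jcstress — but it shows the exact accesses involved):

```java
// Sketch of the four-thread volatile test. The forbidden SC-violating
// outcome is (r1,r2,r3,r4) = (0,1,1,2).
public class SeqCstTest {
    static volatile int x, y;
    static volatile int r1, r2, r3, r4;

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> { x = 2; r1 = y; });
        Thread t2 = new Thread(() -> { y = 1; });
        Thread t3 = new Thread(() -> { r2 = y; x = 1; });
        Thread t4 = new Thread(() -> { r3 = x; r4 = x; });
        t1.start(); t2.start(); t3.start(); t4.start();
        t1.join();  t2.join();  t3.join();  t4.join();
        // Only the final values of x and y are deterministic here:
        // y is always 1, and x ends up as either 1 or 2.
        System.out.println("(" + r1 + "," + r2 + "," + r3 + "," + r4 + ")");
    }
}
```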
PPC appears to be one platform (possibly the only one?) whose current barrier placement runs into this SC violation. The current placement seems to be:
C2:
volatile load:
MemBarVolatile // lowered to "sync"
load
MemBarAcquire // lowered to "lwsync"
volatile store:
MemBarRelease // lowered to "lwsync"
store
C1:
volatile load:
MacroAssembler::membar() // lowers to "sync"
load
MacroAssembler::membar_acquire() // lowers to "lwsync"
volatile store:
MacroAssembler::membar_release() // lowers to "lwsync"
store
Note that this placement means a back-to-back volatile load and volatile store have only lwsync between them. This kind of back-to-back volatile load->store is exactly what happens in Thread 3 in the example above. But lwsync is insufficient for enforcing sequential consistency (it only orders message-passing-like acq/rel scenarios, AFAIU).
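Schematically, Thread 3's compiled code under the placement listed above comes out as (a sketch following the C2 listing; adjacent lwsyncs may be merged in practice):

// Thread 3: r2 = y; x = 1
sync            // leading barrier of the volatile load
lwz  r2, [y]    // volatile load
lwsync          // trailing acquire barrier of the load
lwsync          // leading release barrier of the volatile store
stw  1, [x]     // volatile store

Nothing here upgrades the load->store pair to a full sync; only lwsync stands between the load of y and the store to x.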
herd7 test cases show that lwsync alone, or lwsync+isync, is indeed insufficient:
https://cr.openjdk.java.net/~shade/8262877/full-barriers.litmus
https://cr.openjdk.java.net/~shade/8262877/relaxed-lwsync.litmus
https://cr.openjdk.java.net/~shade/8262877/relaxed-lwsync-isync.litmus
The test with full barriers everywhere shows the outcome is forbidden:
$ herd7 -model ppc.cat full-barriers.litmus
Positive: 0 Negative: 42
Condition exists (0:r4=0 /\ 2:r4=1 /\ 3:r3=1 /\ 3:r4=2)
Observation C1 Never 0 42
...and a single relaxation to lwsync between the load and the store in P2 breaks it:
$ herd7 -model ppc.cat relaxed-lwsync.litmus
Positive: 1 Negative: 46
Condition exists (0:r4=0 /\ 2:r4=1 /\ 3:r3=1 /\ 3:r4=2)
Observation C1 Sometimes 1 46
...adding isync does not help either:
$ herd7 -model ppc.cat relaxed-lwsync-isync.litmus
Positive: 1 Negative: 46
Condition exists (0:r4=0 /\ 2:r4=1 /\ 3:r3=1 /\ 3:r4=2)
Observation C1 Sometimes 1 46
Note that C++ Atomic mappings (https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html) say that:
Load SeqCst: hwsync; ld; cmp; bc; isync
Store SeqCst: hwsync; st
Comparing this with HotSpot's current placement, the barrier leading into a volatile store needs to be "sync", not "lwsync".
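If the leading store barrier were strengthened accordingly, the placement would become, schematically (the exact IR node / macro-assembler call used for the fix is an assumption here, mirroring the listings above):

C2:
volatile store:
MemBarVolatile // lowered to "sync" (instead of MemBarRelease -> "lwsync")
store
C1:
volatile store:
MacroAssembler::membar() // lowers to "sync" (instead of membar_release() -> "lwsync")
store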
These are C1/C2-like translations of the example to PPC, in the current form and with the stronger "sync" store barrier:
https://cr.openjdk.java.net/~shade/8262877/stores-lwsync.litmus
https://cr.openjdk.java.net/~shade/8262877/stores-sync.litmus
$ herd7 -model ppc.cat stores-lwsync.litmus
Positive: 1 Negative: 46
Condition exists (0:r4=0 /\ 2:r4=1 /\ 3:r3=1 /\ 3:r4=2)
Observation StoreLwsync Sometimes 1 46
$ herd7 -model ppc.cat stores-sync.litmus
Positive: 0 Negative: 42
Condition exists (0:r4=0 /\ 2:r4=1 /\ 3:r3=1 /\ 3:r4=2)
Observation StoresSync Never 0 42
Despite my best attempts, I still fail to reproduce this in jcstress:
https://github.com/openjdk/jcstress/blob/master/tests-custom/src/main/java/org/openjdk/jcstress/tests/volatiles/PowerSCViolation.java
The C2 dump for actor3 in that test on ppc64le is consistent with the barrier placement discussed above:
0x00003fff89045e38: sync // leading load barrier
0x00003fff89045e3c: lwz r15,16(r3) // volatile load
0x00003fff89045e40: twi 0,r15,0 // trailing load barrier
0x00003fff89045e44: isync
0x00003fff89045e48: stw r15,16(r4) // record result
0x00003fff89045e4c: lwsync // leading store barrier
0x00003fff89045e50: li r14,1
0x00003fff89045e54: stw r14,12(r3) // volatile store