-
Bug
-
Resolution: Fixed
-
P3
-
8, 9.0.1, 10
-
b01
-
x86_64
-
linux
FULL PRODUCT VERSION :
openjdk version "9.0.1"
OpenJDK Runtime Environment (build 9.0.1+11)
OpenJDK 64-Bit Server VM (build 9.0.1+11, mixed mode)
ADDITIONAL OS VERSION INFORMATION :
Linux amling 3.2.0-38-generic #61-Ubuntu SMP Tue Feb 19 12:18:21 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
A DESCRIPTION OF THE PROBLEM :
A race of two threads in cancelAcquire() can leak a phantom cancelled node in the wait queue causing hasQueuedPredecessors() to return true incorrectly.
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Compile/run sample.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Program runs forever.
ACTUAL -
"Broken" line hit. Best guess based on debugger state at this point and reading code is cancel/cancel race.
Specifically, as of 1.8.0_51:
*) Two threads are waiting.
*) Both begin acquireCancel().
*) The first decides `head` is its predecessor, the second decides the first is its predecessor.
*) Both mark CANCELED state on their node.
*) Both check tail-ness. The first decides it is not the tail, the second decides it is the tail.
*) The second CASes tail to the first's node, then CASes the first's node next to null.
*) The first CASes nothing (in particular it sees pred == head and skips most of the else block).
*) Both set their node's next to itself.
I am able to walk through these steps in the debugger and produce the corrupted state reliably.
I have not analyzed the code in 9.0.1 but my best effort to reproduce this programmatically (see sample) does reliably fail on 9.0.1.
ERROR MESSAGES/STACK TRACES THAT OCCUR :
"Broken"
REPRODUCIBILITY :
This bug can be reproduced always.
---------- BEGIN SOURCE ----------
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.AbstractQueuedSynchronizer;
public class Demo {
private static final class Sync extends AbstractQueuedSynchronizer {
private static final long serialVersionUID = 1L;
@Override
public boolean tryAcquire(int acquires) {
if (hasQueuedPredecessors()) {
return false;
}
if (compareAndSetState(0, 1)) {
return true;
}
return false;
}
@Override
protected boolean tryRelease(int releases) {
if (compareAndSetState(1, 0)) {
return true;
}
return false;
}
}
private static void testOnce() throws Throwable {
Sync s = new Sync();
// hold lock so threads will block
s.acquire(1);
// try to trigger double cancel race with two background threads
List<Thread> ts = new ArrayList<>();
for (int i = 0; i < 2; ++i) {
Thread t = new Thread(() -> {
try {
s.acquireInterruptibly(1);
} catch (Throwable e) {
// swallow
}
});
t.start();
ts.add(t);
};
Thread.sleep(100);
for (Thread t : ts) {
t.interrupt();
}
for (Thread t : ts) {
t.join();
}
// but unlock it for us now
s.release(1);
// no one holds lock, we should be able to get it, but sometimes
// phantom canceled node will make hasQueuedPredecessors() true
if (!s.tryAcquire(1)) {
throw new RuntimeException("Broken");
}
}
public static void main(String[] args) throws Throwable {
int rep = 0;
while (true) {
long t0 = System.currentTimeMillis();
testOnce();
long t1 = System.currentTimeMillis();
++rep;
System.out.println("Completed rep #" + rep + " in " + (t1 - t0) + " ms");
}
}
}
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
Don't use AQS? I vaguely assume most JDK uses of it somehow avoid this case since they're more battle tested, e.g. even a fair ReentrantLock.tryLock() uses a nonfair tryAcquire (and thus bypasses the bug).
openjdk version "9.0.1"
OpenJDK Runtime Environment (build 9.0.1+11)
OpenJDK 64-Bit Server VM (build 9.0.1+11, mixed mode)
ADDITIONAL OS VERSION INFORMATION :
Linux amling 3.2.0-38-generic #61-Ubuntu SMP Tue Feb 19 12:18:21 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
A DESCRIPTION OF THE PROBLEM :
A race of two threads in cancelAcquire() can leak a phantom cancelled node in the wait queue causing hasQueuedPredecessors() to return true incorrectly.
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Compile/run sample.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Program runs forever.
ACTUAL -
"Broken" line hit. Best guess based on debugger state at this point and reading code is cancel/cancel race.
Specifically, as of 1.8.0_51:
*) Two threads are waiting.
*) Both begin acquireCancel().
*) The first decides `head` is its predecessor, the second decides the first is its predecessor.
*) Both mark CANCELED state on their node.
*) Both check tail-ness. The first decides it is not the tail, the second decides it is the tail.
*) The second CASes tail to the first's node, then CASes the first's node next to null.
*) The first CASes nothing (in particular it sees pred == head and skips most of the else block).
*) Both set their node's next to itself.
I am able to walk through these steps in the debugger and produce the corrupted state reliably.
I have not analyzed the code in 9.0.1 but my best effort to reproduce this programmatically (see sample) does reliably fail on 9.0.1.
ERROR MESSAGES/STACK TRACES THAT OCCUR :
"Broken"
REPRODUCIBILITY :
This bug can be reproduced always.
---------- BEGIN SOURCE ----------
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.AbstractQueuedSynchronizer;
public class Demo {
private static final class Sync extends AbstractQueuedSynchronizer {
private static final long serialVersionUID = 1L;
@Override
public boolean tryAcquire(int acquires) {
if (hasQueuedPredecessors()) {
return false;
}
if (compareAndSetState(0, 1)) {
return true;
}
return false;
}
@Override
protected boolean tryRelease(int releases) {
if (compareAndSetState(1, 0)) {
return true;
}
return false;
}
}
private static void testOnce() throws Throwable {
Sync s = new Sync();
// hold lock so threads will block
s.acquire(1);
// try to trigger double cancel race with two background threads
List<Thread> ts = new ArrayList<>();
for (int i = 0; i < 2; ++i) {
Thread t = new Thread(() -> {
try {
s.acquireInterruptibly(1);
} catch (Throwable e) {
// swallow
}
});
t.start();
ts.add(t);
};
Thread.sleep(100);
for (Thread t : ts) {
t.interrupt();
}
for (Thread t : ts) {
t.join();
}
// but unlock it for us now
s.release(1);
// no one holds lock, we should be able to get it, but sometimes
// phantom canceled node will make hasQueuedPredecessors() true
if (!s.tryAcquire(1)) {
throw new RuntimeException("Broken");
}
}
public static void main(String[] args) throws Throwable {
int rep = 0;
while (true) {
long t0 = System.currentTimeMillis();
testOnce();
long t1 = System.currentTimeMillis();
++rep;
System.out.println("Completed rep #" + rep + " in " + (t1 - t0) + " ms");
}
}
}
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
Don't use AQS? I vaguely assume most JDK uses of it somehow avoid this case since they're more battle tested, e.g. even a fair ReentrantLock.tryLock() uses a nonfair tryAcquire (and thus bypasses the bug).