-
Bug
-
Resolution: Fixed
-
P3
-
6u10
-
b70
-
x86
-
linux
FULL PRODUCT VERSION :
java version "1.6.0_10"
Java(TM) SE Runtime Environment (build 1.6.0_10-b33)
Java HotSpot(TM) 64-Bit Server VM (build 11.0-b15, mixed mode)
ADDITIONAL OS VERSION INFORMATION :
Linux 2.6.18-ts15 #3 SMP PREEMPT Thu Jun 12 15:18:57 GMT 2008 x86_64 x86_64 x86_64 GNU/Linux
A DESCRIPTION OF THE PROBLEM :
There appears to be a race condition in ConcurrentLinkedQueue which can cause a single item to be removed twice. The problem is related to an interaction between poll() and remove().
remove() traverses the list of Nodes and returns true if it can CAS a Node's item from non-null to null. If other threads are simultaneously poll()ing during this traversal, it is possible that the remove() thread could fall behind, and end up iterating over nodes that have already been unlinked.
poll() unlinks the head Node of the list, reads the node's item value, sets the reference to null, and then returns the value. However when setting the reference to null, it does not do a CAS, so it is possible that remove() could have set the reference to null after poll() read it but before poll() sets it. If this happens, both functions will believe they have successfully removed the same value from the list.
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Apply the given patch to ConcurrentLinkedQueue and then run the associated CLQTest class.
Note that the patch simply causes Thread.sleep() to be caused under certain (contrived) circumstances in order to reliably expose the race condition. It does not affect the correctness of the code.
CLQTest is fairly self-explanatory. It creates a new ConcurrentLinkedQueue, adds two elements to it, starts a thread to remove one of the elements(), and then calls poll() twice.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
First poll returned: A
Second poll returned: null
Async remove("B") returned: true
PASS
ACTUAL -
First poll returned: A
Second poll returned: B
Async remove("B") returned: true
FAIL
REPRODUCIBILITY :
This bug can be reproduced always.
---------- BEGIN SOURCE ----------
public class CLQTest
implements Runnable
{
private final ConcurrentLinkedQueue<String> queue =
new ConcurrentLinkedQueue<String>();
private volatile boolean removeWorked;
public CLQTest()
throws InterruptedException
{
queue.add("A");
queue.add("B");
Thread t = new Thread(this);
t.start();
Thread.sleep(1000);
String a = queue.poll();
System.out.println("First poll returned: " + a);
String b = queue.poll();
System.out.println("Second poll returned: " + b);
t.join();
System.out.println("Async remove(\"B\") returned: " + removeWorked);
if (removeWorked ^ "B".equals(b)) {
System.out.println("PASS");
} else {
System.out.println("FAIL");
}
}
public void run() {
removeWorked = queue.remove("B");
}
public static void main(String[] args) throws Exception {
new CLQTest();
}
}
patch file:
--- jdk1.6.0_10_src/java/util/concurrent/ConcurrentLinkedQueue.java 2008-09-26 07:16:03.000000000 +0000
+++ ConcurrentLinkedQueue.java 2008-12-11 22:21:07.390740309 +0000
@@ -218,6 +218,13 @@
casTail(t, first);
} else if (casHead(h, first)) {
E item = first.getItem();
+ if ("B".equals(item)) {
+ try {
+ Thread.sleep(2000);
+ } catch (InterruptedException e) {
+ // ignore
+ }
+ }
if (item != null) {
first.setItem(null);
return item;
@@ -345,6 +352,13 @@
if (o == null) return false;
for (Node<E> p = first(); p != null; p = p.getNext()) {
E item = p.getItem();
+ if ("A".equals(item)) {
+ try {
+ Thread.sleep(2000);
+ } catch (InterruptedException e) {
+ // ignore
+ }
+ }
if (item != null &&
o.equals(item) &&
p.casItem(item, null))
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
@@ -218,8 +218,7 @@
casTail(t, first);
} else if (casHead(h, first)) {
E item = first.getItem();
- if (item != null) {
- first.setItem(null);
+ if ((item != null) && first.casItem(item, null)) {
return item;
}
// else skip over deleted item, continue loop,
java version "1.6.0_10"
Java(TM) SE Runtime Environment (build 1.6.0_10-b33)
Java HotSpot(TM) 64-Bit Server VM (build 11.0-b15, mixed mode)
ADDITIONAL OS VERSION INFORMATION :
Linux 2.6.18-ts15 #3 SMP PREEMPT Thu Jun 12 15:18:57 GMT 2008 x86_64 x86_64 x86_64 GNU/Linux
A DESCRIPTION OF THE PROBLEM :
There appears to be a race condition in ConcurrentLinkedQueue which can cause a single item to be removed twice. The problem is related to an interaction between poll() and remove().
remove() traverses the list of Nodes and returns true if it can CAS a Node's item from non-null to null. If other threads are simultaneously poll()ing during this traversal, it is possible that the remove() thread could fall behind, and end up iterating over nodes that have already been unlinked.
poll() unlinks the head Node of the list, reads the node's item value, sets the reference to null, and then returns the value. However when setting the reference to null, it does not do a CAS, so it is possible that remove() could have set the reference to null after poll() read it but before poll() sets it. If this happens, both functions will believe they have successfully removed the same value from the list.
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Apply the given patch to ConcurrentLinkedQueue and then run the associated CLQTest class.
Note that the patch simply causes Thread.sleep() to be caused under certain (contrived) circumstances in order to reliably expose the race condition. It does not affect the correctness of the code.
CLQTest is fairly self-explanatory. It creates a new ConcurrentLinkedQueue, adds two elements to it, starts a thread to remove one of the elements(), and then calls poll() twice.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
First poll returned: A
Second poll returned: null
Async remove("B") returned: true
PASS
ACTUAL -
First poll returned: A
Second poll returned: B
Async remove("B") returned: true
FAIL
REPRODUCIBILITY :
This bug can be reproduced always.
---------- BEGIN SOURCE ----------
public class CLQTest
implements Runnable
{
private final ConcurrentLinkedQueue<String> queue =
new ConcurrentLinkedQueue<String>();
private volatile boolean removeWorked;
public CLQTest()
throws InterruptedException
{
queue.add("A");
queue.add("B");
Thread t = new Thread(this);
t.start();
Thread.sleep(1000);
String a = queue.poll();
System.out.println("First poll returned: " + a);
String b = queue.poll();
System.out.println("Second poll returned: " + b);
t.join();
System.out.println("Async remove(\"B\") returned: " + removeWorked);
if (removeWorked ^ "B".equals(b)) {
System.out.println("PASS");
} else {
System.out.println("FAIL");
}
}
public void run() {
removeWorked = queue.remove("B");
}
public static void main(String[] args) throws Exception {
new CLQTest();
}
}
patch file:
--- jdk1.6.0_10_src/java/util/concurrent/ConcurrentLinkedQueue.java 2008-09-26 07:16:03.000000000 +0000
+++ ConcurrentLinkedQueue.java 2008-12-11 22:21:07.390740309 +0000
@@ -218,6 +218,13 @@
casTail(t, first);
} else if (casHead(h, first)) {
E item = first.getItem();
+ if ("B".equals(item)) {
+ try {
+ Thread.sleep(2000);
+ } catch (InterruptedException e) {
+ // ignore
+ }
+ }
if (item != null) {
first.setItem(null);
return item;
@@ -345,6 +352,13 @@
if (o == null) return false;
for (Node<E> p = first(); p != null; p = p.getNext()) {
E item = p.getItem();
+ if ("A".equals(item)) {
+ try {
+ Thread.sleep(2000);
+ } catch (InterruptedException e) {
+ // ignore
+ }
+ }
if (item != null &&
o.equals(item) &&
p.casItem(item, null))
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
@@ -218,8 +218,7 @@
casTail(t, first);
} else if (casHead(h, first)) {
E item = first.getItem();
- if (item != null) {
- first.setItem(null);
+ if ((item != null) && first.casItem(item, null)) {
return item;
}
// else skip over deleted item, continue loop,