Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-8191483

AbstractQueuedSynchronizer cancel/cancel race

    XMLWordPrintable

Details

    Description

      FULL PRODUCT VERSION :
      openjdk version "9.0.1"
      OpenJDK Runtime Environment (build 9.0.1+11)
      OpenJDK 64-Bit Server VM (build 9.0.1+11, mixed mode)

      ADDITIONAL OS VERSION INFORMATION :
      Linux amling 3.2.0-38-generic #61-Ubuntu SMP Tue Feb 19 12:18:21 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

      A DESCRIPTION OF THE PROBLEM :
      A race of two threads in cancelAcquire() can leak a phantom cancelled node in the wait queue causing hasQueuedPredecessors() to return true incorrectly.

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      Compile/run sample.

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      Program runs forever.
      ACTUAL -
      "Broken" line hit. Best guess based on debugger state at this point and reading code is cancel/cancel race.

      Specifically, as of 1.8.0_51:

      *) Two threads are waiting.
      *) Both begin acquireCancel().
      *) The first decides `head` is its predecessor, the second decides the first is its predecessor.
      *) Both mark CANCELED state on their node.
      *) Both check tail-ness. The first decides it is not the tail, the second decides it is the tail.
      *) The second CASes tail to the first's node, then CASes the first's node next to null.
      *) The first CASes nothing (in particular it sees pred == head and skips most of the else block).
      *) Both set their node's next to itself.

      I am able to walk through these steps in the debugger and produce the corrupted state reliably.
      I have not analyzed the code in 9.0.1 but my best effort to reproduce this programmatically (see sample) does reliably fail on 9.0.1.

      ERROR MESSAGES/STACK TRACES THAT OCCUR :
      "Broken"

      REPRODUCIBILITY :
      This bug can be reproduced always.

      ---------- BEGIN SOURCE ----------
      import java.util.ArrayList;
      import java.util.List;
      import java.util.concurrent.locks.AbstractQueuedSynchronizer;

      public class Demo {
          private static final class Sync extends AbstractQueuedSynchronizer {
              private static final long serialVersionUID = 1L;

              @Override
              public boolean tryAcquire(int acquires) {
                  if (hasQueuedPredecessors()) {
                      return false;
                  }
                  if (compareAndSetState(0, 1)) {
                      return true;
                  }
                  return false;
              }

              @Override
              protected boolean tryRelease(int releases) {
                  if (compareAndSetState(1, 0)) {
                      return true;
                  }
                  return false;
              }
          }

          private static void testOnce() throws Throwable {
              Sync s = new Sync();

              // hold lock so threads will block
              s.acquire(1);

              // try to trigger double cancel race with two background threads
              List<Thread> ts = new ArrayList<>();
              for (int i = 0; i < 2; ++i) {
                  Thread t = new Thread(() -> {
                      try {
                          s.acquireInterruptibly(1);
                      } catch (Throwable e) {
                          // swallow
                      }
                  });
                  t.start();
                  ts.add(t);
              };
              Thread.sleep(100);
              for (Thread t : ts) {
                  t.interrupt();
              }
              for (Thread t : ts) {
                  t.join();
              }

              // but unlock it for us now
              s.release(1);

              // no one holds lock, we should be able to get it, but sometimes
              // phantom canceled node will make hasQueuedPredecessors() true
              if (!s.tryAcquire(1)) {
                  throw new RuntimeException("Broken");
              }
          }

          public static void main(String[] args) throws Throwable {
              int rep = 0;
              while (true) {
                  long t0 = System.currentTimeMillis();
                  testOnce();
                  long t1 = System.currentTimeMillis();

                  ++rep;
                  System.out.println("Completed rep #" + rep + " in " + (t1 - t0) + " ms");
              }
          }
      }

      ---------- END SOURCE ----------

      CUSTOMER SUBMITTED WORKAROUND :
      Don't use AQS? I vaguely assume most JDK uses of it somehow avoid this case since they're more battle tested, e.g. even a fair ReentrantLock.tryLock() uses a nonfair tryAcquire (and thus bypasses the bug).

      Attachments

        Activity

          People

            martin Martin Buchholz
            webbuggrp Webbug Group
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: