JDK-8334304

Virtual threads deadlock when # of pinned virtual threads > availableProcessors()

    • Type: Bug
    • Resolution: Unresolved
    • Priority: P4
    • Fix Version/s: tbd
    • Affects Version/s: 21, 22, 23
    • Component/s: core-libs
    • Labels: None

      Given the following reduced test (or the even simpler test in https://bugs.openjdk.org/browse/JDK-8334304?focusedId=14681767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14681767 ):
      ```java
      import java.util.Set;
      import java.util.concurrent.*;

      public class LoomDeadlock {
          public static void main(String[] args) throws InterruptedException, ExecutionException {
              int processors = Runtime.getRuntime().availableProcessors();

              ExecutorService executor;
              if (args.length > 0 && args[0].equals("platform")) {
                  executor = Executors.newWorkStealingPool(processors);
              } else {
                  executor = Executors.newVirtualThreadPerTaskExecutor();
              }

              Future<?>[] futures = new Future[processors * 2];
              Phaser phaser = new Phaser(futures.length);
              Set<Thread> threads = ConcurrentHashMap.newKeySet();
              for (int i = 0; i < futures.length; i++) {
                  final Object lock = new Object();
                  futures[i] = executor.submit(() -> {
                      threads.add(Thread.currentThread());
                      synchronized (lock) {
                          phaser.arriveAndAwaitAdvance();
                      }
                      return true;
                  });
              }

              System.out.println("START WAITING");
              try {
                  futures[0].get(5, TimeUnit.SECONDS);
              } catch (TimeoutException t) {
                  var someThread = threads.iterator().next();
                  System.err.println("future did not complete in 5 seconds, printing a thread stacktrace:");
                  System.err.println(someThread);
                  for (var s : someThread.getStackTrace()) {
                      System.err.println(s);
                  }
              }

              for (var future : futures) {
                  future.get();
              }
              System.out.println("DONE");
          }
      }
      ```

      When run with the argument "platform" it succeeds with:
      ```
      $ java LoomDeadlock.java platform
      START WAITING
      DONE
      ```

      When run with virtual threads, without arguments, it deadlocks and prints:
      ```
      $ java LoomDeadlock.java
      START WAITING
      future did not complete in 5 seconds, printing a thread stacktrace:
      VirtualThread[#39]/waiting@ForkJoinPool-1-worker-1
      java.base/jdk.internal.misc.Unsafe.park(Native Method)
      java.base/java.lang.VirtualThread.parkOnCarrierThread(VirtualThread.java:661)
      java.base/java.lang.VirtualThread.park(VirtualThread.java:593)
      java.base/java.lang.System$2.parkVirtualThread(System.java:2643)
      java.base/jdk.internal.misc.VirtualThreads.park(VirtualThreads.java:54)
      java.base/java.util.concurrent.locks.LockSupport.park(LockSupport.java:219)
      java.base/java.util.concurrent.Phaser$QNode.block(Phaser.java:1133)
      java.base/java.util.concurrent.ForkJoinPool.unmanagedBlock(ForkJoinPool.java:3780)
      java.base/java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3725)
      java.base/java.util.concurrent.Phaser.internalAwaitAdvance(Phaser.java:1063)
      java.base/java.util.concurrent.Phaser.arriveAndAwaitAdvance(Phaser.java:685)
      LoomDeadlock.lambda$main$0(LoomDeadlock.java:23)
      java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
      java.base/java.lang.VirtualThread.run(VirtualThread.java:309)
      <deadlocks>
      ```

      This illustrates that Loom / virtual threads can silently deadlock.
      In fact, it is rather difficult to figure out what is going on, because SIGQUIT or Thread.getAllStackTraces(), for instance, will not reveal the problem. One needs to print the stack trace of a VirtualThread explicitly and understand that every carrier thread is pinned, in which case Loom deadlocks and does not attempt to compensate.
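
      To make the diagnostic gap concrete, here is a minimal sketch (assuming JDK 21 or later; the class name `VirtualThreadVisibility` and the method `probe` are made up for illustration). It blocks one virtual thread, then checks whether that thread appears in Thread.getAllStackTraces() and whether an explicit getStackTrace() call on it returns frames.

      ```java
      import java.util.concurrent.CountDownLatch;

      final class VirtualThreadVisibility {
          /**
           * Starts a virtual thread that blocks, then reports
           * { present in Thread.getAllStackTraces(), explicit getStackTrace() has frames }.
           */
          static boolean[] probe() {
              CountDownLatch release = new CountDownLatch(1);
              Thread vthread = Thread.ofVirtual().name("probe").start(() -> {
                  try {
                      release.await(); // blocks until the probe is finished
                  } catch (InterruptedException ignored) {
                      // nothing to clean up in this sketch
                  }
              });
              try {
                  // Wait (bounded) until the virtual thread is actually blocked.
                  long deadline = System.nanoTime() + 5_000_000_000L;
                  while (vthread.getState() != Thread.State.WAITING
                          && System.nanoTime() < deadline) {
                      Thread.sleep(10);
                  }
                  // The global dump is documented to cover platform threads only.
                  boolean inGlobalDump = Thread.getAllStackTraces().containsKey(vthread);
                  // Asking the virtual thread directly does work.
                  boolean explicitTrace = vthread.getStackTrace().length > 0;
                  return new boolean[] { inGlobalDump, explicitTrace };
              } catch (InterruptedException e) {
                  Thread.currentThread().interrupt();
                  throw new IllegalStateException(e);
              } finally {
                  release.countDown(); // let the virtual thread finish
              }
          }

          public static void main(String[] args) {
              boolean[] result = probe();
              System.out.println("in getAllStackTraces(): " + result[0]);
              System.out.println("explicit getStackTrace() has frames: " + result[1]);
          }
      }
      ```

      This only shows the visibility difference; it does not reproduce the deadlock itself, since the blocked virtual thread here is not pinned.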

      This is in contrast to ForkJoinPool, which does compensate by adding more threads and runs this fine.
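
      The compensation mentioned here is what ForkJoinPool.managedBlock() provides, and it is also what Phaser uses internally (see the managedBlock frame in the stack trace above). A rough sketch, with hypothetical names (`CompensationSketch`, `LatchBlocker`, `runAll`), of more blocking tasks than parallelism completing on a plain ForkJoinPool:

      ```java
      import java.util.ArrayList;
      import java.util.List;
      import java.util.concurrent.CountDownLatch;
      import java.util.concurrent.ExecutionException;
      import java.util.concurrent.ForkJoinPool;
      import java.util.concurrent.Future;
      import java.util.concurrent.TimeUnit;
      import java.util.concurrent.TimeoutException;

      final class CompensationSketch {
          /** Waits on a latch through ManagedBlocker so the pool can compensate. */
          static final class LatchBlocker implements ForkJoinPool.ManagedBlocker {
              private final CountDownLatch latch;
              LatchBlocker(CountDownLatch latch) { this.latch = latch; }
              @Override public boolean block() throws InterruptedException {
                  latch.await();
                  return true; // no further blocking needed
              }
              @Override public boolean isReleasable() {
                  return latch.getCount() == 0;
              }
          }

          /** Returns true if all tasks finished within the timeout. */
          static boolean runAll(int parallelism, int tasks, long timeoutSeconds) {
              ForkJoinPool pool = new ForkJoinPool(parallelism);
              CountDownLatch allArrived = new CountDownLatch(tasks);
              List<Future<Boolean>> futures = new ArrayList<>();
              try {
                  for (int i = 0; i < tasks; i++) {
                      futures.add(pool.submit(() -> {
                          allArrived.countDown();
                          // Every task waits until all tasks have started, so progress
                          // requires more running threads than the parallelism level.
                          ForkJoinPool.managedBlock(new LatchBlocker(allArrived));
                          return true;
                      }));
                  }
                  for (Future<Boolean> f : futures) {
                      f.get(timeoutSeconds, TimeUnit.SECONDS);
                  }
                  return true;
              } catch (TimeoutException | ExecutionException e) {
                  return false;
              } catch (InterruptedException e) {
                  Thread.currentThread().interrupt();
                  return false;
              } finally {
                  pool.shutdownNow();
              }
          }

          public static void main(String[] args) {
              System.out.println("completed: " + runAll(2, 4, 30));
          }
      }
      ```

      The Javadoc only says managedBlock() "possibly" arranges for a spare thread, so this is an illustration of the observed behavior rather than a guarantee.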

      I think this is particularly unexpected and surprising behavior. The promise of Loom is to be able to run many virtual threads just fine. But when N (> processors) virtual threads end up pinned to their carrier threads while waiting for another virtual thread that is not currently running, the program deadlocks.

      I used synchronized to keep this example simple. However, the exact same problem occurs if virtual threads make a native call which calls back into Java instead of using synchronized. This is actually how I found this issue: Truffle has guest safepoints which use such native upcalls.

      I can think of these improvements to this situation:
      * compensate by temporarily adding more threads, as ForkJoinPool does
      * warn the user of the situation so that they can at least understand what is going on, instead of seeing their program stuck with no idea of the actual issue. The warning would need to be shown by default to help.
      * both of the above
      * maybe even throw an error when parking the carrier thread and there are no more carrier threads available?
      * avoid pinning. This seems to be done in the latest Loom builds for synchronized, but is not done for native upcalls and other reasons for pinning a virtual thread (are there any? What's Pinned.CRITICAL_SECTION?). Can some native upcalls be marked as not needing to pin, and how? (We would very much need this for Truffle languages, and in general for language implementations on the JVM which support native extensions, where upcalls are frequent.)
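
      For the synchronized case specifically, a workaround on the affected releases is to avoid pinning in user code by using java.util.concurrent.locks.ReentrantLock, which lets a blocked virtual thread unmount from its carrier. The following is a sketch of the reproducer adapted that way (the class name `UnpinnedReproducer` and the method `run` are made up here); it does not help with the native upcall case.

      ```java
      import java.util.ArrayList;
      import java.util.List;
      import java.util.concurrent.ExecutionException;
      import java.util.concurrent.ExecutorService;
      import java.util.concurrent.Executors;
      import java.util.concurrent.Future;
      import java.util.concurrent.Phaser;
      import java.util.concurrent.TimeUnit;
      import java.util.concurrent.TimeoutException;
      import java.util.concurrent.locks.ReentrantLock;

      final class UnpinnedReproducer {
          /** Returns true if all virtual threads pass the phaser within the timeout. */
          static boolean run(long timeoutSeconds) {
              int tasks = Runtime.getRuntime().availableProcessors() * 2;
              Phaser phaser = new Phaser(tasks);
              ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor();
              List<Future<Boolean>> futures = new ArrayList<>();
              try {
                  for (int i = 0; i < tasks; i++) {
                      ReentrantLock lock = new ReentrantLock(); // replaces synchronized (lock)
                      futures.add(executor.submit(() -> {
                          lock.lock();
                          try {
                              // Parking while holding a ReentrantLock does not pin,
                              // so the carrier thread is free to run other virtual threads.
                              phaser.arriveAndAwaitAdvance();
                          } finally {
                              lock.unlock();
                          }
                          return true;
                      }));
                  }
                  for (Future<Boolean> f : futures) {
                      f.get(timeoutSeconds, TimeUnit.SECONDS);
                  }
                  return true;
              } catch (TimeoutException | ExecutionException e) {
                  return false;
              } catch (InterruptedException e) {
                  Thread.currentThread().interrupt();
                  return false;
              } finally {
                  executor.shutdownNow();
              }
          }

          public static void main(String[] args) {
              System.out.println("completed: " + run(30));
          }
      }
      ```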

            Assignee: Alan Bateman (alanb)
            Reporter: Benoit Daloze (bdaloze)