JDK-8369227

Thread.join() blocks despite the joined thread being done


    • x86_64
    • os_x

      ADDITIONAL SYSTEM INFORMATION :
      openjdk version "25" 2025-09-16
      OpenJDK Runtime Environment (build 25+36-3489)
      OpenJDK 64-Bit Server VM (build 25+36-3489, mixed mode, sharing)

      Tested also on JVM 21, 25-temurin, 25-zulu

      OS: Darwin Kernel Version 25.0.0: Wed Sep 17 21:41:45 PDT 2025; root:xnu-12377.1.9~141/RELEASE_ARM64_T6000 arm64

      A DESCRIPTION OF THE PROBLEM :
      The issue occurs rarely when running a test suite for a library making heavy use of Virtual Threads. The specific test repeatedly creates a large number of threads, then interrupts and joins them.

      I shared the test (*) on a branch that additionally logs when the problematic thread is started, finished and joined. The `StressTest.testMultipleOperationsDirect` test is run in a loop; sometimes it hangs after a couple of minutes, and sometimes it just keeps going without problems. If it hangs, it always hangs in the same place. There are no exceptions logged or caught. I tried simplifying the code and creating a smaller reproducible test case, but unfortunately removing any of the code (assertions, branches etc.) causes the test to stop hanging.

      When the test hangs, `jcmd [pid] Thread.dump_to_file [file]` produces a thread dump containing the following virtual threads:

      #49394 "" virtual WAITING 2025-10-02T08:14:38.821216Z
          at java.base/java.lang.VirtualThread.park(VirtualThread.java:738)
          - parking to wait for <java.util.concurrent.CompletableFuture$Signaller@38994429>
          at java.base/java.lang.System$1.parkVirtualThread(System.java:2284)
          at java.base/java.util.concurrent.locks.LockSupport.park(LockSupport.java:221)
          at java.base/java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1885)
          at java.base/java.util.concurrent.ForkJoinPool.unmanagedBlock(ForkJoinPool.java:4364)
          at java.base/java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:4310)
          at java.base/java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1919)
          at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2093)
          at com.softwaremill.jox/com.softwaremill.jox.StressTest.lambda$testAndVerify$0(StressTest.java:97)
          at com.softwaremill.jox/com.softwaremill.jox.TestUtil.lambda$scoped$0(TestUtil.java:17)
          at java.base/java.lang.VirtualThread.run(VirtualThread.java:456)

      #49401 "" virtual WAITING 2025-10-02T08:14:38.821305Z
          at java.base/java.lang.VirtualThread.park(VirtualThread.java:738)
          - parking to wait for <java.util.concurrent.CountDownLatch$Sync@6cfa0497>
          at java.base/java.lang.System$1.parkVirtualThread(System.java:2284)
          at java.base/java.util.concurrent.locks.LockSupport.park(LockSupport.java:221)
          at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:790)
          at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1139)
          at java.base/java.util.concurrent.CountDownLatch.await(CountDownLatch.java:230)
          at java.base/java.lang.VirtualThread.joinNanos(VirtualThread.java:1002)
          at java.base/java.lang.Thread.join(Thread.java:1870)
          at java.base/java.lang.Thread.join(Thread.java:1963)
          at com.softwaremill.jox/com.softwaremill.jox.TestUtil$1.cancel(TestUtil.java:101)
          at com.softwaremill.jox/com.softwaremill.jox.StressTest.stressTestIteration(StressTest.java:234)
          at com.softwaremill.jox/com.softwaremill.jox.StressTest.lambda$testAndVerify$1(StressTest.java:79)
          at com.softwaremill.jox/com.softwaremill.jox.TestUtil.lambda$fork$0(TestUtil.java:36)
          at java.base/java.lang.VirtualThread.run(VirtualThread.java:456)

      The first is waiting for the test completion, the second is interesting: here it's thread number 49401 waiting on a `Thread.join()` called from `TestUtil$1.cancel(TestUtil.java:101)`.

      If you take a look at the source (+), cancelling amounts to an interrupt and join. There's also a `Thread.yield()` before calling cancel.
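
For readers without access to the linked source, the interrupt-and-join pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual `TestUtil` code: the class name `CancelSketch`, the `runIteration` method, the worker body and the "Joined" message are assumptions made for the example, and the sketch is not expected to reproduce the hang. Unlike in the report, the joining thread here is whichever thread calls `runIteration`, not necessarily a virtual thread.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;

class CancelSketch {
    // Hypothetical stand-in for the cancel step: interrupt the worker, then join it.
    static void cancel(Thread worker) throws InterruptedException {
        System.out.println("Joining on thread: " + worker.threadId()
                + " from " + Thread.currentThread().threadId());
        worker.interrupt();
        worker.join(); // the call that never returns in the reported hang
        System.out.println("Joined thread: " + worker.threadId());
    }

    // Starts the given number of virtual-thread workers, cancels each one,
    // and returns how many workers ran their finally block.
    static int runIteration(int workers) throws InterruptedException {
        AtomicInteger exited = new AtomicInteger();
        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = Thread.ofVirtual().start(() -> {
                long id = Thread.currentThread().threadId();
                System.out.println("Starting " + id);
                try {
                    Thread.sleep(Duration.ofHours(1)); // block until interrupted
                } catch (InterruptedException e) {
                    // expected: cancel() interrupts the worker
                } finally {
                    System.out.println("Exit " + id);
                    exited.incrementAndGet();
                }
            });
        }
        Thread.yield(); // the report mentions a yield before cancelling
        for (Thread t : threads) {
            cancel(t);
        }
        return exited.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Workers exited: " + runIteration(100));
    }
}
```

In a healthy run every "Joining" line is eventually followed by a matching "Joined" line, which is the pairing used below to identify the stuck join.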

      From the thread dump, we know that it's thread 49401 that is hanging. Looking at the system output (the debug messages), that thread has been joining on a number of threads, each has a "joining"-"joined" pair of messages (as printed out in the source), except the last one:

      Starting 50097 <- a worker thread
      (...)
      Joining on thread: 50097 from 49401 <- hanging thread
      (...)
      Exit 50097 <- printed in finally of the worker thread

      However, despite thread 50097 exiting, the `join()` never completes. In various runs, sometimes the thread seems to finish before the join is even invoked (but this might just be the timing of the `System.out.println` calls). Sometimes it's far apart (with many other messages in-between), sometimes the three messages are clustered together. The thread dump does not contain thread 50097.
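
One way to gather more evidence about such a stuck join is to replace the unbounded `join()` with a timed `Thread.join(Duration)` and log the worker's state when the timeout elapses. The helper below is only a sketch under that assumption: `JoinDiagnostics` and `joinOrReport` are hypothetical names, and how the timed join behaves when the reported problem occurs has not been verified.

```java
import java.time.Duration;

class JoinDiagnostics {
    // Waits up to the given timeout for the worker to terminate.
    // Returns true if it terminated; otherwise logs its state and returns false.
    static boolean joinOrReport(Thread worker, Duration timeout) throws InterruptedException {
        boolean terminated = worker.join(timeout);
        if (!terminated) {
            System.out.println("join timed out: thread " + worker.threadId()
                    + " state=" + worker.getState()
                    + " alive=" + worker.isAlive());
        }
        return terminated;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread worker = Thread.ofVirtual().start(() -> { });
        System.out.println("terminated: " + joinOrReport(worker, Duration.ofSeconds(5)));
    }
}
```

A timeout reported together with `state=TERMINATED` would suggest that the joiner missed the termination signal, whereas any other state would indicate that the worker had not actually finished.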


            alanb Alan Bateman
            webbuggrp Webbug Group