- Bug
- Resolution: Unresolved
- P4
- 21
- generic
- generic
A DESCRIPTION OF THE PROBLEM :
We're experiencing a deadlock on a production system when serving requests using a virtual thread executor. The whole server uses a thread-per-request model and was written from scratch with virtual threads in mind. As a large software system, we transitively depend on 100+ libraries.
When a server deadlocks, every virtual thread on the system shows the following stack trace:
java.base/java.lang.VirtualThread.park(VirtualThread.java:592)
java.base/java.lang.System$2.parkVirtualThread(System.java:2639)
java.base/jdk.internal.misc.VirtualThreads.park(VirtualThreads.java:54)
java.base/java.util.concurrent.locks.LockSupport.park(LockSupport.java:219)
java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:754)
java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:990)
java.base/java.util.concurrent.locks.ReentrantLock$Sync.lock(ReentrantLock.java:153)
java.base/java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:322)
java.base/jdk.internal.misc.InternalLock.lock(InternalLock.java:74)
java.base/java.io.PrintStream.writeln(PrintStream.java:824)
java.base/java.io.PrintStream.println(PrintStream.java:1168)
org.slf4j.impl.SimpleLogger.write(SimpleLogger.java:318)
org.slf4j.impl.SimpleLogger.log(SimpleLogger.java:295)
org.slf4j.impl.SimpleLogger.formatAndLog(SimpleLogger.java:355)
org.slf4j.impl.SimpleLogger.debug(SimpleLogger.java:446)
Other library dependencies produce similar stack traces. The culprit seems to be any lock that is taken both inside and outside a synchronized block: a virtual thread that parks while inside a synchronized block stays pinned to its platform (carrier) thread, so every platform thread ends up with a pinned virtual thread waiting for the lock, while the virtual thread that holds the lock cannot get a platform thread to run on and release it.
This basically makes virtual threads unusable in a larger system, because it's impossible to audit every dependency to confirm that it doesn't contain such a locking protocol. It will take years for transitive dependencies to be updated to a safe version.
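A minimal sketch of the locking pattern described above, not the production code. The class name PinnedLockDemo, the method holderDelayMillis and all timings are hypothetical. One virtual thread takes a ReentrantLock outside any synchronized block and then sleeps (which unmounts it); the other virtual threads request the same lock from inside a synchronized block. The waiters use tryLock with a timeout instead of lock() purely so that the demo terminates rather than hanging; it also assumes the default virtual thread scheduler, whose parallelism equals the number of available processors.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

public class PinnedLockDemo {

    /** Returns how many milliseconds late the lock holder resumed after its sleep. */
    public static long holderDelayMillis(int waiters, long waiterTimeoutMillis) {
        ReentrantLock lock = new ReentrantLock();
        CountDownLatch lockHeld = new CountDownLatch(1);
        AtomicLong lateMillis = new AtomicLong();
        long sleepMillis = 100;

        // Holder: takes the lock OUTSIDE any synchronized block, then sleeps.
        // Sleeping unmounts the virtual thread, so it needs a free carrier to resume.
        Thread holder = Thread.ofVirtual().start(() -> {
            lock.lock();
            try {
                lockHeld.countDown();
                long start = System.nanoTime();
                Thread.sleep(sleepMillis);
                long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
                lateMillis.set(Math.max(0, elapsed - sleepMillis));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.unlock();
            }
        });

        List<Thread> threads = new ArrayList<>();
        try {
            lockHeld.await(); // make sure the holder owns the lock first

            // Waiters: request the same lock INSIDE a synchronized block.
            for (int i = 0; i < waiters; i++) {
                Object monitor = new Object();
                threads.add(Thread.ofVirtual().start(() -> {
                    synchronized (monitor) {
                        try {
                            // Timed wait so the demo ends; the report's stack trace uses lock().
                            if (lock.tryLock(waiterTimeoutMillis, TimeUnit.MILLISECONDS)) {
                                lock.unlock();
                            }
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    }
                }));
            }
            holder.join();
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("demo interrupted", e);
        }
        return lateMillis.get();
    }

    public static void main(String[] args) {
        // One waiter per carrier thread, assuming default scheduler parallelism.
        int carriers = Runtime.getRuntime().availableProcessors();
        System.out.println("holder resumed " + holderDelayMillis(carriers, 2000) + " ms late");
    }
}
```

If the analysis above is right, on JDK 21 the holder should resume roughly as late as the waiters' timeout, because no carrier is free until a pinned waiter gives up. With an untimed lock(), as in the stack trace, it would never resume. On a runtime where parking inside a synchronized block does not pin the carrier, the delay should stay close to zero.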