-
Type:
Bug
-
Resolution: Unresolved
-
Priority:
P4
-
Affects Version/s: 26
-
Component/s: core-libs
-
In Review
An experimental change to SelectorManager::shutdown unveil a potential deadlock between the SelectorManager thread trying to stop the HttpClientImpl, and an executor thread concurrently trying to return a connection to the pool.
The issue seems to be caused by the ConnectionPool::returnToPool trying to close the returned connection when stopping, while holding the ConnectionPool state lock, and the SelectorManager thread trying to close a pooled connection, holding the connection stateLock and trying close the channel, which caused the CleanupTrigger to fire and attempt to remove the connection from the pool.
The change that cause the deadlock to be observable was:
```
--- a/src/java.net.http/share/classes/jdk/internal/net/http/HttpClientImpl.java
+++ b/src/java.net.http/share/classes/jdk/internal/net/http/HttpClientImpl.java
@@ -1333,6 +1333,13 @@ String getName() {
// Only called by the selector manager thread
private void shutdown() {
+ // first stop the client to avoid seeing exceptions
+ // about "selector manager closed"
+ Log.logTrace("{0}: stopping", owner.dbgTag);
+ try {
+ owner.stop();
+ } catch (Throwable ignored) {
+ }
try {
lock.lock();
try {
@@ -1345,6 +1352,7 @@ private void shutdown() {
}
} catch (IOException ignored) {
} finally {
+ // cleanup anything that might have been left behind
owner.stop();
}
}
```
Here is an extract of the log showing the deadlock:
The issue occurred with https://github.com/openjdk/jdk/pull/28395/commits/a311e70806b7b057712d7d6d3c9f5cc934d24455, which changed SelectorManager::shutdown to call owner::stop before closing the selector in order to avoid printing stray "selector manager closed" stack traces when closing connections after the selector has been closed. Unfortunately - this seems to be causing a deadlock involving pool-1-thread-2 and HttpClient-15-SelectorManager: this problem was observed once with the java/net/httpclient/ThrowingSubscribersAsLimitingAsync.java test - when running with the changes above. The test failed because one client (HttpClient-15) had one TCP connection still opened and its SelectorManager thread (HttpClient-15-SelectorManager) was still alive at the end of the test However, because HttpClient-15-SelectorManager is virtual we only have half of the picture and must guess where HttpClient-15-SelectorManager is blocked. The suspicion is that it might be trying to remove some connection from the pool:
WARNING: tracker for HttpClientImpl(15) has outstanding operations:
Pending HTTP Requests: 0
Pending HTTP/1.1 operations: 0
Pending HTTP/2 streams: 0
Pending HTTP/3 streams: 0
Pending WebSocket operations: 0
Pending TCP connections: 1
Pending Subscribers: 0
Total pending operations: 0
Facade referenced: false
Selector alive: true
DEBUG: [HttpClient-15-SelectorManager] [2s 137ms] Http2ClientImpl stopping
DEBUG: [HttpClient-15-SelectorManager] [2s 137ms] PlainHttpConnection(SocketTube(tcp:15)) Closing channel: channel registered with selector, key.interestOps=R, sa.interestOps=R
DEBUG: [HttpClient-15-SelectorManager] [2s 137ms] SocketTube(tcp:15) signal read error: java.io.IOException: connection closed locally
DEBUG: [HttpClient-15-SelectorManager] [2s 137ms] SocketTube(tcp:15) got read error: java.io.IOException: connection closed locally
DEBUG: [HttpClient-15-SelectorManager] [2s 137ms] SocketTube(tcp:15) pausing read event
DEBUG: [HttpClient-15-SelectorManager] [2s 137ms] SelectorAttachment Registering jdk.internal.net.http.SocketTube$InternalReadPublisher$ReadEvent@8e8f8b1 for None (false)
DEBUG: [HttpClient-15-SelectorManager] [2s 137ms] SocketTube(tcp:15) Sending error java.io.IOException: connection closed locally to subscriber CleanupTrigger(SocketTube(tcp:15))
DEBUG: [HttpClient-15-SelectorManager] [2s 137ms] SocketTube(tcp:15) forwarding error to subscriber: java.io.IOException: connection closed locally
DEBUG: [HttpClient-15-SelectorManager] [2s 137ms] ConnectionPool(15) Cleanup triggered for CleanupTrigger(SocketTube(tcp:15)): pendingData:0 error:java.io.IOException: connection closed locally
DEBUG: [HttpClient-15-SelectorManager] [2s 137ms] ConnectionPool(15) SocketTube(tcp:15) : ConnectionPool.cleanup(java.io.IOException: connection closed locally)
"pool-1-thread-2" prio=5 Id=51 WAITING on java.util.concurrent.locks.ReentrantLock$NonfairSync@6d5b1326 owned by "HttpClient-15-SelectorManager" Id=128
at java.base@26-internal/jdk.internal.misc.Unsafe.park(Native Method)
- waiting on java.util.concurrent.locks.ReentrantLock$NonfairSync@6d5b1326
at java.base@26-internal/java.util.concurrent.locks.LockSupport.park(LockSupport.java:223)
at java.base@26-internal/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:790)
at java.base@26-internal/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1030)
at java.base@26-internal/java.util.concurrent.locks.ReentrantLock$Sync.lock(ReentrantLock.java:154)
at java.base@26-internal/java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:323)
at platform/java.net.http@26-internal/jdk.internal.net.http.PlainHttpConnection.close(PlainHttpConnection.java:297)
at platform/java.net.http@26-internal/jdk.internal.net.http.PlainHttpConnection.close(PlainHttpConnection.java:290)
at platform/java.net.http@26-internal/jdk.internal.net.http.ConnectionPool.returnToPool(ConnectionPool.java:213)
at platform/java.net.http@26-internal/jdk.internal.net.http.ConnectionPool.returnToPool(ConnectionPool.java:190)
at platform/java.net.http@26-internal/jdk.internal.net.http.HttpConnection.closeOrReturnToCache(HttpConnection.java:503)
at platform/java.net.http@26-internal/jdk.internal.net.http.Http1Response.onFinished(Http1Response.java:437)
at platform/java.net.http@26-internal/jdk.internal.net.http.Http1Response$$Lambda/0x00000ffe0126e590.run(Unknown Source)
at platform/java.net.http@26-internal/jdk.internal.net.http.ResponseContent$FixedLengthBodyParser.accept(ResponseContent.java:628)
at platform/java.net.http@26-internal/jdk.internal.net.http.ResponseContent$FixedLengthBodyParser.accept(ResponseContent.java:540)
at platform/java.net.http@26-internal/jdk.internal.net.http.Http1Response$BodyReader.handle(Http1Response.java:708)
at platform/java.net.http@26-internal/jdk.internal.net.http.Http1Response$BodyReader.handle(Http1Response.java:636)
at platform/java.net.http@26-internal/jdk.internal.net.http.Http1Response$Receiver.accept(Http1Response.java:528)
at platform/java.net.http@26-internal/jdk.internal.net.http.Http1Response$BodyReader.tryAsyncReceive(Http1Response.java:666)
at platform/java.net.http@26-internal/jdk.internal.net.http.Http1AsyncReceiver.flush(Http1AsyncReceiver.java:233)
at platform/java.net.http@26-internal/jdk.internal.net.http.Http1AsyncReceiver$$Lambda/0x00000ffe0125fa50.run(Unknown Source)
at platform/java.net.http@26-internal/jdk.internal.net.http.common.SequentialScheduler$LockingRestartableTask.run(SequentialScheduler.java:182)
at platform/java.net.http@26-internal/jdk.internal.net.http.common.SequentialScheduler$CompleteRestartableTask.run(SequentialScheduler.java:149)
at platform/java.net.http@26-internal/jdk.internal.net.http.common.SequentialScheduler$TryEndDeferredCompleter.complete(SequentialScheduler.java:324)
at platform/java.net.http@26-internal/jdk.internal.net.http.common.SequentialScheduler$CompleteRestartableTask.run(SequentialScheduler.java:151)
at platform/java.net.http@26-internal/jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:207)
at app//AbstractThrowingSubscribers$TestExecutor.lambda$execute$0(AbstractThrowingSubscribers.java:134)
at app//AbstractThrowingSubscribers$TestExecutor$$Lambda/0x00000ffe0110cf50.run(Unknown Source)
at java.base@26-internal/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1090)
at java.base@26-internal/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614)
at java.base@26-internal/java.lang.Thread.runWith(Thread.java:1529)
at java.base@26-internal/java.lang.Thread.run(Thread.java:1516)
The issue seems to be caused by the ConnectionPool::returnToPool trying to close the returned connection when stopping, while holding the ConnectionPool state lock, and the SelectorManager thread trying to close a pooled connection, holding the connection stateLock and trying close the channel, which caused the CleanupTrigger to fire and attempt to remove the connection from the pool.
The change that cause the deadlock to be observable was:
```
--- a/src/java.net.http/share/classes/jdk/internal/net/http/HttpClientImpl.java
+++ b/src/java.net.http/share/classes/jdk/internal/net/http/HttpClientImpl.java
@@ -1333,6 +1333,13 @@ String getName() {
// Only called by the selector manager thread
private void shutdown() {
+ // first stop the client to avoid seeing exceptions
+ // about "selector manager closed"
+ Log.logTrace("{0}: stopping", owner.dbgTag);
+ try {
+ owner.stop();
+ } catch (Throwable ignored) {
+ }
try {
lock.lock();
try {
@@ -1345,6 +1352,7 @@ private void shutdown() {
}
} catch (IOException ignored) {
} finally {
+ // cleanup anything that might have been left behind
owner.stop();
}
}
```
Here is an extract of the log showing the deadlock:
The issue occurred with https://github.com/openjdk/jdk/pull/28395/commits/a311e70806b7b057712d7d6d3c9f5cc934d24455, which changed SelectorManager::shutdown to call owner::stop before closing the selector in order to avoid printing stray "selector manager closed" stack traces when closing connections after the selector has been closed. Unfortunately - this seems to be causing a deadlock involving pool-1-thread-2 and HttpClient-15-SelectorManager: this problem was observed once with the java/net/httpclient/ThrowingSubscribersAsLimitingAsync.java test - when running with the changes above. The test failed because one client (HttpClient-15) had one TCP connection still opened and its SelectorManager thread (HttpClient-15-SelectorManager) was still alive at the end of the test However, because HttpClient-15-SelectorManager is virtual we only have half of the picture and must guess where HttpClient-15-SelectorManager is blocked. The suspicion is that it might be trying to remove some connection from the pool:
WARNING: tracker for HttpClientImpl(15) has outstanding operations:
Pending HTTP Requests: 0
Pending HTTP/1.1 operations: 0
Pending HTTP/2 streams: 0
Pending HTTP/3 streams: 0
Pending WebSocket operations: 0
Pending TCP connections: 1
Pending Subscribers: 0
Total pending operations: 0
Facade referenced: false
Selector alive: true
DEBUG: [HttpClient-15-SelectorManager] [2s 137ms] Http2ClientImpl stopping
DEBUG: [HttpClient-15-SelectorManager] [2s 137ms] PlainHttpConnection(SocketTube(tcp:15)) Closing channel: channel registered with selector, key.interestOps=R, sa.interestOps=R
DEBUG: [HttpClient-15-SelectorManager] [2s 137ms] SocketTube(tcp:15) signal read error: java.io.IOException: connection closed locally
DEBUG: [HttpClient-15-SelectorManager] [2s 137ms] SocketTube(tcp:15) got read error: java.io.IOException: connection closed locally
DEBUG: [HttpClient-15-SelectorManager] [2s 137ms] SocketTube(tcp:15) pausing read event
DEBUG: [HttpClient-15-SelectorManager] [2s 137ms] SelectorAttachment Registering jdk.internal.net.http.SocketTube$InternalReadPublisher$ReadEvent@8e8f8b1 for None (false)
DEBUG: [HttpClient-15-SelectorManager] [2s 137ms] SocketTube(tcp:15) Sending error java.io.IOException: connection closed locally to subscriber CleanupTrigger(SocketTube(tcp:15))
DEBUG: [HttpClient-15-SelectorManager] [2s 137ms] SocketTube(tcp:15) forwarding error to subscriber: java.io.IOException: connection closed locally
DEBUG: [HttpClient-15-SelectorManager] [2s 137ms] ConnectionPool(15) Cleanup triggered for CleanupTrigger(SocketTube(tcp:15)): pendingData:0 error:java.io.IOException: connection closed locally
DEBUG: [HttpClient-15-SelectorManager] [2s 137ms] ConnectionPool(15) SocketTube(tcp:15) : ConnectionPool.cleanup(java.io.IOException: connection closed locally)
"pool-1-thread-2" prio=5 Id=51 WAITING on java.util.concurrent.locks.ReentrantLock$NonfairSync@6d5b1326 owned by "HttpClient-15-SelectorManager" Id=128
at java.base@26-internal/jdk.internal.misc.Unsafe.park(Native Method)
- waiting on java.util.concurrent.locks.ReentrantLock$NonfairSync@6d5b1326
at java.base@26-internal/java.util.concurrent.locks.LockSupport.park(LockSupport.java:223)
at java.base@26-internal/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:790)
at java.base@26-internal/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1030)
at java.base@26-internal/java.util.concurrent.locks.ReentrantLock$Sync.lock(ReentrantLock.java:154)
at java.base@26-internal/java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:323)
at platform/java.net.http@26-internal/jdk.internal.net.http.PlainHttpConnection.close(PlainHttpConnection.java:297)
at platform/java.net.http@26-internal/jdk.internal.net.http.PlainHttpConnection.close(PlainHttpConnection.java:290)
at platform/java.net.http@26-internal/jdk.internal.net.http.ConnectionPool.returnToPool(ConnectionPool.java:213)
at platform/java.net.http@26-internal/jdk.internal.net.http.ConnectionPool.returnToPool(ConnectionPool.java:190)
at platform/java.net.http@26-internal/jdk.internal.net.http.HttpConnection.closeOrReturnToCache(HttpConnection.java:503)
at platform/java.net.http@26-internal/jdk.internal.net.http.Http1Response.onFinished(Http1Response.java:437)
at platform/java.net.http@26-internal/jdk.internal.net.http.Http1Response$$Lambda/0x00000ffe0126e590.run(Unknown Source)
at platform/java.net.http@26-internal/jdk.internal.net.http.ResponseContent$FixedLengthBodyParser.accept(ResponseContent.java:628)
at platform/java.net.http@26-internal/jdk.internal.net.http.ResponseContent$FixedLengthBodyParser.accept(ResponseContent.java:540)
at platform/java.net.http@26-internal/jdk.internal.net.http.Http1Response$BodyReader.handle(Http1Response.java:708)
at platform/java.net.http@26-internal/jdk.internal.net.http.Http1Response$BodyReader.handle(Http1Response.java:636)
at platform/java.net.http@26-internal/jdk.internal.net.http.Http1Response$Receiver.accept(Http1Response.java:528)
at platform/java.net.http@26-internal/jdk.internal.net.http.Http1Response$BodyReader.tryAsyncReceive(Http1Response.java:666)
at platform/java.net.http@26-internal/jdk.internal.net.http.Http1AsyncReceiver.flush(Http1AsyncReceiver.java:233)
at platform/java.net.http@26-internal/jdk.internal.net.http.Http1AsyncReceiver$$Lambda/0x00000ffe0125fa50.run(Unknown Source)
at platform/java.net.http@26-internal/jdk.internal.net.http.common.SequentialScheduler$LockingRestartableTask.run(SequentialScheduler.java:182)
at platform/java.net.http@26-internal/jdk.internal.net.http.common.SequentialScheduler$CompleteRestartableTask.run(SequentialScheduler.java:149)
at platform/java.net.http@26-internal/jdk.internal.net.http.common.SequentialScheduler$TryEndDeferredCompleter.complete(SequentialScheduler.java:324)
at platform/java.net.http@26-internal/jdk.internal.net.http.common.SequentialScheduler$CompleteRestartableTask.run(SequentialScheduler.java:151)
at platform/java.net.http@26-internal/jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:207)
at app//AbstractThrowingSubscribers$TestExecutor.lambda$execute$0(AbstractThrowingSubscribers.java:134)
at app//AbstractThrowingSubscribers$TestExecutor$$Lambda/0x00000ffe0110cf50.run(Unknown Source)
at java.base@26-internal/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1090)
at java.base@26-internal/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614)
at java.base@26-internal/java.lang.Thread.runWith(Thread.java:1529)
at java.base@26-internal/java.lang.Thread.run(Thread.java:1516)
- relates to
-
JDK-8372191 ReferenceTracker::printThreads only print platform threads
-
- Open
-
-
JDK-8372159 HttpClient SelectorManager thread could be a VirtualThread
-
- In Progress
-
- links to
-
Review(pr/28395)
openjdk/jdk/28430