-
Bug
-
Resolution: Unresolved
-
P3
-
None
-
11.0.6
-
x86_64
-
linux
ADDITIONAL SYSTEM INFORMATION :
A Docker container based on an Ubuntu 18.04 image running on CoreOS
Host: CoreOS 2303.3.0 kernel 4.19.86-coreos arch x86_64
Container: Ubuntu 18.04.4 LTS with NPTL 2.27 and glibc 2.27-3ubuntu1
A DESCRIPTION OF THE PROBLEM :
First, observe that SSLSocketImpl.duplexCloseOutput() takes a lock on conContext.outputRecord, an SSLSocketOutputRecord. All public SSLSocketOutputRecord methods are synchronized, so while any thread is stuck inside an SSLSocketOutputRecord method it is not possible to perform a "non-graceful" close of an SSL socket.
This is a particular problem when a network event triggers TCP retransmits and the (TCP) socket write() path backs up. Certain HTTP client libraries that use blocking socket IO (Apache HttpClient) handle request processing on a background thread and effectively wait on a future in the foreground for a configured timeout period. If the timeout expires before the request completes, the client library then calls close() on the socket to force the connection to terminate and recover from whatever network funk the socket found itself in.
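The client-side pattern described above can be sketched roughly as follows. This is a minimal illustration and not Apache HttpClient's actual code: the class `TimeoutThenClose` and the method `runWithTimeout` are hypothetical names, and the connection is modelled as any `Closeable` rather than a real SSL socket.

```java
import java.io.Closeable;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeoutThenClose {
    /**
     * Runs the request on a background thread and waits for it in the foreground.
     * Returns true if the request finished within the timeout. Otherwise closes
     * the connection to abort the request and returns false.
     */
    static boolean runWithTimeout(Callable<?> request, Closeable connection, long timeoutMillis)
            throws Exception {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            Future<?> pending = worker.submit(request);
            try {
                pending.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                // The caller relies on this close() returning promptly.
                // If close() itself blocks, the timeout is effectively ignored.
                connection.close();
                return false;
            }
        } finally {
            worker.shutdownNow();
        }
    }
}
```

The whole scheme depends on close() being able to interrupt the in-flight write; the timeout only bounds the wait on the future, not the time spent inside close().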
Unfortunately this doesn't work in Java 11: the SSLSocketOutputRecord.deliver() method gets stuck in write() due to the network funk. Because the method is synchronized, it already holds the conContext.outputRecord lock when SSLSocketImpl.duplexCloseOutput() tries to acquire it, so close() cannot proceed until the write returns.
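The lock interaction can be modelled without any TLS code. The sketch below is a simplified stand-in, not the JDK source: `FakeOutputRecord`, its latches, and `measureCloseDelayMillis` are hypothetical, and the stalled write() is simulated by waiting on a latch while holding the object's monitor.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class OutputRecordLockDemo {
    /** Hypothetical stand-in for SSLSocketOutputRecord: deliver() is synchronized. */
    static class FakeOutputRecord {
        final CountDownLatch writeStarted = new CountDownLatch(1);
        final CountDownLatch writeUnblocked = new CountDownLatch(1);

        synchronized void deliver() throws InterruptedException {
            writeStarted.countDown();
            // Models write() stuck on a backed-up socket; the monitor stays held.
            writeUnblocked.await();
        }
    }

    /** Models the close path, which needs the same monitor before it can do anything. */
    static void duplexCloseOutput(FakeOutputRecord record) {
        synchronized (record) {
            // Closing work would happen here.
        }
    }

    /** Returns how long the close path was blocked, in milliseconds. */
    static long measureCloseDelayMillis(long stallMillis) throws InterruptedException {
        FakeOutputRecord record = new FakeOutputRecord();
        Thread writer = new Thread(() -> {
            try {
                record.deliver();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();
        record.writeStarted.await(); // the writer now holds the monitor

        long start = System.nanoTime();
        Thread recovery = new Thread(() -> {
            try {
                Thread.sleep(stallMillis); // the simulated network stall
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            record.writeUnblocked.countDown();
        });
        recovery.start();

        duplexCloseOutput(record); // blocks until deliver() returns
        long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        writer.join();
        recovery.join();
        return elapsedMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("close path blocked for about " + measureCloseDelayMillis(500) + " ms");
    }
}
```

In this model the close path waits for the full simulated stall, which mirrors the report: the closer cannot run until the writer leaves the synchronized method on its own.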
I think the exact impact will depend on the way in which the network issues manifest, but in my case the HTTP client library was regularly blocked in close() for long periods (15-16 minutes), far longer than the configured 5-second timeout.
Looking at the code I think there are probably similar potential issues during SSL handshakes etc., but the deliver() scenario is the one we've observed in the wild.
We first observed this issue upgrading from Java 8 to Java 11. Downgrading to Java 8 made the problem go away.
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1. Establish a connection between two hosts/VMs. Have the client side perform sizable writes (enough to fill up socket buffers etc.) while the server just reads and discards.
2. Introduce a null route on either side (or otherwise prevent transmission of TCP ACKs from the server to the client) to force the client to attempt retransmits.
3. Wait until you're stuck in a write() (check stack dumps), then call close() on the client-side socket.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
The socket file descriptor should close non-gracefully/"prematurely", forcing the write to terminate immediately.
ACTUAL -
close() blocks until the OS forces the socket closed at the transport layer, at which point the socket write fails.
FREQUENCY : often
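For comparison, the expected behaviour can be observed on a plain (non-TLS) TCP socket over loopback. The sketch below is only a baseline under stated assumptions, not a reproduction of the bug: there is no SSLSocket and no null route, the peer simply never reads so the client's write() eventually blocks, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class PlainSocketCloseDemo {
    /**
     * Returns the milliseconds between close() and the blocked write failing,
     * or -1 if the write did not fail within waitMillis.
     */
    static long closeWhileWriteBlocked(long waitMillis) throws Exception {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
            try (Socket peer = server.accept()) { // accepted, but never read from
                AtomicLong lastProgress = new AtomicLong(System.nanoTime());
                CountDownLatch writeFailed = new CountDownLatch(1);
                Thread writer = new Thread(() -> {
                    byte[] chunk = new byte[64 * 1024];
                    try {
                        OutputStream out = client.getOutputStream();
                        while (true) {
                            out.write(chunk); // blocks once the socket buffers are full
                            lastProgress.set(System.nanoTime());
                        }
                    } catch (IOException expected) {
                        writeFailed.countDown();
                    }
                });
                writer.setDaemon(true);
                writer.start();

                // Treat 500 ms without a completed write as "stuck in write()".
                while (System.nanoTime() - lastProgress.get() < TimeUnit.MILLISECONDS.toNanos(500)) {
                    Thread.sleep(50);
                }

                long start = System.nanoTime();
                client.close(); // close from another thread while the write is blocked
                boolean failed = writeFailed.await(waitMillis, TimeUnit.MILLISECONDS);
                return failed ? TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start) : -1;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("blocked write failed " + closeWhileWriteBlocked(5000) + " ms after close()");
    }
}
```

The Socket.close() documentation states that a thread blocked in an I/O operation on the socket will throw a SocketException, which is the behaviour the report expects SSLSocket.close() to preserve.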
- duplicates
- JDK-8293921 SSLSocket.close waits for peer response (Open)