Issue | Fix Version | Assignee | Priority | Status | Resolution | Resolved In Build |
---|---|---|---|---|---|---|
JDK-8018709 | 7u45 | Shirish Kuncolienkar | P3 | Closed | Fixed | b01 |
JDK-8004167 | 7u40 | Alan Bateman | P3 | Closed | Fixed | b06 |
SYNOPSIS
--------
Race conditions in NIO Selector code
OPERATING SYSTEMS
----------------
All (discovered on Windows)
FULL JDK VERSIONS
-----------------
All (discovered on Java 6).
Not tested on JDK 7.
PROBLEM DESCRIPTION from LICENSEE
---------------------------------
We have identified a narrow timing window in AbstractSelectableChannel.implCloseChannel() in which a channel's file descriptor (fd) can be closed while a selector.select() operation is in progress. In the current implementation, the channel is closed first at the native level; only afterwards does the closing thread acquire keyLock and cancel the keys associated with the channel. With this ordering there is always a timing window between closing the channel and cancelling its keys. The window widens when lock contention delays the acquisition of keyLock.
We can reproduce the problem only during stress testing of one of our large products; we do not have a small standalone testcase. However, the problem can be simulated by modifying the AbstractSelectableChannel class so that there is a sleep between the call to implCloseSelectableChannel() and the acquisition of keyLock ("synchronized (keyLock)") in AbstractSelectableChannel.implCloseChannel().
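The ordering described above can be illustrated with a small toy model. This is only a sketch and not the JDK source: ToyChannel, fdOpen, keyValid and the window hook are hypothetical names introduced for illustration. The hook stands in for the injected sleep (or for lock contention), and it records what a selector thread would see at that point.

```java
public class CloseWindowSketch {
    /** Toy stand-in for a selectable channel; not the JDK class. */
    static final class ToyChannel {
        private final Object keyLock = new Object();
        boolean fdOpen = true;     // models the native descriptor
        boolean keyValid = true;   // models the registered SelectionKey

        /** Mirrors the ordering in the report: close first, cancel the key afterwards. */
        void closeThenCancel(Runnable window) {
            fdOpen = false;            // stands in for implCloseSelectableChannel()
            window.run();              // stands in for the injected sleep / lock contention
            synchronized (keyLock) {
                keyValid = false;      // stands in for cancelling the key
            }
        }
    }

    /** Returns true if an observer inside the window sees a valid key for a closed fd. */
    static boolean staleKeyObserved() {
        ToyChannel ch = new ToyChannel();
        boolean[] stale = {false};
        ch.closeThenCancel(() -> stale[0] = !ch.fdOpen && ch.keyValid);
        return stale[0];
    }

    public static void main(String[] args) {
        System.out.println("stale key observed: " + staleKeyObserved());
    }
}
```

In this model the observer always finds a key that is still valid although the descriptor is already closed, which is the state a concurrent select() could act on.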
The problem can be fixed if we change the implementation such that the keys are cancelled first, before closing the corresponding channel.
We also identified another small timing window between registering a key and closing the channel, caused by the fact that these two operations acquire different locks (regLock and keyLock). While closing the channel, the JDK has to ensure that no more keys are registered with a selector for that channel. The problem can be avoided if AbstractSelectableChannel.implCloseChannel() acquires regLock as well as keyLock. We have run performance benchmark tests with the proposed fix in place and see no noticeable performance degradation.
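The two changes proposed above (cancel the keys before closing the descriptor, and hold regLock as well as keyLock while closing) can be sketched in a toy model. This is an illustration under stated assumptions, not the JDK implementation: ToyChannel, open, fdOpen, keyCount and the window hook are hypothetical names, and the open flag is cleared inside close() only to keep the sketch short.

```java
public class LockOrderSketch {
    /** Toy stand-in for a selectable channel; not the JDK class. */
    static final class ToyChannel {
        private final Object regLock = new Object();
        private final Object keyLock = new Object();
        private boolean open = true;    // models isOpen()
        private boolean fdOpen = true;  // models the native descriptor
        private int keyCount = 0;       // models the channel's registered keys

        /** Registers a key unless the channel is already closing. */
        boolean register() {
            synchronized (regLock) {           // taken before the open check
                if (!open) {
                    return false;              // the JDK throws ClosedChannelException here
                }
                synchronized (keyLock) {
                    keyCount++;
                }
                return true;
            }
        }

        /** Cancels the keys first, then closes the descriptor, all under regLock. */
        void close(Runnable window) {
            synchronized (regLock) {
                open = false;
                synchronized (keyLock) {
                    keyCount = 0;              // stands in for cancelling every key
                }
                window.run();                  // widened gap between the two steps
                fdOpen = false;                // stands in for implCloseSelectableChannel()
            }
        }

        boolean hasKeyForClosedFd() {
            return !fdOpen && keyCount > 0;
        }
    }

    /** Checks inside the gap and after close for a key that outlives its descriptor. */
    static boolean staleKeyObserved() {
        ToyChannel ch = new ToyChannel();
        ch.register();
        boolean[] stale = {false};
        ch.close(() -> stale[0] = ch.hasKeyForClosedFd());
        return stale[0] || ch.hasKeyForClosedFd();
    }

    /** Attempts a registration after the channel has been closed. */
    static boolean registerAfterCloseSucceeds() {
        ToyChannel ch = new ToyChannel();
        ch.close(() -> { });
        return ch.register();
    }

    public static void main(String[] args) {
        System.out.println("stale key observed: " + staleKeyObserved());
        System.out.println("register after close succeeds: " + registerAfterCloseSucceeds());
    }
}
```

The hook runs on the closing thread, so the sketch shows only the ordering of the steps. With real threads, a concurrent register() would block on regLock until close() finishes and would then find the channel closed.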
SUGGESTED FIX
-------------
The following diff is based on "6u27-b05/j2se/src/share/classes/java/nio/channels/spi/AbstractSelectableChannel.java"
165a166
> synchronized (regLock) {
170c171
< synchronized (regLock) {
---
>
201,202c202,203
< implCloseSelectableChannel();
< synchronized (keyLock) {
---
> synchronized (regLock) {
> synchronized (keyLock) {
209c210,212
< }
---
> }
> implCloseSelectableChannel();
> }
backported by:
- JDK-8004167 (se) AbstractSelectableChannel.register and configureBlocking not safe from asynchronous close (Closed)
- JDK-8018709 (se) AbstractSelectableChannel.register and configureBlocking not safe from asynchronous close (Closed)
duplicates:
- JDK-7192849 Potential race condition in AbstractSelectableChannel (Closed)