The recently integrated fix for bugid 4152799 has caused a regression even
worse than the original bug, since it occurs even when no I/O operation is
interrupted. If an accepted socket connection is on file descriptor zero
(which is possible after stdin has been closed), then ServerSocket.accept()
returns a bogus Socket object with an fd value of -1. Subsequent Java
operations on the Socket will fail, and the server will think that the socket
connection was corrupted. The VM, however, never closes the real file
descriptor zero (the accepted connection), so the client that initiated the
connection will likely hang waiting for a response from the server, which to
the client appears quite alive and holding the connection open.
First, reviewing bugid 4152799: when a thread blocked on an accept() call
gets interrupted:
- the sysAccept() function would return -2 (the constant SYS_INTRPT) to
indicate that the call's failure was due to an interrupt, rather than a
normal failure (a return of -1)
[Actually, this only works correctly in green threads: the native threads
version of sysAccept() doesn't handle interrupts properly; it will just return
-1, resulting in a SocketException instead of an InterruptedIOException, but
that should be the subject of a different bug report.]
- The JVM_Accept() function, however, calls sysAccept() through a macro
called CHECK_INTERRUPT(), which checks to see if the called function returned
SYS_INTRPT (-2), and if so, it posts an InterruptedIOException to the current
thread and changes the return value to zero (why??).
- The solaris native code for PlainSocketImpl.socketAccept() would check the
return value of JVM_Accept(), and if it was negative, it would post a
SocketException and return; otherwise, it would assume that the return value
was a valid file descriptor, and it would fill in the fields of the Socket
object provided for the accepted connection, including the return value as
the fd, which would be zero in this case.
- The Java code for ServerSocket.implAccept() catches the InterruptedIOException
thrown in the native code, calls close() on the Socket object for the
(possibly) accepted connection, and then rethrows the IOException. In this
case, this Socket object will contain fd zero, so file descriptor zero will
get closed *whenever* a blocked accept() call is interrupted.
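The chain of events above can be modelled in a few lines of C. This is only a sketch of the behavior as described in this report, not the actual jvm.c or PlainSocketImpl source: the function names check_interrupt_model() and socket_accept_model() are hypothetical, and only the value SYS_INTRPT (-2) is taken from the description. It shows that, once the interrupt result has been rewritten to zero, the native code cannot tell an interrupted accept() from a connection accepted on fd zero.

```c
#include <assert.h>

#define SYS_ERR    (-1)  /* ordinary failure */
#define SYS_INTRPT (-2)  /* call was interrupted (value from the report) */

/* Hypothetical model of what CHECK_INTERRUPT() does to the result of
 * sysAccept(): on SYS_INTRPT it records that an InterruptedIOException
 * would be posted and rewrites the return value to zero. */
int check_interrupt_model(int sys_result, int *exception_posted)
{
    if (sys_result == SYS_INTRPT) {
        *exception_posted = 1;
        return 0;               /* the surprising part: 0 is also a valid fd */
    }
    *exception_posted = 0;
    return sys_result;
}

/* Hypothetical model of the original socketAccept() check: a negative
 * value means "post a SocketException" (modelled as -1); anything else
 * is stored into the Socket object as its fd. */
int socket_accept_model(int jvm_accept_result)
{
    if (jvm_accept_result < 0)
        return -1;
    return jvm_accept_result;
}

/* An interrupted accept() and a real accept() on fd 0 both end up
 * storing fd 0 in the Socket object. */
void demo_interrupt_looks_like_fd_zero(void)
{
    int posted;
    int interrupted = socket_accept_model(check_interrupt_model(SYS_INTRPT, &posted));
    assert(posted == 1);
    int genuine = socket_accept_model(check_interrupt_model(0, &posted));
    assert(posted == 0);
    assert(interrupted == genuine);  /* both are 0 */
}
```

Calling demo_interrupt_looks_like_fd_zero() from a test driver exercises both paths. In this model the only thing that distinguishes the two cases is the pending exception, which is why the later close() in ServerSocket.implAccept() ends up closing descriptor zero.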
The fix that was integrated for bugid 4152799 added the following lines to
the solaris native code for PlainSocketImpl.socketAccept(), right after it
calls JVM_Accept():
    /* Thread was interrupted, return to avoid fd0 being closed */
    if (fd == 0)
        return;
[The subsequent "if" condition was changed from (fd < 0) to (fd <= 0), which
was pointless to do, and should also be changed back when the above code is
removed.]
I guess that the idea was that if the return value was zero as a result of the
CHECK_INTERRUPT() macro handling an interrupt, then zero would not get stuffed
into the Socket object's fd, so its subsequent close() would cause no harm
(operating on the invalid fd -1).
The crucial point missed by this fix, as well as the bizarre behavior of the
CHECK_INTERRUPT() macro in jvm.c, is that zero is a valid file descriptor and
zero is a perfectly valid return value from accept()/sysAccept().
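That zero is an ordinary descriptor can be checked with a small C sketch. It relies on the POSIX rule that a newly allocated descriptor is the lowest-numbered one not currently open; the helper names are hypothetical, and open() on /dev/null is used here only as a convenient stand-in for accept(), which on typical POSIX systems allocates its descriptor the same way.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Closes descriptor 0 (as a daemon that closes stdin would) and then
 * allocates a new descriptor. The result of close() is ignored on
 * purpose: descriptor 0 may already be closed. */
int reopen_after_closing_stdin(void)
{
    close(0);
    return open("/dev/null", O_RDONLY);
}

/* In a single-threaded process with /dev/null available, the new
 * descriptor lands on 0, the lowest free slot. */
void demo_fd_zero_is_reused(void)
{
    int fd = reopen_after_closing_stdin();
    assert(fd == 0);
}
```

A server that has closed stdin is in the same position: the next descriptor handed out, including one for an accepted connection, can be zero.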
With this current "fix", if an accepted socket is on fd zero, the fields of
the returned Socket object will not be filled in, so its fd value will be
left at -1, and any operation on the Socket will fail with a "Bad file number"
SocketException. As far as the OS is concerned, however, the VM will still
be holding fd zero open, so the client who initiated the connection will
likely hang just waiting for a response from the server, since the connection
is kept open. As an example, all of the RMI activation tests fail by hanging
this way, since the child processes of "rmid" close stdin, so connections
will indeed likely be accepted on fd zero.
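The effect of the integrated check can be sketched in the same style. This is a hypothetical model based only on the snippet and description in this report (the struct, the function names and the 0/1 return convention are assumptions for illustration, not the real native code). It shows a genuine connection on fd zero being dropped without any exception being raised.

```c
#include <assert.h>

/* Hypothetical stand-in for the Socket object's fd field; -1 means unset. */
struct socket_model {
    int fd;
};

/* Hypothetical model of socketAccept() with the integrated fix.
 * Returns 1 if a SocketException would be posted, 0 otherwise. */
int socket_accept_with_fix(int jvm_accept_result, struct socket_model *s)
{
    int fd = jvm_accept_result;

    /* "Thread was interrupted, return to avoid fd0 being closed" */
    if (fd == 0)
        return 0;       /* Socket fields are left untouched */

    if (fd <= 0)        /* condition as changed by the fix */
        return 1;

    s->fd = fd;         /* normal case: record the accepted descriptor */
    return 0;
}

/* A real accept() that returns fd 0 leaves the Socket with fd -1 and
 * raises nothing, while descriptor 0 stays open at the OS level. */
void demo_fd_zero_connection_is_dropped(void)
{
    struct socket_model s = { -1 };
    int threw = socket_accept_with_fix(0, &s);
    assert(threw == 0);
    assert(s.fd == -1);
}
```

Nothing in this path closes descriptor zero, which matches the hang described above: the Java side sees only an unusable Socket, while the peer still has an open connection.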
See "Suggested Fix" for what should be done about this bug.
peter.jones@East 1998-07-23
- duplicates: JDK-4159500 RMI Activation Bug: test client hangs on RMI call to server - no exception (Closed)
- relates to: JDK-4162546 Regression test CheckActivateRef.java failing on JDK1.2fcs-C (Closed)