JDK-4159884

fix for 4152799 broken: accepting socket on fd #0 now fails completely

Details

    • Bug
    • Resolution: Fixed
    • P2
    • 1.2.0
    • 1.2.0
    • core-libs
    • None
    • 1.2fcs
    • sparc
    • solaris_2.5, solaris_2.6
    • Not verified

    Description

      The recently integrated fix for bugid 4152799 has caused a regression even
      worse than the original bug, since it occurs even when no I/O operation is
      interrupted. If an accepted socket connection lands on file descriptor zero
      (which is possible after stdin has been closed), ServerSocket.accept()
      returns a bogus Socket object with an fd value of -1. Subsequent Java
      operations on that Socket fail, and the server concludes that the connection
      was corrupted. The VM, however, never closes the real file descriptor zero
      that holds the accepted connection, so the client that initiated the
      connection will likely hang waiting for a response from a server that, from
      its point of view, appears alive and is holding the connection open.

      First, reviewing bugid 4152799: when a thread blocked on an accept() call
      gets interrupted:
      - the sysAccept() function would return -2 (the constant SYS_INTRPT) to
        indicate that the call's failure was due to an interrupt, rather than a
        normal failure (a return of -1)
        [Actually, this only works correctly in green threads: the native threads
        version of sysAccept() doesn't handle interrupts properly; it will just return
        -1, resulting in a SocketException instead of an InterruptedIOException, but
        that should be the subject of a different bug report.]
      - The JVM_Accept() function, however, calls sysAccept() through a macro
        called CHECK_INTERRUPT(), which checks to see if the called function returned
        SYS_INTRPT (-2), and if so, it posts an InterruptedIOException to the current
        thread and changes the return value to zero (why??).
      - The solaris native code for PlainSocketImpl.socketAccept() would check the
        return value of JVM_Accept(), and if it was negative, it would post a
        SocketException and return; otherwise, it would assume that the return value
        was a valid file descriptor, and it would fill in the fields of the Socket
        object provided for the accepted connection, including the return value as
        the fd, which would be zero in this case.
      - The Java code for ServerSocket.implAccept() catches the InterruptedIOException
        thrown in the native code, calls close() on the Socket object for the
        (possibly) accepted connection, and then rethrows the IOException. In this
        case, this Socket object will contain fd zero, so file descriptor zero will
        get closed *whenever* a blocked accept() call is interrupted.
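
      The return-value chain described above can be modeled in a few lines of C.
      This is only a sketch: check_interrupt_model() and original_accept_model()
      are hypothetical stand-ins for CHECK_INTERRUPT() and the native
      socketAccept(), not the actual jvm.c or PlainSocketImpl sources, and FD_UNSET
      is an illustrative name for the -1 left in an unfilled Socket.

```c
#include <assert.h>
#include <stddef.h>

#define SYS_INTRPT (-2)  /* value quoted in the report */
#define FD_UNSET   (-1)  /* fd of a Socket whose fields were never filled in */

/* Model of CHECK_INTERRUPT(): notes that an exception was posted and
   rewrites the interrupt code (-2) to 0, as the report describes. */
int check_interrupt_model(int sys_result, int *exception_posted)
{
    assert(exception_posted != NULL);
    *exception_posted = (sys_result == SYS_INTRPT);
    return *exception_posted ? 0 : sys_result;
}

/* Model of the original native socketAccept(): returns the fd value that
   ends up stored in the Socket object. */
int original_accept_model(int sys_result)
{
    int exception_posted;
    int fd = check_interrupt_model(sys_result, &exception_posted);

    if (fd < 0)
        return FD_UNSET;  /* SocketException path: fields left unset */
    return fd;            /* stored as the Socket's fd, even after an interrupt */
}
```

      In this model an interrupted accept and a genuine accept on descriptor zero
      both leave 0 in the Socket, which is why the later close() hits the real
      fd zero.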

      The fix integrated for bugid 4152799 added the following lines to the
      Solaris native code for PlainSocketImpl.socketAccept(), right after it
      calls JVM_Accept():

          /* Thread was interrupted, return to avoid fd0 being closed */
          if (fd == 0)
              return;

      [The subsequent "if" condition was changed from (fd < 0) to (fd <= 0), which
      was pointless to do, and should also be changed back when the above code is
      removed.]

      I guess the idea was that if the return value was zero because the
      CHECK_INTERRUPT() macro had handled an interrupt, then zero would not get
      stuffed into the Socket object's fd, so its subsequent close() would cause
      no harm (operating on the invalid fd -1).

      The crucial point missed by this fix, and by the bizarre behavior of the
      CHECK_INTERRUPT() macro in jvm.c, is that zero is a valid file descriptor and
      therefore a perfectly valid return value from accept()/sysAccept().
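
      That zero is an ordinary descriptor can be checked directly on a POSIX
      system, where descriptor-allocating calls hand out the lowest-numbered free
      descriptor. The sketch below uses socket() rather than accept() to keep it
      short, on the assumption that both allocate descriptors by the same rule;
      the function name first_fd_after_closing_stdin() is made up for this
      illustration.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Closes fd 0, creates a socket, and reports which descriptor the socket
   received. Restores stdin before returning; returns -1 if socket() fails. */
int first_fd_after_closing_stdin(void)
{
    int saved_stdin = dup(STDIN_FILENO);  /* -1 if stdin is already closed */
    assert(saved_stdin != STDIN_FILENO);  /* a duplicate never lands on fd 0 itself */

    close(STDIN_FILENO);                  /* as a child process of a daemon might do */

    int fd = socket(AF_UNIX, SOCK_STREAM, 0);  /* takes the lowest free descriptor */
    int result = fd;
    if (fd >= 0)
        close(fd);

    if (saved_stdin >= 0) {               /* put stdin back where it was */
        dup2(saved_stdin, STDIN_FILENO);
        close(saved_stdin);
    }
    return result;
}
```

      With stdin closed and no other thread allocating descriptors, the new
      socket is expected to come back as descriptor zero.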

      With this current "fix", if an accepted socket is on fd zero, the fields of
      the returned Socket object will not be filled in, so the fd value will be
      left at -1, and any operation on the Socket will fail with a "Bad file number"
      SocketException. As far as the OS is concerned, however, the VM will still
      be holding fd zero open, so the client who initiated the connection will
      likely hang just waiting for a response from the server, since the connection
      is kept open. As an example, all of the RMI activation tests fail by hanging
      this way, since the child processes of "rmid" close stdin, so connections
      will indeed likely be accepted on fd zero.
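
      The failure mode can be reproduced in a small model of the patched check.
      The second function is a hypothetical illustration of keeping the interrupt
      status out of band so that zero is never overloaded; it is not taken from
      the "Suggested Fix" field and not necessarily what was integrated. All names
      here (patched_accept_model, accept_outcome, unambiguous_accept_model,
      FD_UNSET) are invented for the sketch.

```c
#include <assert.h>

#define SYS_INTRPT (-2)  /* value quoted in the report */
#define FD_UNSET   (-1)  /* fd of a Socket whose fields were never filled in */

/* Model of socketAccept() with the integrated patch: returns the fd value
   left in the Socket object for a given JVM_Accept() result. */
int patched_accept_model(int jvm_accept_result)
{
    if (jvm_accept_result == 0)
        return FD_UNSET;  /* early return: zero is assumed to mean "interrupted" */
    if (jvm_accept_result <= 0)
        return FD_UNSET;  /* SocketException path */
    return jvm_accept_result;
}

/* Hypothetical alternative: carry the interrupt flag separately from the fd. */
struct accept_outcome {
    int fd;           /* FD_UNSET unless a connection was accepted */
    int interrupted;  /* nonzero only when the call was interrupted */
};

struct accept_outcome unambiguous_accept_model(int sys_result)
{
    struct accept_outcome out = { FD_UNSET, 0 };

    assert(sys_result >= SYS_INTRPT);  /* modeled domain: -2, -1, or a descriptor */
    if (sys_result == SYS_INTRPT)
        out.interrupted = 1;
    else if (sys_result >= 0)
        out.fd = sys_result;           /* zero is kept as a real descriptor */
    return out;
}
```

      In the patched model a real connection on fd zero is reported as -1 while
      the descriptor stays open, which matches the hang seen in the RMI activation
      tests; in the alternative model the same input yields fd 0 with no interrupt
      flag set.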

      See "Suggested Fix" for what should be done about this bug.

      peter.jones@East 1998-07-23

            People

              mmcclosksunw Michael Mccloskey (Inactive)
              peterjones Peter Jones