Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-8283717

vmTestbase/nsk/jdi/ThreadStartEvent/thread/thread001 failed due to SocketTimeoutException

XMLWordPrintable

    • b16

        I saw this once in Loom when the test was run with the vthread wrapper:

        Sending command: quit
        IOPipe> Caught InterruptedIOException. Wait start time: 1648259211810, exception was thrown at: 1648259511902, wait time: 300092, actual timeout: 300000
        # ERROR: IOPipe> Caught InterruptedIOException while listening for PipeIO Listener Thread connection at port 36147:
        # ERROR: java.net.SocketTimeoutException: Accept timed out
        The following stacktrace is for failure analysis.
        nsk.share.TestFailure: IOPipe> Caught InterruptedIOException while listening for PipeIO Listener Thread connection at port 36147: java.net.SocketTimeoutException: Accept timed out
        at nsk.share.Log.logExceptionForFailureAnalysis(Log.java:432)
        at nsk.share.Log.complain(Log.java:403)
        at nsk.share.Log$Logger.complain(Log.java:743)
        at nsk.share.jpda.BasicSocketConnection.accept(SocketConnection.java:173)
        at nsk.share.jpda.SocketConnection.accept(SocketConnection.java:453)
        at nsk.share.jpda.SocketIOPipe$ListenerThread.run(SocketIOPipe.java:225)

        Basically the test is nearing completion after passing. It is sending the "quit" command to the debuggee, but that IO operation times out. I think I understand why.

        At the end of the log there is a dump of all debuggee threads that is done before killing the debuggee (a feature of jdi test infra when there is a failure like this). It shows that every thread is suspended. This means that the debuggee can't even receive the "quit" command on the IOPipe it has setup to communicate with the debugger side. I believe the reason why is because just a bit before sending the "quit" the debugger does the following:

                    // disable event request to prevent appearance of further events
                    if (checkedRequest != null) {
                        log.display("Disabling event request");
                        checkedRequest.disable();
                    }

        And then right after this:

                    // force debuggee to quit
                    log.display("Sending command: " + COMMAND_QUIT);
                    pipe.println(COMMAND_QUIT);

        checkedRequest is the ThreadStartRequest that test is using to make sure it gets a ThreadStartEvent for at least the debuggee main and the two threads the debuggee starts up. Once it gets these 3 ThreadStartEvents, it does the above to disable the ThreadStartRequest and terminate the debuggee. The problem is that there can still be other threads created after this point. This is especially true of loom where main carrier threads can be started up once the first vthread is created. As proof, the test only sees ThreadStartEvents for 3 carrier threads, yet the thread dump includes a 4th. Also, a total of 12 carrier threads should eventually be started since the host has 12 cores, so they clearly are still in the process of being started. In passing test cases the test sees all 12 carrier threads being started.

        Since this late ThreadStartEvent comes in after all the expected ThreadStartEvents have been received, it is never processed. Since the SUSPEND_ALL policy is in affect for the ThreadStartRequest, this late ThreadStartEvent causes all the debuggee threads to be suspended, and there is no code to resume them before sending the "quit" command.

        When processing events the test normally does the following to get the next batch of events:

           eventSet = vm.eventQueue().remove();

        and then the the following to make sure all threads are resumed after the events are processed:

           eventSet.resume();

        We could have the test do this before sending "quit", but I think a much simpler solution is probably just to unconditionally call vm.resume().

              cjplummer Chris Plummer
              cjplummer Chris Plummer
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

                Created:
                Updated:
                Resolved: