JDK-8182757

JDWP: Socket Transport handshake hangs on Solaris


    • Type: Bug
    • Resolution: Fixed
    • Priority: P2
    • Fix Version: 10
    • Affects Version: 6, 7, 8, 9, 10
    • Component: core-svc
    • Resolved In Build: b19
    • CPU: sparc, x86_64
    • OS: solaris

      Setting priority to P2 to match the original bug:

      JDK-6303969 JDWP: Socket Transport handshake fails rarely on InstancesTest.java

      The purpose of this new bug is to extract sighting information for ONE
      failure mode described in JDK-6303969.

      The failure mode is a hang between the debugger and debuggee
      on Solaris SPARC or Solaris X64 systems only. This failure mode
      has not been seen on any other platform.

      The debugger pstack trace looks like this:

      ----------------- lwp# 2 / thread# 2 --------------------
       ffff80ffbf51e35a pollsys (ffff80ffbf13e418, 1, 0, 0)
       ffff80ffbf4bef93 poll () + 5f
       ffff80f1b8814ad8 NET_Timeout0 () + b8
       ffff80f1b881428a NET_Timeout () + 2a
       ffff80f1b8810fa7 Java_java_net_PlainSocketImpl_socketAccept () + 247
       ffff80ffa2030f31 * java/net/PlainSocketImpl.socketAccept(Ljava/net/SocketImpl;)V+-30776
       ffff80ffa200b7e3 * java/net/AbstractPlainSocketImpl.accept(Ljava/net/SocketImpl;)V+23568 (line 922)
       ffff80ffa200b7e3 * java/net/ServerSocket.implAccept(Ljava/net/Socket;)V+3232 (line 1155)
       ffff80ffa200b7e3 * java/net/ServerSocket.accept()Ljava/net/Socket;+3584 (line 1034)
       ffff80ffa200b560 * nsk/share/jdwp/SocketTransport.accept()V+7768 (line 209)
       ffff80ffa200b7e3 * nsk/share/jdwp/Debugee.connect()Lnsk/share/jdwp/Transport;+-7592 (line 301)
       ffff80ffa200b560 * nsk/share/jdwp/Binder.bindToDebugee(Ljava/lang/String;)Lnsk/share/jdwp/Debugee;+9296 (line 185)
       ffff80ffa200b560 * nsk/jdwp/ObjectReference/InvokeMethod/invokemeth001.runIt([Ljava/lang/String;Ljava/io/PrintStream;)I+-15856 (line 354)
       ffff80ffa200b220 * nsk/jdwp/ObjectReference/InvokeMethod/invokemeth001.run([Ljava/lang/String;Ljava/io/PrintStream;)I+-15576 (line 188)
       ffff80ffa200b220 * nsk/jdwp/ObjectReference/InvokeMethod/invokemeth001.main([Ljava/lang/String;)V+-15304 (line 175)
       ffff80ffa2000d2d * nsk/jdwp/ObjectReference/InvokeMethod/invokemeth001.main([Ljava/lang/String;)V+-14032 (line 175)
       ffff80f1bac5e357 __1cJJavaCallsLcall_helper6FpnJJavaValue_rknMmethodHandle_pnRJavaCallArguments_pnGThread__v_ () + 507
       ffff80f1bad1bbd7 __1cRjni_invoke_static6FpnHJNIEnv__pnJJavaValue_pnI_jobject_nLJNICallType_pnK_jmethodID_pnSJNI_ArgumentPusher_pnGThread__v_ () + 4b7
       ffff80f1bad43ec7 jni_CallStaticVoidMethod () + 577
       ffff80f1bc206d5e JavaMain () + 30e
       ffff80ffbf515221 _thrp_setup () + a5
       ffff80ffbf5154c0 _lwp_start ()

      The key attributes of the above pstack output:

      - The debugger is in a java/net/ServerSocket.accept() call.
      - The accept() call has called NET_Timeout() which results
        in a poll() and then a pollsys() call.
      - Basically, the ServerSocket is waiting for a connect event
        to come in on the socket.


      The debuggee pstack trace looks like this:

      ----------------- lwp# 2 / thread# 2 --------------------
       ffff80ffbf51dbfa recv (6, ffff80ffbf13e250, e, 0)
       ffff80ffbf67fe2e recv () + 12
       ffff80f1ac403e5e dbgsysRecv () + 2e
       ffff80f1ac403702 recv_fully () + 32
       ffff80f1ac402a08 handshake () + 68
       ffff80f1ac40336e socketTransport_attach () + ce
       ffff80f1ac63ac7a transport_startTransport () + 7a
       ffff80f1ac622e5a startTransport () + 6a
       ffff80f1ac61fbaf bagEnumerateOver () + 3f
       ffff80f1ac6233bd initialize () + 1dd
       ffff80f1ac622629 cbEarlyVMInit () + 79
       ffff80f1bb21a011 __1cLJvmtiExportTpost_vm_initialized6F_v_ () + 581
       ffff80f1bb7cf184 __1cHThreadsJcreate_vm6FpnOJavaVMInitArgs_pb_i_ () + 7d4
       ffff80f1bad6b7eb __1cWJNI_CreateJavaVM_inner6FppnHJavaVM__ppv3_i_ () + bb
       ffff80f1bad6bcb9 JNI_CreateJavaVM () + 9
       ffff80f1bc20967b InitializeJVM () + 11b
       ffff80f1bc206aa5 JavaMain () + 55
       ffff80ffbf515221 _thrp_setup () + a5
       ffff80ffbf5154c0 _lwp_start ()

      The key attributes of the above pstack output:

      - The debuggee agent is in cbEarlyVMInit() which
        is the event handler for the VM_INIT event.
      - The agent is in socketTransport_attach() and is in
        the JDWP handshake() code.
      - The handshake() code is trying to recv() data from
        the socket.
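
The debuggee-side state above can be sketched in Java. The JDWP handshake string is the 14 ASCII bytes "JDWP-Handshake", which matches the length argument 0xe in the recv() frame. This is a minimal loopback sketch, not the agent's C code: the names HandshakeSketch, debuggeeHandshake, debuggerHandshake and runLoopback, and the 5-second timeouts, are assumptions made for illustration.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class HandshakeSketch {
    // 14 ASCII bytes, the same length as the 0xe argument in the recv() frame.
    static final byte[] HELLO = "JDWP-Handshake".getBytes(StandardCharsets.US_ASCII);

    // Debuggee side (hypothetical helper): read exactly 14 bytes, verify, echo back.
    static void debuggeeHandshake(Socket s, int timeoutMillis) throws IOException {
        s.setSoTimeout(timeoutMillis); // assumption: bound the read so the sketch cannot hang
        byte[] buf = new byte[HELLO.length];
        new DataInputStream(s.getInputStream()).readFully(buf); // blocks like recv_fully()
        if (!Arrays.equals(buf, HELLO)) {
            throw new IOException("handshake failed: unexpected bytes");
        }
        OutputStream out = s.getOutputStream();
        out.write(HELLO);
        out.flush();
    }

    // Debugger side (hypothetical helper): send the hello, then expect the same bytes back.
    static boolean debuggerHandshake(Socket s, int timeoutMillis) throws IOException {
        s.setSoTimeout(timeoutMillis);
        OutputStream out = s.getOutputStream();
        out.write(HELLO);
        out.flush();
        byte[] reply = new byte[HELLO.length];
        new DataInputStream(s.getInputStream()).readFully(reply);
        return Arrays.equals(reply, HELLO);
    }

    // Runs both sides in one process over the loopback interface.
    static boolean runLoopback() throws Exception {
        InetAddress lo = InetAddress.getLoopbackAddress();
        try (ServerSocket listener = new ServerSocket(0, 1, lo)) {
            listener.setSoTimeout(5000);
            int port = listener.getLocalPort();
            Thread debuggee = new Thread(() -> {
                try (Socket s = new Socket(lo, port)) {
                    debuggeeHandshake(s, 5000);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            debuggee.start();
            boolean ok;
            try (Socket s = listener.accept()) {
                ok = debuggerHandshake(s, 5000);
            }
            debuggee.join();
            return ok;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("handshake ok: " + runLoopback());
    }
}
```

In the hang described here, the debuggee stays in the equivalent of readFully() because no peer ever sends it the 14 bytes.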

      DO NOT add any entries to this bug report that do not meet the
      exact failure mode described by this bug.

      So the debugger side is waiting in accept() for a connection,
      while the debuggee side has already returned from its connect()
      and is trying to receive data from the socket. The question
      is: what happened to the connect event? Did it get dropped?
      Did it get snarfed by another ServerSocket listening on the
      same port?
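
One point behind these questions can be shown with a small Java sketch: on common TCP stacks, connect() returns once the kernel has queued the connection on the listener's backlog, whether or not any thread has called accept(). The sketch below does not reproduce or explain this bug; it only shows that a returned connect() followed by a blocked read is possible with no accept() at all. The names ConnectWithoutAccept and connectThenBlock, and the read timeout, are assumptions made for illustration.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

final class ConnectWithoutAccept {
    // Returns true if connect() succeeded and the following read timed out,
    // even though the listener never called accept().
    static boolean connectThenBlock(int readTimeoutMillis) throws IOException {
        InetAddress lo = InetAddress.getLoopbackAddress();
        try (ServerSocket listener = new ServerSocket(0, 1, lo);
             Socket client = new Socket(lo, listener.getLocalPort())) {
            // Assumption: a timeout stands in for the unbounded recv() in the pstack.
            client.setSoTimeout(readTimeoutMillis);
            try {
                client.getInputStream().read();
                return false; // unexpected: data or end-of-stream arrived
            } catch (SocketTimeoutException expected) {
                return true;  // the client is "connected" but nothing is talking to it
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("connected but blocked: " + connectThenBlock(500));
    }
}
```

So the debuggee's successful connect() does not by itself show which listener, if any, accepted the connection.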

      The remaining notes that I'm adding to this bug are from my
      personal e-mail archive for JDK-6303969. When the bug was
      imported from the older bug system to JBS, date and comment
      author information was stripped, so the original 20 description
      notes were all munged together into the mess that is the
      description note for JDK-6303969.

      Update: Adding a DKFL rule entry to match the above pstack
      output. It's a ridiculously broad rule, but it's what we have:

      RULE nsk/jdwp/ObjectReference/InvokeMethod/invokemeth001 Timeout none

      If you get a timeout that matches this rule, you have to look at
      the pstack output for BOTH the debugger and debuggee and
      make sure they look very similar to the above examples.
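
The manual check above could be approximated with a small Java helper. This is a rough sketch only: the class PstackTriage, its method names, and the chosen frame lists are assumptions drawn from the two example traces, and a substring match is not a substitute for reading the pstack output.

```java
import java.util.List;

final class PstackTriage {
    // Frames taken from the example debugger trace above.
    static final List<String> DEBUGGER_FRAMES = List.of(
            "java/net/ServerSocket.accept", "NET_Timeout", "pollsys");
    // Frames taken from the example debuggee trace above.
    static final List<String> DEBUGGEE_FRAMES = List.of(
            "cbEarlyVMInit", "socketTransport_attach", "handshake", "recv");

    // True if a single thread section of the pstack text contains every frame.
    static boolean anyThreadHasAll(String pstack, List<String> frames) {
        // pstack separates threads with "----- lwp# N / thread# N -----" header lines.
        for (String thread : pstack.split("(?m)^\\s*-+\\s*lwp#.*$")) {
            if (frames.stream().allMatch(thread::contains)) {
                return true;
            }
        }
        return false;
    }

    // Both processes must show their half of the failure mode.
    static boolean matchesFailureMode(String debuggerPstack, String debuggeePstack) {
        return anyThreadHasAll(debuggerPstack, DEBUGGER_FRAMES)
                && anyThreadHasAll(debuggeePstack, DEBUGGEE_FRAMES);
    }
}
```

A timeout whose traces fail this check belongs in a different bug; one that passes still needs the side-by-side comparison described above.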

            Assignee: gthornbr Gerald Thornbrugh (Inactive)
            Reporter: dcubed Daniel Daugherty