- Bug
- Resolution: Fixed
- P2
- 6, 7, 8, 9, 10
- b19
- sparc, x86_64
- solaris
Setting priority to P2 to match the original bug:
JDK-6303969 JDWP: Socket Transport handshake fails rarely on InstancesTest.java
The purpose of this new bug is to extract sighting information for ONE
failure mode described in JDK-6303969.
The failure mode is a hang between the debugger and debuggee
on Solaris SPARC or Solaris X64 systems only. This failure mode
has not been seen on any other platform.
The debugger pstack trace looks like this:
----------------- lwp# 2 / thread# 2 --------------------
ffff80ffbf51e35a pollsys (ffff80ffbf13e418, 1, 0, 0)
ffff80ffbf4bef93 poll () + 5f
ffff80f1b8814ad8 NET_Timeout0 () + b8
ffff80f1b881428a NET_Timeout () + 2a
ffff80f1b8810fa7 Java_java_net_PlainSocketImpl_socketAccept () + 247
ffff80ffa2030f31 * java/net/PlainSocketImpl.socketAccept(Ljava/net/SocketImpl;)V+-30776
ffff80ffa200b7e3 * java/net/AbstractPlainSocketImpl.accept(Ljava/net/SocketImpl;)V+23568 (line 922)
ffff80ffa200b7e3 * java/net/ServerSocket.implAccept(Ljava/net/Socket;)V+3232 (line 1155)
ffff80ffa200b7e3 * java/net/ServerSocket.accept()Ljava/net/Socket;+3584 (line 1034)
ffff80ffa200b560 * nsk/share/jdwp/SocketTransport.accept()V+7768 (line 209)
ffff80ffa200b7e3 * nsk/share/jdwp/Debugee.connect()Lnsk/share/jdwp/Transport;+-7592 (line 301)
ffff80ffa200b560 * nsk/share/jdwp/Binder.bindToDebugee(Ljava/lang/String;)Lnsk/share/jdwp/Debugee;+9296 (line 185)
ffff80ffa200b560 * nsk/jdwp/ObjectReference/InvokeMethod/invokemeth001.runIt([Ljava/lang/String;Ljava/io/PrintStream;)I+-15856 (line 354)
ffff80ffa200b220 * nsk/jdwp/ObjectReference/InvokeMethod/invokemeth001.run([Ljava/lang/String;Ljava/io/PrintStream;)I+-15576 (line 188)
ffff80ffa200b220 * nsk/jdwp/ObjectReference/InvokeMethod/invokemeth001.main([Ljava/lang/String;)V+-15304 (line 175)
ffff80ffa2000d2d * nsk/jdwp/ObjectReference/InvokeMethod/invokemeth001.main([Ljava/lang/String;)V+-14032 (line 175)
ffff80f1bac5e357 __1cJJavaCallsLcall_helper6FpnJJavaValue_rknMmethodHandle_pnRJavaCallArguments_pnGThread__v_ () + 507
ffff80f1bad1bbd7 __1cRjni_invoke_static6FpnHJNIEnv__pnJJavaValue_pnI_jobject_nLJNICallType_pnK_jmethodID_pnSJNI_ArgumentPusher_pnGThread__v_ () + 4b7
ffff80f1bad43ec7 jni_CallStaticVoidMethod () + 577
ffff80f1bc206d5e JavaMain () + 30e
ffff80ffbf515221 _thrp_setup () + a5
ffff80ffbf5154c0 _lwp_start ()
The key attributes of the above pstack output:
- The debugger is in a java/net/ServerSocket.accept() call.
- The accept() call has called NET_Timeout() which results
in a poll() and then a pollsys() call.
- Basically, the ServerSocket is waiting for a connect event
to come in on the socket.
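The debugger-side wait can be illustrated with a minimal Java sketch. This is
not the nsk test code: the class name, the loopback listener and the 200 ms
timeout are assumptions made only so that the example terminates. The pstack
output above does not tell us whether the test sets a timeout on the
ServerSocket.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical illustration only; not the nsk test code.
class AcceptWaitSketch {

    /** Returns true if a peer connected within timeoutMillis, false if accept() timed out. */
    static boolean waitForConnect(ServerSocket server, int timeoutMillis) throws IOException {
        server.setSoTimeout(timeoutMillis); // assumption: bounded wait so the sketch terminates
        try (Socket peer = server.accept()) { // blocks until a connect event arrives
            return true;
        } catch (SocketTimeoutException e) {
            return false; // nobody connected: the state the debugger thread is stuck in
        }
    }

    /** Listens on an ephemeral loopback port that no client ever connects to. */
    static boolean connectArrived() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            return waitForConnect(server, 200);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("connect arrived: " + connectArrived());
    }
}
```

With no timeout set, accept() would simply never return in this situation,
which is what a hang looks like from the debugger side.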
The debuggee pstack trace looks like this:
----------------- lwp# 2 / thread# 2 --------------------
ffff80ffbf51dbfa recv (6, ffff80ffbf13e250, e, 0)
ffff80ffbf67fe2e recv () + 12
ffff80f1ac403e5e dbgsysRecv () + 2e
ffff80f1ac403702 recv_fully () + 32
ffff80f1ac402a08 handshake () + 68
ffff80f1ac40336e socketTransport_attach () + ce
ffff80f1ac63ac7a transport_startTransport () + 7a
ffff80f1ac622e5a startTransport () + 6a
ffff80f1ac61fbaf bagEnumerateOver () + 3f
ffff80f1ac6233bd initialize () + 1dd
ffff80f1ac622629 cbEarlyVMInit () + 79
ffff80f1bb21a011 __1cLJvmtiExportTpost_vm_initialized6F_v_ () + 581
ffff80f1bb7cf184 __1cHThreadsJcreate_vm6FpnOJavaVMInitArgs_pb_i_ () + 7d4
ffff80f1bad6b7eb __1cWJNI_CreateJavaVM_inner6FppnHJavaVM__ppv3_i_ () + bb
ffff80f1bad6bcb9 JNI_CreateJavaVM () + 9
ffff80f1bc20967b InitializeJVM () + 11b
ffff80f1bc206aa5 JavaMain () + 55
ffff80ffbf515221 _thrp_setup () + a5
ffff80ffbf5154c0 _lwp_start ()
The key attributes of the above pstack output:
- The debuggee agent is in cbEarlyVMInit() which
is the event handler for the VM_INIT event.
- The agent is in socketTransport_attach() and is in
the JDWP handshake() code.
- The handshake() code is trying to recv() data from
the socket.
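For reference, the JDWP specification describes the handshake as the debugger
sending the 14 ASCII characters "JDWP-Handshake" and the VM replying with the
same 14 bytes. That matches the recv() call above, whose length argument is
e (14). The following is a minimal sketch of that exchange over a loopback
socket, with the debugger listening and the debuggee attaching as in this
test. The class and method names and the 5 second timeouts are assumptions;
this is neither the agent's handshake() code nor the nsk debugger code.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration of the JDWP handshake; names are not from the JDK sources.
class HandshakeSketch {
    static final byte[] HELLO = "JDWP-Handshake".getBytes(StandardCharsets.US_ASCII); // 14 bytes

    /** VM side: read exactly 14 bytes, verify them, then echo them back. */
    static boolean debuggeeSide(Socket s) throws IOException {
        s.setSoTimeout(5000); // assumption: bounded wait instead of an indefinite recv()
        byte[] buf = new byte[HELLO.length];
        new DataInputStream(s.getInputStream()).readFully(buf); // this is where the agent is blocked
        if (!Arrays.equals(buf, HELLO)) {
            return false;
        }
        OutputStream out = s.getOutputStream();
        out.write(HELLO);
        out.flush();
        return true;
    }

    /** Debugger side: send the 14 bytes, then expect the same bytes back. */
    static boolean debuggerSide(Socket s) throws IOException {
        s.setSoTimeout(5000);
        OutputStream out = s.getOutputStream();
        out.write(HELLO);
        out.flush();
        byte[] buf = new byte[HELLO.length];
        new DataInputStream(s.getInputStream()).readFully(buf);
        return Arrays.equals(buf, HELLO);
    }

    /** Runs both sides once: the debugger listens, the debuggee attaches. */
    static boolean runOnce() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            server.setSoTimeout(5000);
            boolean[] debuggeeOk = new boolean[1];
            Thread debuggee = new Thread(() -> {
                try (Socket s = new Socket(server.getInetAddress(), server.getLocalPort())) {
                    debuggeeOk[0] = debuggeeSide(s);
                } catch (IOException e) {
                    debuggeeOk[0] = false;
                }
            });
            debuggee.start();
            boolean debuggerOk;
            try (Socket s = server.accept()) {
                debuggerOk = debuggerSide(s);
            }
            debuggee.join(10000);
            return debuggerOk && debuggeeOk[0];
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("handshake ok: " + runOnce());
    }
}
```

In the failure mode described here, the debuggee is stuck at the readFully()
step because nothing on the other end of its connection ever sends the
14 bytes.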
DO NOT add any entries to this bug report that do not meet the
exact failure mode described by this bug.
So we have the debugger side still waiting in accept() for a
connection, while the debuggee side has already returned from
its connect() and is trying to receive data from the socket.
The question is: what happened to the connect event? Did it
get dropped? Did it get snarfed by another ServerSocket
listening on the same port?
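One thing worth keeping in mind when reading the two stacks: on common TCP
stacks the kernel completes an incoming connection for any listening socket
and queues it, so connect() can return on the client before anyone has called
accept(). The sketch below only demonstrates that general behavior. It is not
a diagnosis of this bug, and the class name, loopback address and timeouts
are assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical illustration only; it does not reproduce the failure itself.
class ConnectWithoutAcceptSketch {

    /**
     * Returns true if connect() succeeded and a later read timed out,
     * even though accept() was never called on the listener.
     */
    static boolean connectsWithoutAccept() {
        try (ServerSocket listener = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket()) {
            // The kernel queues the connection on the listener; no accept() is needed.
            client.connect(
                new InetSocketAddress(listener.getInetAddress(), listener.getLocalPort()), 2000);
            client.setSoTimeout(300); // assumption: bounded wait so the sketch terminates
            try {
                client.getInputStream().read(); // like the debuggee waiting in recv()
                return false; // data or EOF arrived, not the state we are illustrating
            } catch (SocketTimeoutException expected) {
                return client.isConnected();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("connected without accept(): " + connectsWithoutAccept());
    }
}
```

So the debuggee sitting in recv() tells us only that some listening socket on
that port took the connection. It does not tell us that the listener was the
debugger's ServerSocket, which is why the "another ServerSocket on the same
port" question is worth asking.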
The remaining notes that I'm adding to this bug are from my
personal e-mail archive for JDK-6303969. When the bug was
imported from the older bug system to JBS, date and comment
author information was stripped, so the original 20 description
notes were all munged together into the mess that is the
description note for JDK-6303969.
Update: Adding a DKFL rule entry to match the above pstack
output. It's a ridiculously broad rule, but it's what we have:
RULE nsk/jdwp/ObjectReference/InvokeMethod/invokemeth001 Timeout none
If you get a timeout that matches this rule, you have to look at
the pstack output for BOTH the debugger and debuggee and
make sure they look very similar to the above examples.
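Since the DKFL rule is so broad, a coarse first-pass filter over the two
pstack outputs may help before a person compares them by hand. The sketch
below is hypothetical and is not part of DKFL or any existing tool: it only
checks that the frame names taken from the two examples above appear in the
same innermost-to-outermost order somewhere in each pstack text. It does not
separate threads, so a match still needs manual confirmation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; frame names are copied from the pstack examples in this bug.
class HangSignatureSketch {
    static final List<String> DEBUGGER_FRAMES = Arrays.asList(
        "pollsys",
        "NET_Timeout",
        "Java_java_net_PlainSocketImpl_socketAccept",
        "java/net/ServerSocket.accept",
        "nsk/share/jdwp/SocketTransport.accept");

    static final List<String> DEBUGGEE_FRAMES = Arrays.asList(
        "recv",
        "dbgsysRecv",
        "recv_fully",
        "handshake",
        "socketTransport_attach",
        "cbEarlyVMInit");

    /** True if every frame name occurs in the text, each one after the previous match. */
    static boolean containsInOrder(String pstack, List<String> frames) {
        int from = 0;
        for (String frame : frames) {
            int at = pstack.indexOf(frame, from);
            if (at < 0) {
                return false;
            }
            from = at + frame.length();
        }
        return true;
    }

    /** Coarse check that both sides look like the failure mode tracked by this bug. */
    static boolean matchesFailureMode(String debuggerPstack, String debuggeePstack) {
        return containsInOrder(debuggerPstack, DEBUGGER_FRAMES)
            && containsInOrder(debuggeePstack, DEBUGGEE_FRAMES);
    }
}
```

A timeout of invokemeth001 where this check fails for either side should not
be logged against this bug.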
- relates to
  - JDK-6303969 JDWP: Socket Transport handshake fails rarely on InstancesTest.java
  - Closed