Issue | Fix Version | Assignee | Priority | Status | Resolution | Resolved In Build |
---|---|---|---|---|---|---|
JDK-2183600 | 7 | Daniel Daugherty | P4 | Closed | Fixed | b73 |
JDK-2189947 | 6u21 | Daniel Daugherty | P4 | Closed | Fixed | b01 |
During my testing of my fix for 6419370, the following VM/NSK
test hung:
nsk/jvmti/scenarios/sampling/SP07/sp07t002
on my Solaris X86 machine running the following config:
java version "1.7.0-ea"
Java(TM) SE Runtime Environment (build 1.7.0-ea-b70)
Java HotSpot(TM) Client VM (build 17.0-b01-internal, mixed mode)
I created a doit.sh script from the .tlog file and used my
doit_loop script to reproduce the hang once:
Results: 318 runs (1 FAILed, 317 PASSed)
Failure details: hangs: 1
Run duration: 3 hour(s) 51 second(s)
I've attached the thread dump in threads.log.295 and a
30 second truss snapshot in truss.log.295.
Only two threads were active in the 30 second snapshot:
% sed 's/[ ].*//' truss.log.295 | sort | uniq -c
1 /1:
1 /10:
1 /11:
43 /2:
1 /3:
1 /4:
1 /5:
1 /6:
1 /7:
1 /8:
499 /9:
Thread t@2 has the following stack snippet:
t@2 (l@2) stopped in ___nanosleep at 0xfeec8095
0xfeec8095: ___nanosleep+0x0015: jae ___nanosleep+0x23 [ 0xfeec80a3, .+0xe ]
current thread: t@2
[1] ___nanosleep(0x1), at 0xfeec8095
[2] THREAD_sleep(0x1), at 0xfe413424
[3] nsk_jvmti_sleep(0x3e8, 0x0), at 0xfe4166cd
[4] syncDebugeeStatus(0x8066510, 0x8064b98, 0x0), at 0xfe4167fb
[5] Java_nsk_share_jvmti_DebugeeClass_checkStatus(0x8066510, 0xfe55ebf0, 0x0,
0xfe55ebc0, 0x1, 0xf81c10c0, 0xfe55ebc0, 0xf81c0e10, 0xfe55ebf8, 0xf81c17a0, 0x0, 0xf81c0e10, 0x0, 0xfe55ebf8, 0xfe55ec20, 0xfc202ea9, 0xf81c10c0, 0xfc207fd9, 0x0, 0xf44f2f38), at 0xfe4169b3
[6] 0xfc20a23e(0xf81c10c0, 0xfc207fd9, 0x0, 0xf44f2f38, 0xfe55ec00, 0xf81c0a94), at 0xfc20a23e
Thread t@9 has the following stack snippet:
t@9 (l@9) stopped in __pollsys at 0xfeec8da5
0xfeec8da5: __pollsys+0x0015: jb __cerror [ 0xfee405a0, .-0x88805
]
current thread: t@9
=>[1] __pollsys(0x0, 0x0, 0xf3e5ad60, 0x0), at 0xfeec8da5
[2] _poll(0x0, 0x0, 0x32, 0xfe9ce7d6), at 0xfee6f502
[3] os_sleep(0x32, 0x0, 0x0, 0xfe9ceaa2), at 0xfe9ce922
[4] os::sleep(0x8116400, 0x32, 0x0, 0x0), at 0xfe9ceb40
[5] WatcherThread::run(0x8116400, 0xfebfa000, 0xf3e5afd8, 0xfe9cb38e), at 0xfe6d820e
It is interesting to see that the VMThread is blocked:
t@3 (l@3) stopped in ___lwp_cond_wait at 0xfeec9455
0xfeec9455: ___lwp_cond_wait+0x0015: jae ___lwp_cond_wait+0x23 [ 0xfeec9463, .+0xe ]
current thread: t@3
=>[1] ___lwp_cond_wait(0x80f5448, 0x80f5430, 0x0, 0x1), at 0xfeec9455
[2] os::PlatformEvent::park(0x80f5400, 0x0, 0x0, 0xfe9ba482), at 0xfe9d0a22
[3] ParkCommon(0x80f5400, 0x0, 0x0, 0x1), at 0xfe9ba4cd
[4] Monitor::ILock(0x81495a8, 0x80f4800, 0xfec13758, 0xfe9bad24), at 0xfe9ba657
[5] Monitor::lock_without_safepoint_check(0x81495a8, 0x80f4800, 0x1, 0xfe9bad58), at 0xfe9bad39
[6] Monitor::lock_without_safepoint_check(0x81495a8, 0x1, 0x3, 0xfea0186a), at 0xfe9bada3
[7] ThreadSafepointState::examine_state_of_thread(0x8111fb0, 0xfebfa000, 0xf3ffeca8, 0xfe702eab), at 0xfea01895
[8] SafepointSynchronize::begin(0x8067648, 0x8067620), at 0xfe702f2a
[9] VMThread::loop(0x80f4800, 0x7f, 0x0, 0xfe6b471a), at 0xfe6b4b38
[10] VMThread::run(0x80f4800, 0xfebfa000, 0xf3ffefd8, 0xfe9cb38e), at 0xfe6b47a1
Not sure if it means anything, but the first address passed
to the Monitor::ILock() call above appears in the stack traces
for the following threads:
t@10 (l@10) stopped in ___lwp_cond_wait at 0xfeec9455
0xfeec9455: ___lwp_cond_wait+0x0015: jae ___lwp_cond_wait+0x23 [ 0xfeec9463, .+0xe ]
current thread: t@10
=>[1] ___lwp_cond_wait(0x815b448, 0x815b430, 0x0, 0x1), at 0xfeec9455
[2] os::PlatformEvent::park(0x815b400, 0x81495a8, 0x0, 0xfe9ba482), at 0xfe9d0a22
[3] ParkCommon(0x815b400, 0x0, 0x0, 0xfe9ba98a), at 0xfe9ba4cd
[4] Monitor::IWait(0x81495a8, 0x815a800, 0x0, 0x0), at 0xfe9baa21
[5] Monitor::wait(0x81495a8, 0x1, 0x0, 0x0), at 0xfe9bb2ce
[6] JavaThread::java_suspend_self(0x815a800, 0x0, 0xfebfa000, 0xfe715aa5), at
0xfeaaeebc
[7] JavaThread::check_safepoint_and_suspend_for_native_trans(0x815a800, 0x815a910, 0xf3dd7538, 0xfeaaef3c), at 0xfe715b25
[8] JavaThread::check_special_condition_for_native_trans(0x815a800), at 0xfeaaef4e
[9] 0xfc27b4ff(0xfe6dbf47, 0xf3dd74f8, 0xfebfa000, 0xf3dd759c, 0xf4093398, 0xfc27a588), at 0xfc27b4ff
t@11 (l@11) stopped in ___lwp_cond_wait at 0xfeec9455
0xfeec9455: ___lwp_cond_wait+0x0015: jae ___lwp_cond_wait+0x23 [ 0xfeec9463, .+0xe ]
current thread: t@11
=>[1] ___lwp_cond_wait(0x815ca48, 0x815ca30, 0x0, 0x1), at 0xfeec9455
[2] os::PlatformEvent::park(0x815ca00, 0x0, 0x0, 0xfe9ba482), at 0xfe9d0a22
[3] ParkCommon(0x815ca00, 0x0, 0x0, 0x1), at 0xfe9ba4cd
[4] Monitor::ILock(0x80658c0, 0x815bc00, 0x1, 0xfe9bad24), at 0xfe9ba657
[5] Monitor::lock_without_safepoint_check(0x80658c0, 0x815bc00, 0x0, 0xfe9bad58), at 0xfe9bad39
[6] Monitor::lock_without_safepoint_check(0x80658c0, 0xfebfa000, 0xf3d88984, 0xfe70324a), at 0xfe9bada3
[7] SafepointSynchronize::block(0x815bc00, 0x815bc00, 0x5, 0x0), at 0xfe7033a7
[8] Monitor::wait(0x81495a8, 0x0, 0x5, 0x0), at 0xfe9bb2a4
[9] JavaThread::wait_for_ext_suspend_completion(0x815a800, 0x32, 0x5, 0xf3d88b08), at 0xfeaac3dd
[10] JvmtiEnvBase::is_thread_fully_suspended(0x815a800, 0x1, 0xf3d88b08, 0xfe9365e8), at 0xfe93dc44
[11] JvmtiEnv::GetStackTrace(0x8064b98, 0x815a800, 0x0, 0x400, 0xfe438500), at 0xfe936607
[12] jvmti_GetStackTrace(0x8064b98, 0x815ce24, 0x0, 0x400, 0xfe438500, 0xfe435b38), at 0xfe9114ea
[13] checkStackTrace(0x8064b98, 0x815bd10), at 0xfe41a932
[14] agentProc(0x8064b98, 0x815bd10, 0x0), at 0xfe41acf9
[15] agentThreadWrapper(0x8064b98, 0x815bd10, 0x0, 0xfe9532b2), at 0xfe4162a3
[16] JvmtiAgentThread::call_start_function(0x815bc00, 0xfee646de, 0xfef3d5d0,
0xfe953288), at 0xfe953374
[17] JvmtiAgentThread::start_function_wrapper(0x815bc00), at 0xfe95329a
[18] JavaThread::thread_main_inner(0x815bc00, 0x0, 0x0, 0xfe6c2dbe), at 0xfe6c2fa9
It looks like the VMThread (t@3) is trying to begin a
safepoint and is blocked trying to examine the state
of another thread in a Monitor::ILock(0x81495a8) call.
Thread t@10 is trying to suspend itself and is calling
Monitor::IWait(0x81495a8).
thread t@11 is waiting for a suspend to complete, calls
Monitor::wait(0x81495a8) and blocks on the pending safepoint.
Uh oh....
I suspect that t@11 is still holding the lock it is trying
to wait() on. Look at the t@10 waiting sequence:
=>[1] ___lwp_cond_wait(0x815b448, 0x815b430, 0x0, 0x1), at 0xfeec9455
[2] os::PlatformEvent::park(0x815b400, 0x81495a8, 0x0, 0xfe9ba482), at 0xfe9d0a22
[3] ParkCommon(0x815b400, 0x0, 0x0, 0xfe9ba98a), at 0xfe9ba4cd
[4] Monitor::IWait(0x81495a8, 0x815a800, 0x0, 0x0), at 0xfe9baa21
[5] Monitor::wait(0x81495a8, 0x1, 0x0, 0x0), at 0xfe9bb2ce
[6] JavaThread::java_suspend_self(0x815a800, 0x0, 0xfebfa000, 0xfe715aa5), at
0xfeaaeebc
In t@10 we've gotten all the way down to lwp_cond_wait().
Look again at the t@11 waiting sequence:
=>[1] ___lwp_cond_wait(0x815ca48, 0x815ca30, 0x0, 0x1), at 0xfeec9455
[2] os::PlatformEvent::park(0x815ca00, 0x0, 0x0, 0xfe9ba482), at 0xfe9d0a22
[3] ParkCommon(0x815ca00, 0x0, 0x0, 0x1), at 0xfe9ba4cd
[4] Monitor::ILock(0x80658c0, 0x815bc00, 0x1, 0xfe9bad24), at 0xfe9ba657
[5] Monitor::lock_without_safepoint_check(0x80658c0, 0x815bc00, 0x0, 0xfe9bad58), at 0xfe9bad39
[6] Monitor::lock_without_safepoint_check(0x80658c0, 0xfebfa000, 0xf3d88984, 0xfe70324a), at 0xfe9bada3
[7] SafepointSynchronize::block(0x815bc00, 0x815bc00, 0x5, 0x0), at 0xfe7033a7
[8] Monitor::wait(0x81495a8, 0x0, 0x5, 0x0), at 0xfe9bb2a4
[9] JavaThread::wait_for_ext_suspend_completion(0x815a800, 0x32, 0x5, 0xf3d88b08), at 0xfeaac3dd
Our Monitor::wait(0x81495a8) didn't call Monitor:IWait(...)
and get all the way down to lwp_cond_wait(). However, it
did call SafepointSynchronize::block()...
I'll have to check the code to see if these traces make sense...
I did an overnight doit_loop run with baseline JDK7-B70:
java version "1.7.0-ea"
Java(TM) SE Runtime Environment (build 1.7.0-ea-b70)
Java HotSpot(TM) Client VM (build 16.0-b07, mixed mode)
Java HotSpot(TM) Client VM (16.0-b07) for solaris-x86 JRE (1.7.0-ea-b70), built on Aug 20 2009 03:38:34 by "" with Workshop 5.9
Here's the summary:
Results: 1083 runs (4 FAILed, 1079 PASSed)
Failure details: hangs: 4
Run duration: 9 hour(s) 37 minute(s) 16 second(s)
The truss logs are similar in that only threads t@2 and t@9
show activity. The thread dumps have the same function names,
but different address (as expected). I've attached all the
logs for the record.
I did a search of nightly testing results. Here are some
hangs of this test observed in RT_Baseline:
http://sqeweb.sfbay/nfs/results/vm/gtee/JDK7/NIGHTLY/VM/2009-08-19/RT_Baseline/vm/solaris-amd64/server/mixed/solaris-amd64_server_mixed_nsk.jvmti.testlist/ResultDir/sp07t002/sp07t002.log
So the config is Solaris AMD64 Server VM -Xmixed.
http://sqeweb.sfbay/nfs/results/vm/gtee/JDK7/NIGHTLY/VM/2009-08-10/RT_Baseline/vm/solaris-sparcv9/server/comp/solaris-sparcv9_server_comp_nsk.jvmti.testlist/ResultDir/sp07t002/sp07t002.log
So the config is Solaris SPARC-64 Server VM -Xcomp.
http://sqeweb.sfbay/nfs/results/vm/gtee/JDK7/NIGHTLY/VM/2009-06-16/RT_Baseline/vm/linux-amd64/server/comp/linux-amd64_server_comp_nsk.jvmti.testlist/ResultDir/sp07t002/sp07t002.log
So the config is Linux AMD64 Server VM -Xcomp.
http://sqeweb.sfbay/nfs/results/vm/gtee/JDK7/NIGHTLY/VM/2009-04-21/RT_Baseline/vm/windows-i586/client/mixed/windows-i586_client_mixed_nsk.jvmti.testlist/ResultDir/sp07t002/sp07t002.log
So the config is Windows X86 Client VM -Xmixed.
I'm changing my call record to generic/generic.
Data from service_hs nightly analysis 2008.12.24 (final)
Unexpected Timeouts
-------------------
nsk.jvmti
nsk/jvmti/scenarios/sampling/SP07/sp07t002
2008.11.25 Win32 Client VM (machine colfax005)
2008.11.05 Win32 Server VM (machine dantooine)
2008.10.29 Solaris AMD64 Server VM (machine vm-x86-1)
2008.06.18 Solaris SPARC-64 Server VM (machine jtg-s111)
2008.05.15 Solaris X86 Client VM (machine vm-v20z-30)
2008.02.04 Win32 Client VM (machine vm-v20z-19)
Data from service_hs nightly analysis 2006.11.09 (partial)
URL: http://gtee.sfbay/gtee/results/MUSTANG/NIGHTLY/VM-MAIN/2006-11-09/Serv_Baseline/
VM Build: 20061109061001.dcubed.service_hs_b03_merge.1 (new VM)
Last main/baseline putback: 20061109050230.ik199011.rt_merge
JDK: Dolphin-B02 (new JDK)
Testbase: v160r08 (same testbase as previous version)
Known nsk.jvmti UFOYOYOs:
nsk/jvmti/scenarios/sampling/SP07/sp07t002
Last failure on 2006.11.06 with Solaris SPARC-64 Server VM.
Previous failure on 2006.10.31 with Solaris SPARC Server VM.
The fix for 6470570 was putback to rt_baseline on 2006.09.26.
rt_baseline was pushed to main/baseline on 2006.09.27.
main/baseline was pushed to serv_hs_baseline on 2006.09.27.
In the 2006.09.27 nightly report, the sp07t002 entry looked like
nsk/jvmti/scenarios/sampling/SP07/sp07t002
Last failure on 2005.10.04 with Linux IA32 Client VM.
Previous failure on 2005.10.01 with Linux IA32 Client VM.
A new hang was first noticed on 2006.10.31:
* nsk/jvmti/scenarios/sampling/SP07/sp07t002
Last failure on 2006.10.31 with Solaris SPARC Server VM.
Previous failure on 2005.10.04 with Linux IA32 Client VM.
I don't know whether I just missed any earlier hangs or not.
test hung:
nsk/jvmti/scenarios/sampling/SP07/sp07t002
on my Solaris X86 machine running the following config:
java version "1.7.0-ea"
Java(TM) SE Runtime Environment (build 1.7.0-ea-b70)
Java HotSpot(TM) Client VM (build 17.0-b01-internal, mixed mode)
I created a doit.sh script from the .tlog file and used my
doit_loop script to reproduce the hang once:
Results: 318 runs (1 FAILed, 317 PASSed)
Failure details: hangs: 1
Run duration: 3 hour(s) 51 second(s)
I've attached the thread dump in threads.log.295 and a
30 second truss snapshot in truss.log.295.
Only two threads were active in the 30 second snapshot:
% sed 's/[ ].*//' truss.log.295 | sort | uniq -c
1 /1:
1 /10:
1 /11:
43 /2:
1 /3:
1 /4:
1 /5:
1 /6:
1 /7:
1 /8:
499 /9:
Thread t@2 has the following stack snippet:
t@2 (l@2) stopped in ___nanosleep at 0xfeec8095
0xfeec8095: ___nanosleep+0x0015: jae ___nanosleep+0x23 [ 0xfeec80a3, .+0xe ]
current thread: t@2
[1] ___nanosleep(0x1), at 0xfeec8095
[2] THREAD_sleep(0x1), at 0xfe413424
[3] nsk_jvmti_sleep(0x3e8, 0x0), at 0xfe4166cd
[4] syncDebugeeStatus(0x8066510, 0x8064b98, 0x0), at 0xfe4167fb
[5] Java_nsk_share_jvmti_DebugeeClass_checkStatus(0x8066510, 0xfe55ebf0, 0x0,
0xfe55ebc0, 0x1, 0xf81c10c0, 0xfe55ebc0, 0xf81c0e10, 0xfe55ebf8, 0xf81c17a0, 0x0, 0xf81c0e10, 0x0, 0xfe55ebf8, 0xfe55ec20, 0xfc202ea9, 0xf81c10c0, 0xfc207fd9, 0x0, 0xf44f2f38), at 0xfe4169b3
[6] 0xfc20a23e(0xf81c10c0, 0xfc207fd9, 0x0, 0xf44f2f38, 0xfe55ec00, 0xf81c0a94), at 0xfc20a23e
Thread t@9 has the following stack snippet:
t@9 (l@9) stopped in __pollsys at 0xfeec8da5
0xfeec8da5: __pollsys+0x0015: jb __cerror [ 0xfee405a0, .-0x88805
]
current thread: t@9
=>[1] __pollsys(0x0, 0x0, 0xf3e5ad60, 0x0), at 0xfeec8da5
[2] _poll(0x0, 0x0, 0x32, 0xfe9ce7d6), at 0xfee6f502
[3] os_sleep(0x32, 0x0, 0x0, 0xfe9ceaa2), at 0xfe9ce922
[4] os::sleep(0x8116400, 0x32, 0x0, 0x0), at 0xfe9ceb40
[5] WatcherThread::run(0x8116400, 0xfebfa000, 0xf3e5afd8, 0xfe9cb38e), at 0xfe6d820e
It is interesting to see that the VMThread is blocked:
t@3 (l@3) stopped in ___lwp_cond_wait at 0xfeec9455
0xfeec9455: ___lwp_cond_wait+0x0015: jae ___lwp_cond_wait+0x23 [ 0xfeec9463, .+0xe ]
current thread: t@3
=>[1] ___lwp_cond_wait(0x80f5448, 0x80f5430, 0x0, 0x1), at 0xfeec9455
[2] os::PlatformEvent::park(0x80f5400, 0x0, 0x0, 0xfe9ba482), at 0xfe9d0a22
[3] ParkCommon(0x80f5400, 0x0, 0x0, 0x1), at 0xfe9ba4cd
[4] Monitor::ILock(0x81495a8, 0x80f4800, 0xfec13758, 0xfe9bad24), at 0xfe9ba657
[5] Monitor::lock_without_safepoint_check(0x81495a8, 0x80f4800, 0x1, 0xfe9bad58), at 0xfe9bad39
[6] Monitor::lock_without_safepoint_check(0x81495a8, 0x1, 0x3, 0xfea0186a), at 0xfe9bada3
[7] ThreadSafepointState::examine_state_of_thread(0x8111fb0, 0xfebfa000, 0xf3ffeca8, 0xfe702eab), at 0xfea01895
[8] SafepointSynchronize::begin(0x8067648, 0x8067620), at 0xfe702f2a
[9] VMThread::loop(0x80f4800, 0x7f, 0x0, 0xfe6b471a), at 0xfe6b4b38
[10] VMThread::run(0x80f4800, 0xfebfa000, 0xf3ffefd8, 0xfe9cb38e), at 0xfe6b47a1
Not sure if it means anything, but the first address passed
to the Monitor::ILock() call above appears in the stack traces
for the following threads:
t@10 (l@10) stopped in ___lwp_cond_wait at 0xfeec9455
0xfeec9455: ___lwp_cond_wait+0x0015: jae ___lwp_cond_wait+0x23 [ 0xfeec9463, .+0xe ]
current thread: t@10
=>[1] ___lwp_cond_wait(0x815b448, 0x815b430, 0x0, 0x1), at 0xfeec9455
[2] os::PlatformEvent::park(0x815b400, 0x81495a8, 0x0, 0xfe9ba482), at 0xfe9d0a22
[3] ParkCommon(0x815b400, 0x0, 0x0, 0xfe9ba98a), at 0xfe9ba4cd
[4] Monitor::IWait(0x81495a8, 0x815a800, 0x0, 0x0), at 0xfe9baa21
[5] Monitor::wait(0x81495a8, 0x1, 0x0, 0x0), at 0xfe9bb2ce
[6] JavaThread::java_suspend_self(0x815a800, 0x0, 0xfebfa000, 0xfe715aa5), at
0xfeaaeebc
[7] JavaThread::check_safepoint_and_suspend_for_native_trans(0x815a800, 0x815a910, 0xf3dd7538, 0xfeaaef3c), at 0xfe715b25
[8] JavaThread::check_special_condition_for_native_trans(0x815a800), at 0xfeaaef4e
[9] 0xfc27b4ff(0xfe6dbf47, 0xf3dd74f8, 0xfebfa000, 0xf3dd759c, 0xf4093398, 0xfc27a588), at 0xfc27b4ff
t@11 (l@11) stopped in ___lwp_cond_wait at 0xfeec9455
0xfeec9455: ___lwp_cond_wait+0x0015: jae ___lwp_cond_wait+0x23 [ 0xfeec9463, .+0xe ]
current thread: t@11
=>[1] ___lwp_cond_wait(0x815ca48, 0x815ca30, 0x0, 0x1), at 0xfeec9455
[2] os::PlatformEvent::park(0x815ca00, 0x0, 0x0, 0xfe9ba482), at 0xfe9d0a22
[3] ParkCommon(0x815ca00, 0x0, 0x0, 0x1), at 0xfe9ba4cd
[4] Monitor::ILock(0x80658c0, 0x815bc00, 0x1, 0xfe9bad24), at 0xfe9ba657
[5] Monitor::lock_without_safepoint_check(0x80658c0, 0x815bc00, 0x0, 0xfe9bad58), at 0xfe9bad39
[6] Monitor::lock_without_safepoint_check(0x80658c0, 0xfebfa000, 0xf3d88984, 0xfe70324a), at 0xfe9bada3
[7] SafepointSynchronize::block(0x815bc00, 0x815bc00, 0x5, 0x0), at 0xfe7033a7
[8] Monitor::wait(0x81495a8, 0x0, 0x5, 0x0), at 0xfe9bb2a4
[9] JavaThread::wait_for_ext_suspend_completion(0x815a800, 0x32, 0x5, 0xf3d88b08), at 0xfeaac3dd
[10] JvmtiEnvBase::is_thread_fully_suspended(0x815a800, 0x1, 0xf3d88b08, 0xfe9365e8), at 0xfe93dc44
[11] JvmtiEnv::GetStackTrace(0x8064b98, 0x815a800, 0x0, 0x400, 0xfe438500), at 0xfe936607
[12] jvmti_GetStackTrace(0x8064b98, 0x815ce24, 0x0, 0x400, 0xfe438500, 0xfe435b38), at 0xfe9114ea
[13] checkStackTrace(0x8064b98, 0x815bd10), at 0xfe41a932
[14] agentProc(0x8064b98, 0x815bd10, 0x0), at 0xfe41acf9
[15] agentThreadWrapper(0x8064b98, 0x815bd10, 0x0, 0xfe9532b2), at 0xfe4162a3
[16] JvmtiAgentThread::call_start_function(0x815bc00, 0xfee646de, 0xfef3d5d0,
0xfe953288), at 0xfe953374
[17] JvmtiAgentThread::start_function_wrapper(0x815bc00), at 0xfe95329a
[18] JavaThread::thread_main_inner(0x815bc00, 0x0, 0x0, 0xfe6c2dbe), at 0xfe6c2fa9
It looks like the VMThread (t@3) is trying to begin a
safepoint and is blocked trying to examine the state
of another thread in a Monitor::ILock(0x81495a8) call.
Thread t@10 is trying to suspend itself and is calling
Monitor::IWait(0x81495a8).
thread t@11 is waiting for a suspend to complete, calls
Monitor::wait(0x81495a8) and blocks on the pending safepoint.
Uh oh....
I suspect that t@11 is still holding the lock it is trying
to wait() on. Look at the t@10 waiting sequence:
=>[1] ___lwp_cond_wait(0x815b448, 0x815b430, 0x0, 0x1), at 0xfeec9455
[2] os::PlatformEvent::park(0x815b400, 0x81495a8, 0x0, 0xfe9ba482), at 0xfe9d0a22
[3] ParkCommon(0x815b400, 0x0, 0x0, 0xfe9ba98a), at 0xfe9ba4cd
[4] Monitor::IWait(0x81495a8, 0x815a800, 0x0, 0x0), at 0xfe9baa21
[5] Monitor::wait(0x81495a8, 0x1, 0x0, 0x0), at 0xfe9bb2ce
[6] JavaThread::java_suspend_self(0x815a800, 0x0, 0xfebfa000, 0xfe715aa5), at
0xfeaaeebc
In t@10 we've gotten all the way down to lwp_cond_wait().
Look again at the t@11 waiting sequence:
=>[1] ___lwp_cond_wait(0x815ca48, 0x815ca30, 0x0, 0x1), at 0xfeec9455
[2] os::PlatformEvent::park(0x815ca00, 0x0, 0x0, 0xfe9ba482), at 0xfe9d0a22
[3] ParkCommon(0x815ca00, 0x0, 0x0, 0x1), at 0xfe9ba4cd
[4] Monitor::ILock(0x80658c0, 0x815bc00, 0x1, 0xfe9bad24), at 0xfe9ba657
[5] Monitor::lock_without_safepoint_check(0x80658c0, 0x815bc00, 0x0, 0xfe9bad58), at 0xfe9bad39
[6] Monitor::lock_without_safepoint_check(0x80658c0, 0xfebfa000, 0xf3d88984, 0xfe70324a), at 0xfe9bada3
[7] SafepointSynchronize::block(0x815bc00, 0x815bc00, 0x5, 0x0), at 0xfe7033a7
[8] Monitor::wait(0x81495a8, 0x0, 0x5, 0x0), at 0xfe9bb2a4
[9] JavaThread::wait_for_ext_suspend_completion(0x815a800, 0x32, 0x5, 0xf3d88b08), at 0xfeaac3dd
Our Monitor::wait(0x81495a8) didn't call Monitor:IWait(...)
and get all the way down to lwp_cond_wait(). However, it
did call SafepointSynchronize::block()...
I'll have to check the code to see if these traces make sense...
I did an overnight doit_loop run with baseline JDK7-B70:
java version "1.7.0-ea"
Java(TM) SE Runtime Environment (build 1.7.0-ea-b70)
Java HotSpot(TM) Client VM (build 16.0-b07, mixed mode)
Java HotSpot(TM) Client VM (16.0-b07) for solaris-x86 JRE (1.7.0-ea-b70), built on Aug 20 2009 03:38:34 by "" with Workshop 5.9
Here's the summary:
Results: 1083 runs (4 FAILed, 1079 PASSed)
Failure details: hangs: 4
Run duration: 9 hour(s) 37 minute(s) 16 second(s)
The truss logs are similar in that only threads t@2 and t@9
show activity. The thread dumps have the same function names,
but different address (as expected). I've attached all the
logs for the record.
I did a search of nightly testing results. Here are some
hangs of this test observed in RT_Baseline:
http://sqeweb.sfbay/nfs/results/vm/gtee/JDK7/NIGHTLY/VM/2009-08-19/RT_Baseline/vm/solaris-amd64/server/mixed/solaris-amd64_server_mixed_nsk.jvmti.testlist/ResultDir/sp07t002/sp07t002.log
So the config is Solaris AMD64 Server VM -Xmixed.
http://sqeweb.sfbay/nfs/results/vm/gtee/JDK7/NIGHTLY/VM/2009-08-10/RT_Baseline/vm/solaris-sparcv9/server/comp/solaris-sparcv9_server_comp_nsk.jvmti.testlist/ResultDir/sp07t002/sp07t002.log
So the config is Solaris SPARC-64 Server VM -Xcomp.
http://sqeweb.sfbay/nfs/results/vm/gtee/JDK7/NIGHTLY/VM/2009-06-16/RT_Baseline/vm/linux-amd64/server/comp/linux-amd64_server_comp_nsk.jvmti.testlist/ResultDir/sp07t002/sp07t002.log
So the config is Linux AMD64 Server VM -Xcomp.
http://sqeweb.sfbay/nfs/results/vm/gtee/JDK7/NIGHTLY/VM/2009-04-21/RT_Baseline/vm/windows-i586/client/mixed/windows-i586_client_mixed_nsk.jvmti.testlist/ResultDir/sp07t002/sp07t002.log
So the config is Windows X86 Client VM -Xmixed.
I'm changing my call record to generic/generic.
Data from service_hs nightly analysis 2008.12.24 (final)
Unexpected Timeouts
-------------------
nsk.jvmti
nsk/jvmti/scenarios/sampling/SP07/sp07t002
2008.11.25 Win32 Client VM (machine colfax005)
2008.11.05 Win32 Server VM (machine dantooine)
2008.10.29 Solaris AMD64 Server VM (machine vm-x86-1)
2008.06.18 Solaris SPARC-64 Server VM (machine jtg-s111)
2008.05.15 Solaris X86 Client VM (machine vm-v20z-30)
2008.02.04 Win32 Client VM (machine vm-v20z-19)
Data from service_hs nightly analysis 2006.11.09 (partial)
URL: http://gtee.sfbay/gtee/results/MUSTANG/NIGHTLY/VM-MAIN/2006-11-09/Serv_Baseline/
VM Build: 20061109061001.dcubed.service_hs_b03_merge.1 (new VM)
Last main/baseline putback: 20061109050230.ik199011.rt_merge
JDK: Dolphin-B02 (new JDK)
Testbase: v160r08 (same testbase as previous version)
Known nsk.jvmti UFOYOYOs:
nsk/jvmti/scenarios/sampling/SP07/sp07t002
Last failure on 2006.11.06 with Solaris SPARC-64 Server VM.
Previous failure on 2006.10.31 with Solaris SPARC Server VM.
The fix for 6470570 was putback to rt_baseline on 2006.09.26.
rt_baseline was pushed to main/baseline on 2006.09.27.
main/baseline was pushed to serv_hs_baseline on 2006.09.27.
In the 2006.09.27 nightly report, the sp07t002 entry looked like
nsk/jvmti/scenarios/sampling/SP07/sp07t002
Last failure on 2005.10.04 with Linux IA32 Client VM.
Previous failure on 2005.10.01 with Linux IA32 Client VM.
A new hang was first noticed on 2006.10.31:
* nsk/jvmti/scenarios/sampling/SP07/sp07t002
Last failure on 2006.10.31 with Solaris SPARC Server VM.
Previous failure on 2005.10.04 with Linux IA32 Client VM.
I don't know whether I just missed any earlier hangs or not.
- backported by
-
JDK-2183600 sp07t002 hangs very intermittently
- Closed
-
JDK-2189947 sp07t002 hangs very intermittently
- Closed
- duplicates
-
JDK-6820575 Threads suspending leads to hang
- Closed
- relates to
-
JDK-4379031 java -Xrunhprof:cpu=samples PepTest intermittently hangs
- Closed
-
JDK-4510838 self-suspend race with GetCallTrace
- Closed
-
JDK-6470570 Modify suspend/resume code to work better with the safepoint protocol
- Closed
(1 relates to)