The bug is difficult to reproduce, but has been seen, for instance
running Volano with a very high connection count on a large MP
machine: In the specific case of this report, we were running
with 1100 connections (only) on a 10 processor SPARC (the assert
fired in the client; we were running with Compiler2, but i believe
that may just be a coincoidence):
The VM thread is in the process of progress towards a safepoint,
and asserts trying to examine the state of a (suspended) thread
that's executing compiled code.
Here's the VM thread:
current thread: t@4
[1] __sigprocmask(0x0, 0xfee0abf0, 0x0, 0xffffffff, 0xffffffff, 0x0), at 0xff379da0
[2] _resetsig(0xff38ca04, 0x0, 0xfee0bdc8, 0x0, 0x0, 0xfee0be44), at 0xff36f3f8
[3] _sigon(0xff394358, 0xff3942d8, 0xfee0be3c, 0xfee0acc4, 0x6, 0xff2cd00c), at 0xff36eb44
[4] _thrp_kill(0x0, 0x4, 0x6, 0xff38ca04, 0xfee0bdc8, 0xff393c5c), at 0xff371a20
[5] abort(0xff333968, 0xfee0adc0, 0x0, 0xfffffff8, 0x0, 0xfee0ade1), at 0xff2b9468
[6] os::abort(), line 1150 in "os_solaris.cpp"
[7] report_error(is_vm_internal_error = 1, file_name = 0xfe5190c4 "/export/sail1/jdk1.3_b09/hotsparc-1.3-b09/src/share/vm/code/codeCache.cpp", line_no = 193, title = 0xfe421c6c "assertion failure", format = 0xfe421c80 "assert(%s, "%s")", ...), line 322 in "debug.cpp"
[8] report_assertion_failure(code_str = 0xfe51909c "result->blob_contains((address)start)", file_name = 0xfe5190c4 "/export/sail1/jdk1.3_b09/hotsparc-1.3-b09/src/share/vm/code/codeCache.cpp", line_no = 193, message = 0xfe519110 "found wrong CodeBlob"), line 145 in "debug.cpp"
[9] CodeCache::find_blob(start = 0xfa003300), line 193 in "codeCache.cpp"
=>[10] ThreadSafepointState::examine_state_of_thread(this = 0x94c1f8, iterations = 0), line 502 in "safepoint.cpp"
[11] SafepointSynchronize::begin(), line 75 in "safepoint.cpp"
[12] VMThread::loop(this = 0x155ca0), line 269 in "vmThread.cpp"
[13] VMThread::vm_thread_main(notify_lock = 0x28f78), line 162 in "vmThread.cpp"
[14] _start(osthread = 0x155f38), line 514 in "os_solaris.cpp"
And here's the target thread (the one whose thread state is being "examined"):
current thread: t@2189
[1] __lwp_sema_wait(0xc2cd1e78, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xff317aec
[2] _park(0xc2cd1dc8, 0xc2cd1e78, 0x0, 0xf, 0xff38d798, 0xc4311dc8), at 0xff36b1e4
[3] _swtch(0x5, 0xff38ca04, 0xc2cd1e58, 0xc2cd1e54, 0xc2cd1e50, 0xc2cd1e4c), at 0xff36aed8
[4] _dopreempt(0x0, 0xff38ca04, 0x1, 0x0, 0x0, 0x0), at 0xff36ce10
[5] _sigon(0x1, 0xc2cd1e2c, 0xc2cd0f88, 0x0, 0x0, 0x0), at 0xff36eaf0
[6] thr_sigsetmask(0xc2cd1e54, 0xc2cd1030, 0x8000, 0xff38ca04, 0xc2cd1e4c, 0xc2cd1e2c), at 0xff371244
[7] __libthread_segvhdlr(0xb, 0xc2cd14a0, 0xc2cd11e8, 0xff38ca04, 0x0, 0x0), at 0xff379344
[8] __sighndlr(0xb, 0xc2cd14a0, 0xc2cd11e8, 0xff379278, 0xc2cd1e4c, 0xc2cd1e2c), at 0xff37be00
[9] sigacthandler(0xb, 0xc2cd14a0, 0xc2cd11e8, 0xff38ca04, 0x2c, 0xc2cd1e5c), at 0xff378708
---- called from signal handler with signal 11 (SIGSEGV) ------
[10] 0xfa0032fc(0xda0310e0, 0xe, 0x94b750, 0xf7ce5830, 0xc, 0x8), at 0xfa0032fb
[11] 0xfa0156dc(0xda030f48, 0xf7cd10b8, 0xd8532120, 0xda030fe0, 0x0, 0x0), at 0xfa0156db
[12] 0xfa016794(0xf7c38d00, 0x1, 0x0, 0x0, 0x0, 0x0), at 0xfa016793
[13] 0xfa014a3c(0xd7c6cbe0, 0xf7c16d58, 0xc2cd17dc, 0x4, 0xbfc00, 0x0), at 0xfa014a3b
[14] 0xc93c8(0xc2cd17dc, 0x94b750, 0x8, 0xe3da8, 0x4, 0xc2cd1708), at 0xc93c7
[15] 0xfe5d8974(0xc2cd1874, 0xc2cd1bc0, 0xa, 0xf7c176d8, 0x4, 0xc2cd1780), at 0xfe5d8973
[16] JavaCalls::call_helper(result = 0xc2cd1bb8, m = 0xc2cd1a50, args = 0xc2cd1b04, __the_thread__ = 0x94b750), line 347 in "javaCalls.cpp"
[17] os::os_exception_wrapper(f = 0xfdbd0c98 = &JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*), value = 0xc2cd1bb8, method = 0xc2cd1a50, args = 0xc2cd1b04, thread = 0x94b750), line 1854 in "os_solaris.cpp"
[18] JavaCalls::call(result = 0xc2cd1bb8, method = CLASS, args = 0xc2cd1b04, __the_thread__ = 0x94b750), line 284 in "javaCalls.cpp"
[19] JavaCalls::call_virtual(result = 0xc2cd1bb8, spec_klass = CLASS, name = CLASS, signature = CLASS, args = 0xc2cd1b04, __the_thread__ = 0x94b750), line 180 in "javaCalls.cpp"
[20] JavaCalls::call_virtual(result = 0xc2cd1bb8, receiver = CLASS, spec_klass = CLASS, name = CLASS, signature = CLASS, __the_thread__ = 0x94b750), line 186 in "javaCalls.cpp"
[21] thread_entry(thread = 0x94b750, __the_thread__ = 0x94b750), line 1678 in "jvm.cpp"
[22] JavaThread::thread_main_inner(thread = 0x94b750), line 790 in "thread.cpp"
[23] JavaThread::thread_main(thread = 0x94b750), line 774 in "thread.cpp"
[24] _start(osthread = 0x94c290), line 514 in "os_solaris.cpp"
Here's what the "this" pointer in frame 10 of t@4 looks like:
*this = {
_type = _running
_thread = 0x94b750
_addr = {
_pc = 0xfa0032fc "\xc0"
_npc = 0xfa003300 ""
}
_handle = 0x94c250
_code_buffer = (nil)
_stop_state = _thread_in_Java
_stop_pc = 0xfa003300 ""
}
Note that _pc and _npc indicate the right values for frame 10 of t@2189:
current thread: t@2189
current frame: [10]
g0-g3 0x00000000 0x000397ac 0x0094b750 0x00000134
g4-g7 0xc2cd1518 0x00000000 0x00000000 0xc2cd1dc8
o0-o3 0xf7cca770 0x0000000e 0x00000000 0x0000000e
o4-o7 0x0094b750 0x0000000e 0xc2cd1520 0xfa006a3c
l0-l3 0xf7c004d0 0xd8532380 0xf7c98120 0x00000001
l4-l7 0xf7c98120 0xd85323a8 0xd85323a8 0xfffffff8
i0-i3 0xda0310e0 0x0000000e 0x0094b750 0xf7ce5830
i4-i7 0x0000000c 0x00000008 0xc2cd15c0 0xfa0156dc
y 0x00000000
ccr 0x00000004
pc 0xfa0032fc:0xfa0032fc ld [0x0], %g0
npc 0xfa003300:0xfa003300 illtrap 0x2
And here's what the code for frame 10 of t@2189 looks like:
0xfa003290: save %sp, -0x60, %sp
0xfa003294: st %sp, [%g2 + 0x8c]
0xfa003298: st %g0, [%g2 + 0x118]
0xfa00329c: mov %g2, %o0
0xfa0032a0: mov %g2, %l7
0xfa0032a4: call handle_illegal_instruction_exception
0xfa0032a8: nop
0xfa0032ac: mov %l7, %g2
0xfa0032b0: st %l2, [%sp + 0x8]
0xfa0032b4: save %sp, -0x60, %sp
0xfa0032b8: ld [%fp + 0x8], %l2
0xfa0032bc: ld [%g2 + 0x8c], %l0
0xfa0032c0: tst %l0
0xfa0032c4: te %icc,%g0 + 16
0xfa0032c8: restore
0xfa0032cc: st %g0, [%g2 + 0x8c]
0xfa0032d0: st %g0, [%g2 + 0x118]
0xfa0032d4: ld [%g2 + 0x4], %g5
0xfa0032d8: tst %g5
0xfa0032dc: be,pt %icc,0xfa0032f4
0xfa0032e0: nop
0xfa0032e4: restore
0xfa0032e8: sethi %hi(0xfe5d8400), %o7
0xfa0032ec: add %o7, 0xc0, %o7
0xfa0032f0: ld [0x0], %g0
0xfa0032f4: restore
0xfa0032f8: add %o7, 0x8, %o7
0xfa0032fc: ld [0x0], %g0
0xfa003300: illtrap 0x2
0xfa003304: illtrap 0x1
0xfa003308: illtrap 0x0
0xfa00330c: illtrap 0x0
0xfa003310: illtrap 0x4c
0xfa003314: illtrap 0x34
0xfa003318: illtrap 0x4
0xfa00331c: illtrap 0x40
0xfa003320: illtrap 0x4c
0xfa003324: illtrap 0x4c
Note that it appears that _npc is pointing _beyond_ a CodeBlob at the
header of the next blob.
Methinks that we should use _pc if _pc and _npc are contiguous to
avoid such problems. Indeed, if _pc and _npc are not contiguous
we should continue to use _npc to avoid from letting the target
thread escape into a method that we haven't patched. [Or even better
we could let the thread escape, and look at it again later, and
patch it -- taking care to avoid the measure zero case of no progress
where we always catch the target in a delayed control transfer).
As evidence of the fact that _npc is pointing just beyond a valid
CodeBlob, we find that:
in frame 9 or t@4, we have:
result = 0xfa003310
and *result = {
_size = 76
_header_size = 52
_relocation_size = 4
_instructions_offset = 64
_data_offset = 76
_oops_offset = 76
_oops_length = 0
_frame_size = 16
_return_address_offset = 15
_link_offset = -2
_oop_maps = 0x155008
}
This corresponds to _heap.find_start(_npc). Whereas _heap.find_start(_pc)
yields: `libjvm_g.so`codeCache.cpp`_heap.find_start((void *) 4194317052U) = 0xfa003250
where *(class CodeBlob *) 4194316880U = {
_size = 176
_header_size = 48
_relocation_size = 8
_instructions_offset = 64
_data_offset = 176
_oops_offset = 176
_oops_length = 0
_frame_size = 24
_return_address_offset = 15
_link_offset = -2
_oop_maps = (nil)
}
Not surprisingly: the end of the code blob = start of code blob for _pc + 176 =
_npc i.e. the _npc points just beyond the end of the actual CodeBlob
that we want!
I will need the active tutoring and assistance of a Compiler Safepoints
guru to try and fix (or even understand) this one. Clearly, the naive
solution of using the _pc will not work, because that can cause
looping of a compiled thread at a safepoint because we can just
end up patching caller rather than callee (and if the callee has an
infinite loop, we'll keep looping till the cows come home, and beyond).
y.s.ramakrishna@eng 2000-04-05
==============================================
running Volano with a very high connection count on a large MP
machine: In the specific case of this report, we were running
with 1100 connections (only) on a 10 processor SPARC (the assert
fired in the client; we were running with Compiler2, but i believe
that may just be a coincoidence):
The VM thread is in the process of progress towards a safepoint,
and asserts trying to examine the state of a (suspended) thread
that's executing compiled code.
Here's the VM thread:
current thread: t@4
[1] __sigprocmask(0x0, 0xfee0abf0, 0x0, 0xffffffff, 0xffffffff, 0x0), at 0xff379da0
[2] _resetsig(0xff38ca04, 0x0, 0xfee0bdc8, 0x0, 0x0, 0xfee0be44), at 0xff36f3f8
[3] _sigon(0xff394358, 0xff3942d8, 0xfee0be3c, 0xfee0acc4, 0x6, 0xff2cd00c), at 0xff36eb44
[4] _thrp_kill(0x0, 0x4, 0x6, 0xff38ca04, 0xfee0bdc8, 0xff393c5c), at 0xff371a20
[5] abort(0xff333968, 0xfee0adc0, 0x0, 0xfffffff8, 0x0, 0xfee0ade1), at 0xff2b9468
[6] os::abort(), line 1150 in "os_solaris.cpp"
[7] report_error(is_vm_internal_error = 1, file_name = 0xfe5190c4 "/export/sail1/jdk1.3_b09/hotsparc-1.3-b09/src/share/vm/code/codeCache.cpp", line_no = 193, title = 0xfe421c6c "assertion failure", format = 0xfe421c80 "assert(%s, "%s")", ...), line 322 in "debug.cpp"
[8] report_assertion_failure(code_str = 0xfe51909c "result->blob_contains((address)start)", file_name = 0xfe5190c4 "/export/sail1/jdk1.3_b09/hotsparc-1.3-b09/src/share/vm/code/codeCache.cpp", line_no = 193, message = 0xfe519110 "found wrong CodeBlob"), line 145 in "debug.cpp"
[9] CodeCache::find_blob(start = 0xfa003300), line 193 in "codeCache.cpp"
=>[10] ThreadSafepointState::examine_state_of_thread(this = 0x94c1f8, iterations = 0), line 502 in "safepoint.cpp"
[11] SafepointSynchronize::begin(), line 75 in "safepoint.cpp"
[12] VMThread::loop(this = 0x155ca0), line 269 in "vmThread.cpp"
[13] VMThread::vm_thread_main(notify_lock = 0x28f78), line 162 in "vmThread.cpp"
[14] _start(osthread = 0x155f38), line 514 in "os_solaris.cpp"
And here's the target thread (the one whose thread state is being "examined"):
current thread: t@2189
[1] __lwp_sema_wait(0xc2cd1e78, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xff317aec
[2] _park(0xc2cd1dc8, 0xc2cd1e78, 0x0, 0xf, 0xff38d798, 0xc4311dc8), at 0xff36b1e4
[3] _swtch(0x5, 0xff38ca04, 0xc2cd1e58, 0xc2cd1e54, 0xc2cd1e50, 0xc2cd1e4c), at 0xff36aed8
[4] _dopreempt(0x0, 0xff38ca04, 0x1, 0x0, 0x0, 0x0), at 0xff36ce10
[5] _sigon(0x1, 0xc2cd1e2c, 0xc2cd0f88, 0x0, 0x0, 0x0), at 0xff36eaf0
[6] thr_sigsetmask(0xc2cd1e54, 0xc2cd1030, 0x8000, 0xff38ca04, 0xc2cd1e4c, 0xc2cd1e2c), at 0xff371244
[7] __libthread_segvhdlr(0xb, 0xc2cd14a0, 0xc2cd11e8, 0xff38ca04, 0x0, 0x0), at 0xff379344
[8] __sighndlr(0xb, 0xc2cd14a0, 0xc2cd11e8, 0xff379278, 0xc2cd1e4c, 0xc2cd1e2c), at 0xff37be00
[9] sigacthandler(0xb, 0xc2cd14a0, 0xc2cd11e8, 0xff38ca04, 0x2c, 0xc2cd1e5c), at 0xff378708
---- called from signal handler with signal 11 (SIGSEGV) ------
[10] 0xfa0032fc(0xda0310e0, 0xe, 0x94b750, 0xf7ce5830, 0xc, 0x8), at 0xfa0032fb
[11] 0xfa0156dc(0xda030f48, 0xf7cd10b8, 0xd8532120, 0xda030fe0, 0x0, 0x0), at 0xfa0156db
[12] 0xfa016794(0xf7c38d00, 0x1, 0x0, 0x0, 0x0, 0x0), at 0xfa016793
[13] 0xfa014a3c(0xd7c6cbe0, 0xf7c16d58, 0xc2cd17dc, 0x4, 0xbfc00, 0x0), at 0xfa014a3b
[14] 0xc93c8(0xc2cd17dc, 0x94b750, 0x8, 0xe3da8, 0x4, 0xc2cd1708), at 0xc93c7
[15] 0xfe5d8974(0xc2cd1874, 0xc2cd1bc0, 0xa, 0xf7c176d8, 0x4, 0xc2cd1780), at 0xfe5d8973
[16] JavaCalls::call_helper(result = 0xc2cd1bb8, m = 0xc2cd1a50, args = 0xc2cd1b04, __the_thread__ = 0x94b750), line 347 in "javaCalls.cpp"
[17] os::os_exception_wrapper(f = 0xfdbd0c98 = &JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*), value = 0xc2cd1bb8, method = 0xc2cd1a50, args = 0xc2cd1b04, thread = 0x94b750), line 1854 in "os_solaris.cpp"
[18] JavaCalls::call(result = 0xc2cd1bb8, method = CLASS, args = 0xc2cd1b04, __the_thread__ = 0x94b750), line 284 in "javaCalls.cpp"
[19] JavaCalls::call_virtual(result = 0xc2cd1bb8, spec_klass = CLASS, name = CLASS, signature = CLASS, args = 0xc2cd1b04, __the_thread__ = 0x94b750), line 180 in "javaCalls.cpp"
[20] JavaCalls::call_virtual(result = 0xc2cd1bb8, receiver = CLASS, spec_klass = CLASS, name = CLASS, signature = CLASS, __the_thread__ = 0x94b750), line 186 in "javaCalls.cpp"
[21] thread_entry(thread = 0x94b750, __the_thread__ = 0x94b750), line 1678 in "jvm.cpp"
[22] JavaThread::thread_main_inner(thread = 0x94b750), line 790 in "thread.cpp"
[23] JavaThread::thread_main(thread = 0x94b750), line 774 in "thread.cpp"
[24] _start(osthread = 0x94c290), line 514 in "os_solaris.cpp"
Here's what the "this" pointer in frame 10 of t@4 looks like:
*this = {
_type = _running
_thread = 0x94b750
_addr = {
_pc = 0xfa0032fc "\xc0"
_npc = 0xfa003300 ""
}
_handle = 0x94c250
_code_buffer = (nil)
_stop_state = _thread_in_Java
_stop_pc = 0xfa003300 ""
}
Note that _pc and _npc indicate the right values for frame 10 of t@2189:
current thread: t@2189
current frame: [10]
g0-g3 0x00000000 0x000397ac 0x0094b750 0x00000134
g4-g7 0xc2cd1518 0x00000000 0x00000000 0xc2cd1dc8
o0-o3 0xf7cca770 0x0000000e 0x00000000 0x0000000e
o4-o7 0x0094b750 0x0000000e 0xc2cd1520 0xfa006a3c
l0-l3 0xf7c004d0 0xd8532380 0xf7c98120 0x00000001
l4-l7 0xf7c98120 0xd85323a8 0xd85323a8 0xfffffff8
i0-i3 0xda0310e0 0x0000000e 0x0094b750 0xf7ce5830
i4-i7 0x0000000c 0x00000008 0xc2cd15c0 0xfa0156dc
y 0x00000000
ccr 0x00000004
pc 0xfa0032fc:0xfa0032fc ld [0x0], %g0
npc 0xfa003300:0xfa003300 illtrap 0x2
And here's what the code for frame 10 of t@2189 looks like:
0xfa003290: save %sp, -0x60, %sp
0xfa003294: st %sp, [%g2 + 0x8c]
0xfa003298: st %g0, [%g2 + 0x118]
0xfa00329c: mov %g2, %o0
0xfa0032a0: mov %g2, %l7
0xfa0032a4: call handle_illegal_instruction_exception
0xfa0032a8: nop
0xfa0032ac: mov %l7, %g2
0xfa0032b0: st %l2, [%sp + 0x8]
0xfa0032b4: save %sp, -0x60, %sp
0xfa0032b8: ld [%fp + 0x8], %l2
0xfa0032bc: ld [%g2 + 0x8c], %l0
0xfa0032c0: tst %l0
0xfa0032c4: te %icc,%g0 + 16
0xfa0032c8: restore
0xfa0032cc: st %g0, [%g2 + 0x8c]
0xfa0032d0: st %g0, [%g2 + 0x118]
0xfa0032d4: ld [%g2 + 0x4], %g5
0xfa0032d8: tst %g5
0xfa0032dc: be,pt %icc,0xfa0032f4
0xfa0032e0: nop
0xfa0032e4: restore
0xfa0032e8: sethi %hi(0xfe5d8400), %o7
0xfa0032ec: add %o7, 0xc0, %o7
0xfa0032f0: ld [0x0], %g0
0xfa0032f4: restore
0xfa0032f8: add %o7, 0x8, %o7
0xfa0032fc: ld [0x0], %g0
0xfa003300: illtrap 0x2
0xfa003304: illtrap 0x1
0xfa003308: illtrap 0x0
0xfa00330c: illtrap 0x0
0xfa003310: illtrap 0x4c
0xfa003314: illtrap 0x34
0xfa003318: illtrap 0x4
0xfa00331c: illtrap 0x40
0xfa003320: illtrap 0x4c
0xfa003324: illtrap 0x4c
Note that it appears that _npc is pointing _beyond_ a CodeBlob at the
header of the next blob.
Methinks that we should use _pc if _pc and _npc are contiguous to
avoid such problems. Indeed, if _pc and _npc are not contiguous
we should continue to use _npc to avoid from letting the target
thread escape into a method that we haven't patched. [Or even better
we could let the thread escape, and look at it again later, and
patch it -- taking care to avoid the measure zero case of no progress
where we always catch the target in a delayed control transfer).
As evidence of the fact that _npc is pointing just beyond a valid
CodeBlob, we find that:
in frame 9 or t@4, we have:
result = 0xfa003310
and *result = {
_size = 76
_header_size = 52
_relocation_size = 4
_instructions_offset = 64
_data_offset = 76
_oops_offset = 76
_oops_length = 0
_frame_size = 16
_return_address_offset = 15
_link_offset = -2
_oop_maps = 0x155008
}
This corresponds to _heap.find_start(_npc). Whereas _heap.find_start(_pc)
yields: `libjvm_g.so`codeCache.cpp`_heap.find_start((void *) 4194317052U) = 0xfa003250
where *(class CodeBlob *) 4194316880U = {
_size = 176
_header_size = 48
_relocation_size = 8
_instructions_offset = 64
_data_offset = 176
_oops_offset = 176
_oops_length = 0
_frame_size = 24
_return_address_offset = 15
_link_offset = -2
_oop_maps = (nil)
}
Not surprisingly: the end of the code blob = start of code blob for _pc + 176 =
_npc i.e. the _npc points just beyond the end of the actual CodeBlob
that we want!
I will need the active tutoring and assistance of a Compiler Safepoints
guru to try and fix (or even understand) this one. Clearly, the naive
solution of using the _pc will not work, because that can cause
looping of a compiled thread at a safepoint because we can just
end up patching caller rather than callee (and if the callee has an
infinite loop, we'll keep looping till the cows come home, and beyond).
y.s.ramakrishna@eng 2000-04-05
==============================================
- duplicates
-
JDK-4325983 JWS fails with base_loc=NULL in OopMapSet::all_do.
-
- Resolved
-
- relates to
-
JDK-4325735 Java Webserver run ends with assertion failure "found wrong CodeBlob".
-
- Closed
-