VTest hangs after 25 hours on machine jtg-e450.sfbay with Merlin build 88.
The VM thread is stuck on the malloc lock,
and several threads have been interrupted by SIGUSR1 while doing a malloc.
current thread: t@4
=>[1] __lwp_sema_wait(0xfc781e30, 0x0, 0x0, 0x0, 0x0, 0x2), at 0xff319c64
[2] _park(0xfc781e30, 0xff38e000, 0x0, 0xfc781d78, 0xfe422324, 0x0), at 0xff3697f4
[3] _swtch(0xfc781d78, 0xfc781d78, 0xff38e000, 0x5, 0x0, 0x0), at 0xff369204
[4] _mutex_adaptive_lock(0xff399944, 0x66666400, 0x4c00, 0x1, 0x4d58, 0xfffeffff), at 0xff36ad80
[5] _cmutex_lock(0xff33a500, 0xff, 0xff38e000, 0xff2c0f2c, 0x204, 0x0), at 0xff36ab1c
[6] malloc(0x14, 0x95740, 0x35c7d4, 0x0, 0xfe3e2000, 0xfc781470), at 0xff2c0f2c
[7] os::malloc(0x14, 0x9ee50, 0x7, 0x0, 0xfc78189c, 0xfc781518), at 0xfe049740
[8] CHeapObj::operator new(0x14, 0x27c160, 0x1fc890, 0x6, 0x290e, 0xfc781970), at 0xfe0496e0
[9] CompiledCodeSafepointHandler::setup(0xfa4a44c0, 0x39c, 0xfa4a4388, 0x0, 0x0, 0x0), at 0xfe1e5aac
[10] ThreadSafepointState::examine_state_of_thread(0x24c150, 0x0, 0xffffffff, 0xfe42bc28, 0xfe422324, 0xfe1a3528), at 0xfe1a3e60
[11] SafepointSynchronize::begin(0x5000, 0x51c4, 0x27c160, 0x24c150, 0xfe0fc6f4, 0x0), at 0xfe1a3580
[12] VMThread::loop(0xfe407fec, 0xfe3f83e0, 0xfe3f83dc, 0x0, 0x0, 0x0), at 0xfe0fc74c
[13] VMThread::run(0x9edc8, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfe0fc25c
[14] _start(0x9edc8, 0xff38f6a0, 0x1, 0x1, 0xff38e000, 0x0), at 0xfe0fc16c
current thread: t@460
[1] __lwp_sema_wait(0xd8881e30, 0x0, 0x0, 0x1, 0x0, 0x2), at 0xff319c64
[2] _park(0xd8881e30, 0xff38e000, 0x0, 0xd8881d78, 0x0, 0x0), at 0xff3697f4
[3] _swtch(0xd8881d78, 0xd8881d78, 0xff38e000, 0x5, 0xd8881d78, 0xd8881200), at 0xff369204
[4] _mutex_adaptive_lock(0xff399944, 0x66666400, 0x4c00, 0x1, 0x4d58, 0xfffeffff), at 0xff36ad80
[5] _cmutex_lock(0xff33a500, 0xff, 0x0, 0xff2c1dc0, 0x0, 0x0), at 0xff36ab1c
[6] free(0x1830d8, 0x4800, 0xfe3e2000, 0xfe3f8194, 0x2a380, 0x0), at 0xff2c1dc0
[7] Thread_Interrupt_Callback::execute(0x1830d8, 0xd8880da8, 0xff38e000, 0x1, 0x0, 0x0), at 0xfe2e22b8
[8] OSThread::do_interrupt_callbacks_at_interrupt(0x268ae8, 0xd8880da8, 0x0, 0x240100, 0xfe1e62a4, 0x0), at 0xfe1e6b78
[9] JVM_handle_solaris_signal(0x10, 0xd8881200, 0xd8880f48, 0x1, 0x0, 0x0), at 0xfe1e6314
[10] __sighndlr(0x10, 0xd8881200, 0xd8880f48, 0xfe1e6240, 0xd8881e10, 0xd8881e00), at 0xff37bd04
[11] sigacthandler(0x10, 0xd8881d78, 0xd8880f48, 0xff38e000, 0xd8881d78, 0xd8881200), at 0xff378508
---- called from signal handler with signal 16 (SIGUSR1) ------
[12] _mutex_adaptive_lock(0xff399944, 0x66666400, 0x4c00, 0x1, 0x4d58, 0xfffeffff), at 0xff36ada4
[13] _cmutex_lock(0xff33a500, 0xff, 0xfe4, 0xff2c0f2c, 0x19a, 0xfa4dcf7c), at 0xff36ab1c
[14] malloc(0x8c, 0x0, 0x19f, 0xfa4dcf7c, 0xfe3e2000, 0x0), at 0xff2c0f2c
[15] os::malloc(0x8c, 0xfa4dc848, 0x31ea58, 0xd8881858, 0x2e2a38, 0xfe10ae00), at 0xfe049740
[16] CHeapObj::operator new(0x8c, 0xfe3e2000, 0x0, 0x2686dc, 0xfe3e2000, 0xfa443370), at 0xfe0496e0
[17] nmethod::add_handler_for_exception_and_pc(0xfa4dc848, 0xd8881540, 0xfa4dca30, 0xfa4dcb10, 0x268110, 0xee0a3e48), at 0xfe1ae30c
[18] Runtime1::exception_handler_for_pc(0xfe4009fc, 0x2686a0, 0xfa4dca30, 0xe,
Date: Fri, 30 Nov 2001 14:40:03 -0500 (EST)
From: Karen Kinnear - Sun PC Distributed Systems <###@###.###>
Subject: Re: New Volano Hang
To: ###@###.###, ###@###.###
Cc: ###@###.###, ###@###.###, ###@###.###, ###@###.###
Coleen -
Dice pointed out a potential bug here when he reviewed my
latest change (not caused by that latest change).
I believe the thread getting the SIGUSR1 is deadlocking itself,
i.e. I think it has the malloc lock and is now servicing an interrupt
callback.
The Thread_Interrupt_Callback::execute call actually
tries to free the callback by calling
"delete this", which goes through free() and so needs the malloc
lock again (frame [6] of t@460 above).
There is a comment saying that deadlock on the malloc lock is not
an issue here, but I'm not sure why that would be true.
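The suspected self-deadlock can be modelled in a few lines. This is only a sketch under stated assumptions: model_malloc_lock, try_acquire, release, handler_can_free and interrupted_inside_malloc are hypothetical names, not libc or VM code, and the lock is a plain test-and-set flag so that the second acquisition reports failure instead of hanging.

```cpp
#include <atomic>

// Hypothetical stand-in for libc's non-recursive malloc lock.
static std::atomic_flag model_malloc_lock = ATOMIC_FLAG_INIT;

// Try to take the lock once; returns true on success.
// The real lock would block here instead of returning false.
static bool try_acquire() {
    return !model_malloc_lock.test_and_set(std::memory_order_acquire);
}

static void release() {
    model_malloc_lock.clear(std::memory_order_release);
}

// Models the handler's "delete this": free() needs the malloc lock.
static bool handler_can_free() {
    if (!try_acquire()) {
        return false;  // the real free() would park forever at this point
    }
    release();
    return true;
}

// Models a thread that takes SIGUSR1 while it is inside malloc().
// Returns whether the handler managed to take the lock.
static bool interrupted_inside_malloc() {
    bool in_malloc = try_acquire();         // malloc() takes the lock
    (void)in_malloc;
    bool handler_ok = handler_can_free();   // handler runs on the same thread
    release();                              // never reached in the real hang
    return handler_ok;
}
```

In this model interrupted_inside_malloc() returns false: the handler cannot get a lock that its own interrupted frame already holds, and with a blocking lock that failure would be a hang.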
There are a number of ways to fix this. When this no longer
longjmps I will make this synchronous and have the caller do
the freeing.
Steve would know better - I think there used to be a list
of cancelled requests - perhaps we could have a list of
completed requests. Alternatively we could add a bit or
something to mark the request as completed. Steve would know
what code would delete it later (perhaps the next synchronous
call that is deleting a request for this thread and already
removing its own completed request).
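A rough sketch of the "mark the request as completed" idea follows. The handler only sets a flag, and ordinary code frees the request later. CallbackRequest and RequestList are hypothetical types made up for illustration, not the VM's classes, and the sketch assumes the list itself is only touched outside signal context.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical request type; not the VM's Thread_Interrupt_Callback.
struct CallbackRequest {
    std::atomic<bool> completed{false};

    // Runs in signal-handler context, so it must not call malloc or free.
    void execute() {
        // ... do the callback work here ...
        completed.store(true, std::memory_order_release);  // mark, no "delete this"
    }
};

// Hypothetical per-thread list of requests, modified by ordinary code only.
class RequestList {
public:
    CallbackRequest* add() {
        requests_.push_back(std::make_unique<CallbackRequest>());
        return requests_.back().get();
    }

    // Called from ordinary (non-signal) code, e.g. the next synchronous call.
    // Frees every completed request and returns how many were freed.
    std::size_t reap_completed() {
        const std::size_t before = requests_.size();
        requests_.erase(
            std::remove_if(requests_.begin(), requests_.end(),
                           [](const std::unique_ptr<CallbackRequest>& r) {
                               return r->completed.load(std::memory_order_acquire);
                           }),
            requests_.end());
        return before - requests_.size();
    }

    std::size_t size() const { return requests_.size(); }

private:
    std::vector<std::unique_ptr<CallbackRequest>> requests_;
};
```

The point of the design is that free() is only ever reached from reap_completed(), which runs outside the signal handler, so the handler can never block on the malloc lock.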
hope this helps,
Karen
###@###.### 2001-11-30