Please see also 4650839.
Sometimes the VM cannot acquire the Threads_lock, Heap_lock or SystemDictionary_lock,
which causes vmark to hang on the client side.
To reproduce:
> java COM.volano.Main
> repeat 1000 java COM.volano.Mark -count 1
It hangs quickly (usually within 500 COM.volano.Mark runs) on
Red Hat 7.2 SMP with product builds. I wasn't able to reproduce the hang
on Red Hat 6.2 SMP or with debug builds.
CPU usage is 0% when the VM hangs. It appears that the VM is blocked in
pthread_mutex_lock(), trying to acquire one of the important system locks
(Threads_lock, Heap_lock or SystemDictionary_lock). However, the _owner field
of the lock is 0x0. Looking into the pthread frames that handle the underlying
pthread mutex, the mutex status is non-zero, implying that it is indeed locked
by some thread. By default, LinuxThreads doesn't record the real owner of a
mutex unless the mutex is initialized with type PTHREAD_MUTEX_ERRORCHECK_NP.
I managed to reproduce the hang with a PTHREAD_MUTEX_ERRORCHECK_NP mutex, and
the owner field of the pthread mutex (__m_owner) is again 0x0. See the
following stack trace:
#0 0x40075aa5 in __sigsuspend (set=0x4c5b10c0)
at ../sysdeps/unix/sysv/linux/sigsuspend.c:45
#1 0x40037079 in __pthread_wait_for_restart_signal (self=0x4c5b1be0)
at pthread.c:967
#2 0x40038d39 in __pthread_alt_lock (lock=0x805069c, self=0x4c5b1be0)
at restart.h:34
#3 0x40035c6e in __pthread_mutex_lock (mutex=0x805068c) at mutex.c:116
#4 0x4040be1f in os::Linux::safe_mutex_lock (_mutex=0x805068c)
at /home/huanghui/main/build/linux/../../src/os_cpu/linux_i486/vm/os_linux_i486.cpp:518
#5 0x40470589 in os::Linux::Event::lock (this=0x8050688)
at /home/huanghui/main/build/linux/../../src/os/linux/vm/os_linux.hpp:137
#6 0x404703ed in Mutex::wait_for_lock_implementation (this=0x8050660)
at /home/huanghui/main/build/linux/../../src/os/linux/vm/mutex_linux.inline.hpp:25
#7 0x403fbc1b in Mutex::wait_for_lock_blocking_implementation (
this=0x8050660, thread=0x807bf30)
at /home/huanghui/main/build/linux/../../src/os/linux/vm/mutex_linux.cpp:89
#8 0x403fae61 in Mutex::lock (this=0x8050660)
at /home/huanghui/main/build/linux/../../src/share/vm/runtime/mutex.cpp:42
#9 0x4042aab4 in SystemDictionary::find ()
at /home/huanghui/main/build/linux/../../src/share/vm/runtime/safepoint.hpp:230
#10 0x4042ac1f in SystemDictionary::find_instance_or_array_klass ()
at /home/huanghui/main/build/linux/../../src/share/vm/runtime/safepoint.hpp:230
#11 0x402f7dfb in ciEnv::get_klass_by_name_impl ()
from /home/huanghui/jdk1.4.1/jre/lib/i386/client/libjvm.so
#12 0x402f8251 in ciEnv::get_klass_by_index_impl ()
from /home/huanghui/jdk1.4.1/jre/lib/i386/client/libjvm.so
#13 0x402f82ef in ciEnv::get_klass_by_index ()
from /home/huanghui/jdk1.4.1/jre/lib/i386/client/libjvm.so
... ... ... ...
(gdb) frame 4
#4 0x4040be1f in os::Linux::safe_mutex_lock (_mutex=0x805068c)
at /home/huanghui/main/build/linux/../../src/os_cpu/linux_i486/vm/os_linux_i486.cpp:518
518 int status = pthread_mutex_lock(_mutex);
Current language: auto; currently c++
(gdb) p *_mutex
$7 = {__m_reserved = 0, __m_count = 0, __m_owner = 0x0, __m_kind = 2,
__m_lock = {__status = 1473963040, __spinlock = 0}}
>>>> __m_kind == PTHREAD_MUTEX_ERRORCHECK_NP, __m_owner = 0x0 <<<<
(gdb) frame 8
#8 0x403fae61 in Mutex::lock (this=0x8050660)
at /home/huanghui/main/build/linux/../../src/share/vm/runtime/mutex.cpp:42
42 wait_for_lock_blocking_implementation((JavaThread*)thread);
(gdb) p *this
$8 = {<CHeapObj> = {<No data fields>}, _lock_count = 0, _lock_event = 0x8050688,
_supress_signal = 0, _owner = 0x0,
_name = 0x4058a636 "SystemDictionary_lock", static INVALID_THREAD = 0x0}
Note from "$7" that __m_kind = 2, which is PTHREAD_MUTEX_ERRORCHECK_NP.
__m_lock.__status is not 0 or 1, but __m_owner == 0x0.
"$8" shows that the (HotSpot) _owner field of SystemDictionary_lock is 0x0.
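For reference, here is a minimal sketch of how a mutex can be given the error-checking type mentioned above, so that the threads library tracks its owner. This is not the HotSpot code: the helper names (init_errorcheck_mutex, relock_result, unlock_unowned_result) are made up for illustration, and it uses the POSIX spelling PTHREAD_MUTEX_ERRORCHECK rather than the LinuxThreads-era PTHREAD_MUTEX_ERRORCHECK_NP, on the assumption of a current glibc.

```cpp
#include <pthread.h>
#include <cerrno>
#include <cassert>

// Hypothetical helper: initialize `m` as an error-checking mutex.
// Returns 0 on success, otherwise the pthread error code.
int init_errorcheck_mutex(pthread_mutex_t* m) {
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) return rc;
    // Error-checking mutexes record their owner, so misuse is reported
    // as an error code instead of deadlocking silently.
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0) rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

// Lock twice from the same thread and return the result of the second
// lock attempt (EDEADLK is expected for an error-checking mutex).
int relock_result() {
    pthread_mutex_t m;
    if (init_errorcheck_mutex(&m) != 0) return -1;
    pthread_mutex_lock(&m);
    int rc = pthread_mutex_lock(&m);
    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return rc;
}

// Unlock a mutex the calling thread does not hold and return the
// result (EPERM is expected for an error-checking mutex).
int unlock_unowned_result() {
    pthread_mutex_t m;
    if (init_errorcheck_mutex(&m) != 0) return -1;
    int rc = pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return rc;
}
```

With such a mutex the owner is normally recorded whenever the mutex is held, which is why a non-zero __status together with __m_owner == 0x0 in "$7" looks inconsistent. Older toolchains may need -lpthread (or -pthread) when linking.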
Duplicates: JDK-4650839 RAS: Vtest hang after 38 hrs 19 mins in hopper_04 c1 on linux redhat 7.1 (Closed)