Customer Problem Description:
We have a Java server application that is crashing on a customer site.
The JVM crashes with an error after anywhere between 1 and 30 hours of uptime. The crashes seem thread-related, and we've seen the crash in a
pthread_detach call.
An unexpected exception has been detected in native code outside the VM.
Unexpected Signal : 11 occurred at PC=0x4002cba0
Function name=(N/A)
Library=/lib/i686/libpthread.so.0
NOTE: We are unable to locate the function name symbol for the error
just occurred. Please refer to release documentation for possible
reason and solutions.
Current Java thread:
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:942)
at com.fidelia.emerald.tasks.Task.execute(Task.java:214)
at com.fidelia.emerald.monitor.AggregatedDataDbWriter.getAggregatedDataIntervals(AggregatedDataDbWriter.java:152)
at com.fidelia.emerald.monitor.AggregatedDataDbWriter.writeObjects(AggregatedDataDbWriter.java:197)
at com.fidelia.emerald.monitor.AggregationDbWriter.run(AggregationDbWriter.java:124)
Running with ShowMessageBoxOnError:
------------------------------------
Our application hung overnight, so I was able to attach to it with GDB.
Thread 2 was the one with the os::message_box call. Note that the top
of the stack is in "nanosleep", not "read".
(gdb) thread 2
[Switching to thread 2 (Thread 2049 (LWP 15445))]#0 0x403ff8e1 in __libc_nanosleep () from /lib/i686/libc.so.6
(gdb) where
#0 0x403ff8e1 in __libc_nanosleep () from /lib/i686/libc.so.6
#1 0x403ff761 in __sleep (seconds=100) at ../sysdeps/unix/sysv/linux/sleep.c:85
#2 0x401846a7 in os::message_box () at eval.c:41
#3 0x401824d6 in os::handle_unexpected_exception () at eval.c:41
#4 0x40185c02 in JVM_handle_linux_signal () at eval.c:41
#5 0x40184a54 in signalHandler () at eval.c:41
#6 0x400269c3 in pthread_sighandler_rt (signo=11, si=0xbf80dc90, uc=0xbf80dd10) at signals.c:121
#7 <signal handler called>
#8 pthread_allocate_stack (attr=0x5fa896a0, default_new_thread=0x0, pagesize=4096,
out_new_thread=0x80a01d8, out_new_thread_bottom=0x80a01dc,
out_guardaddr=0x80a01e0, out_guardsize=0x80a01e4)
at ../sysdeps/i386/i486/bits/string.h:316
#9 0x40023ec7 in pthread_handle_create (thread=0x5fa89c70, attr=0x5fa896a0,
start_routine=0x40182e78 <_start(Thread *)>, arg=0x75f08e68, mask=0x80a0250,
father_pid=15587, report_events=0, event_maskp=0x5fa89dcc) at manager.c:492
#10 0x400239c5 in __pthread_manager (arg=0x3) at manager.c:154
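The backtrace above puts the fault in thread creation (pthread_handle_create calling pthread_allocate_stack), and the Java trace shows Task.execute blocked in Thread.join. A minimal sketch of a stress loop that exercises the same create/join pattern could look like the following. The class ThreadChurn, its churn method, and the round and batch counts are all hypothetical and are not taken from the application; there is no evidence that this loop reproduces the crash, it only illustrates the pattern.

```java
public class ThreadChurn {
    // Hypothetical shared counter; synchronized so increments from many threads are not lost.
    static class Counter {
        private int value;
        synchronized void increment() { value++; }
        synchronized int get() { return value; }
    }

    // Hypothetical helper: start `batch` short-lived threads, join them all,
    // and repeat `rounds` times. Returns how many threads ran to completion.
    public static int churn(int rounds, int batch) throws InterruptedException {
        final Counter completed = new Counter();
        for (int r = 0; r < rounds; r++) {
            Thread[] workers = new Thread[batch];
            for (int i = 0; i < batch; i++) {
                workers[i] = new Thread(new Runnable() {
                    public void run() {
                        completed.increment();
                    }
                });
                workers[i].start();
            }
            // Same shape as the Thread.join call seen in Task.execute.
            for (int i = 0; i < batch; i++) {
                workers[i].join();
            }
        }
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumed values; a real soak test would run far longer.
        System.out.println("threads completed: " + churn(100, 8));
    }
}
```

The sketch avoids java.util.concurrent so that it should also compile on a 1.3.1-era JDK.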
Running with java_g:
---------------------
The application on the single-processor machine has been running almost
a week now, but the app on the SMP machine crashed yesterday with the
usual pthread_detach error, after about 40 hours of uptime. I
installed the debug package over the top of jdk1.3.1_03/, and I'm now
running with java_g.
I got a crash running java_g. This is what showed up on standard error:
# HotSpot Virtual Machine Error, assertion failure
# Please report this error at
# http://java.sun.com/cgi-bin/bugreport.cgi
#
# assert(_is_owned == v_false, "mutex_lock should not have had owner")
#
# Error ID: /BUILD_AREA/hotspot1.3.1/build/linux/../../src/os/linux/vm/os_linux.hpp, 256
#
# Problematic Thread: prio=1 tid=0x83aae18 nid=0x41c0 waiting on monitor
#
assertion failure
assert(_is_owned == v_false, "mutex_lock should not have had owner")
Do you want to debug the problem?
# SafepointSynchronize::begin: Fatal error:
# SafepointSynchronize::begin: Timed out while attempting to reach a safepoint.
# SafepointSynchronize::begin: Threads which did not reach the safepoint:
# nid=0x6b9d initialized
# SafepointSynchronize::begin: (End of list)
Since I was running with "showmessagebox", I was able to attach to the
process and get stack traces for all 23 threads, which I've attached
to the bug report. (threadstack.txt)
Although there are 88 JVM processes in ps, I could only get stack
traces for 23 threads through gdb.
I double-checked the command line for the hung process using "ps", and we are using 1.3.1_03:
/usr/local/jdk1.3.1_03/bin/i386/native_threads/java_g -server -Xmx256m -XX:+ShowMessageBoxOnError
and it is on an SMP machine:
[lreeder@dge01 netvigil]$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 8
model name : Pentium III (Coppermine)
stepping : 6
cpu MHz : 927.163
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 mmx fxsr sse
bogomips : 1848.11
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 8
model name : Pentium III (Coppermine)
stepping : 6
cpu MHz : 927.163
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 mmx fxsr sse
bogomips : 1848.11
Duplicates: JDK-4711785 RAS: Vtest server hang after 5 days with hopper_16 c1 on linux redhat 7.2 (Closed)