A customer (CU) reported a VM crash.
A SIGSEGV occurs when GetMethodID() is called.
CONFIGURATION:
OS : RHEL 4 (Itanium)
JDK: 1.4.2_11
INVESTIGATION :
The signal information and stack trace are as follows.
-------
Unexpected Signal : SIGSEGV [0xb] occurred at PC=0x2000000000a9a2e1, pid=28651, nid=2305843014325498496
siginfo:si_signo=11, si_errno=0, si_code=1, si_addr=0x005f00320036006f
#0 0xa000000000010640 in __kernel_syscall_via_break ()
#1 0x20000000000f7630 in raise ()
#2 0x20000000000fa010 in abort ()
#3 0x2000000000da9bb0 in os::abort ()
#4 0x2000000000f7f5d0 in VMError::report_and_die ()
#5 0x2000000000db3cc0 in JVM_handle_linux_signal ()
#6 0x2000000000dadc20 in signalHandler ()
#7 <signal handler called>
#8 0x2000000000a9a2e0 in JNIid::find ()
#9 0x2000000000a9a5b0 in instanceKlass::jni_id_for ()
#10 0x2000000000d582d0 in methodOopDesc::jni_id ()
#11 0x2000000000b1c670 in get_method_id ()
#12 0x2000000000b1c940 in jni_GetMethodID ()
--------
The SIGSEGV occurs in JNIid::find() while the VM is searching
instanceKlass::_jni_ids for a matching JNIid.
The pointer "current" holds an invalid address at line (a),
so the read of JNIid::_offset through it raises SIGSEGV.
-----
JNIid* JNIid::find(int offset) {
  JNIid* current = this;
  while (current != NULL) {
    if (current->offset() == offset) return current;  // (a)
    current = current->next();
  }
  return NULL;
}
------
JNIid is created in instanceKlass::jni_id_for_impl().
----
....
JNIid* instanceKlass::jni_id_for_impl(instanceKlassHandle this_oop, int offset) {
  MutexLocker ml(JNIIdentifier_lock);
  // Retry lookup after we got the lock
  JNIid* probe = this_oop->jni_ids() == NULL ? NULL : this_oop->jni_ids()->find(offset);
  if (probe == NULL) {
    // Slow case, allocate new static field identifier
    probe = new JNIid(this_oop->as_klassOop(), offset, this_oop->jni_ids());  // (b)
    this_oop->set_jni_ids(probe);                                             // (c)
  }
  return probe;
}
------
This function constructs a new JNIid at line (b) and stores the pointer in
instanceKlass::_jni_ids at line (c).
At this point other threads may observe incorrect values in the fields of the
new JNIid (e.g. JNIid::_next, JNIid::_offset). (Details are given below.)
Meanwhile, instanceKlass::jni_id_for(), which triggers the access violation,
behaves as follows:
If jni_ids() is NULL, jni_id_for_impl() is called.
If jni_ids() is not NULL, JNIid::find() is called at line (d) below.
----
JNIid* instanceKlass::jni_id_for(int offset) {
  JNIid* probe = jni_ids() == NULL ? NULL : jni_ids()->find(offset);  // (d)
  if (probe == NULL) {
    probe = jni_id_for_impl(this->as_klassOop(), offset);
  }
  return probe;
}
----
When several threads call jni_id_for() concurrently,
the following race condition appears to be possible.
Thread-A                                 Thread-B
---------------------------------------- ------------------------------
(d) jni_ids() == NULL                (1)
(b) probe = new JNIid()              (2)
(c) this_oop->set_jni_ids(probe)     (3)
                                         (d) jni_ids() != NULL        (4)
                                         (a) Access to JNIid::_offset (5)
When thread-B reaches (d) and can already see the address (3) that thread-A
stored at (c), it goes on to read the contents of the JNIid through
JNIid::find(). Note that (d) is executed without holding JNIIdentifier_lock.
However, nothing guarantees that thread-B also sees the contents (2) that
thread-A wrote into the JNIid at (b), because there is no memory barrier
between (b) and (c). On a weakly ordered processor such as Itanium, the store
of the pointer can become visible to another CPU before the stores that
initialize the object.
In other words, once the check "jni_ids() == NULL" at (d) fails, thread-B may
traverse a JNIid whose fields (_offset, _next) are not yet visible with their
initialized values.
The SIGSEGV appears to occur in this scenario.
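The ordering that is missing between (b) and (c) can be sketched as follows.
This is only an illustration under stated assumptions, not the HotSpot fix:
IdNode, Holder and id_for() are hypothetical stand-ins for JNIid,
instanceKlass and jni_id_for(), and std::atomic is a C++11 facility that the
1.4.2 sources do not use (HotSpot would rely on its own barrier primitives).
The point is the pairing: the writer publishes the pointer with a release
store after the node is fully constructed, and the lock-free reader picks it
up with an acquire load.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical stand-in for JNIid: one node of a singly linked list.
struct IdNode {
    int     offset;
    IdNode* next;
    IdNode(int off, IdNode* nxt) : offset(off), next(nxt) {}

    // Same traversal as JNIid::find(): first node with a matching offset.
    IdNode* find(int off) {
        for (IdNode* cur = this; cur != nullptr; cur = cur->next) {
            if (cur->offset == off) return cur;
        }
        return nullptr;
    }
};

// Hypothetical stand-in for instanceKlass and its _jni_ids list head.
class Holder {
    std::atomic<IdNode*> head_{nullptr};
    std::mutex           lock_;  // plays the role of JNIIdentifier_lock

public:
    IdNode* id_for(int off) {
        // Fast path without the lock, as at (d). The acquire load pairs with
        // the release store below, so every field written before the pointer
        // was published is visible once the pointer itself is visible.
        IdNode* h     = head_.load(std::memory_order_acquire);
        IdNode* probe = (h == nullptr) ? nullptr : h->find(off);
        if (probe != nullptr) return probe;

        // Slow path: retry under the lock. The mutex already orders this
        // load against earlier writers, so a relaxed load is sufficient.
        std::lock_guard<std::mutex> guard(lock_);
        h     = head_.load(std::memory_order_relaxed);
        probe = (h == nullptr) ? nullptr : h->find(off);
        if (probe == nullptr) {
            // Nodes are never freed here; they live as long as the holder.
            probe = new IdNode(off, h);                     // (b) initialize
            head_.store(probe, std::memory_order_release);  // (c) publish
        }
        return probe;
    }
};
```

Calling id_for() twice with the same offset on one Holder returns the same
node, and a second offset is linked in front of the first. With a plain
pointer in place of std::atomic and no barrier, the compiler or the CPU would
be free to make step (c) visible before step (b), which is the scenario
described above.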
NOTE :
The disassembly below shows where the reported SIGSEGV occurred.
The signal is raised by the ld4 instruction at 0x2000000000a9a2e1, which reads
JNIid::_offset through the variable "current" (r8), so "current" held an
invalid value at that point.
0x2000000000a9a2b0 <_ZN5JNIid4findEi>: [MII] nop.m 0x0
0x2000000000a9a2b1 <_ZN5JNIid4findEi+1>: mov r8=r32;;
0x2000000000a9a2b2 <_ZN5JNIid4findEi+2>: nop.i 0x0
0x2000000000a9a2c0 <_ZN5JNIid4findEi+16>: [MFB] cmp.eq p6,p7=0,r8
0x2000000000a9a2c1 <_ZN5JNIid4findEi+17>: nop.f 0x0
0x2000000000a9a2c2 <_ZN5JNIid4findEi+18>: (p06) br.cond.dpnt.few 0x2000000000a9a320<_ZN5JNIid4findEi+112>
0x2000000000a9a2d0 <_ZN5JNIid4findEi+32>: [MFI] adds r2=16,r8
0x2000000000a9a2d1 <_ZN5JNIid4findEi+33>: nop.f 0x0
0x2000000000a9a2d2 <_ZN5JNIid4findEi+34>: adds r15=8,r8;;
0x2000000000a9a2e0 <_ZN5JNIid4findEi+48>: [MMB] nop.m 0x0
0x2000000000a9a2e1 <_ZN5JNIid4findEi+49>: ld4 r14=[r2] <==== offset
0x2000000000a9a2e2 <_ZN5JNIid4findEi+50>: nop.b 0x0;;
0x2000000000a9a2f0 <_ZN5JNIid4findEi+64>: [MIB] nop.m 0x0
0x2000000000a9a2f1 <_ZN5JNIid4findEi+65>: cmp4.eq p9,p8=r33,r14
0x2000000000a9a2f2 <_ZN5JNIid4findEi+66>: (p09) br.ret.dpnt.many b0
0x2000000000a9a300 <_ZN5JNIid4findEi+80>: [MMB] nop.m 0x0
0x2000000000a9a301 <_ZN5JNIid4findEi+81>: ld8 r8=[r15]
0x2000000000a9a302 <_ZN5JNIid4findEi+82>: nop.b 0x0;;
0x2000000000a9a310 <_ZN5JNIid4findEi+96>: [MIB] nop.m 0x0
0x2000000000a9a311 <_ZN5JNIid4findEi+97>: cmp.eq p6,p7=0,r8
0x2000000000a9a312 <_ZN5JNIid4findEi+98>: (p07) br.cond.dptk.few 0x2000000000a9a2d0<_ZN5JNIid4findEi+32>
0x2000000000a9a320 <_ZN5JNIid4findEi+112>: [MIB] nop.m 0x0
0x2000000000a9a321 <_ZN5JNIid4findEi+113>: mov r8=r0
0x2000000000a9a322 <_ZN5JNIid4findEi+114>: br.ret.sptk.many b0;;
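The address arithmetic in the listing can be checked with a small sketch.
The struct below is a hypothetical model of the JNIid layout, inferred only
from the two loads in the disassembly (ld8 at current+8 for next(), ld4 at
current+16 for offset()). The name JNIidLayout, the helper
current_from_fault() and the meaning of the first 8 bytes are assumptions.
Fields are modelled as fixed-width integers so the offsets are the same on
any host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of the JNIid layout implied by the disassembly.
struct JNIidLayout {
    std::uint64_t holder;  // bytes 0..7: assumed holder pointer, not read in find()
    std::uint64_t next;    // byte 8:  "adds r15=8,r8" then "ld8 r8=[r15]"
    std::int32_t  offset;  // byte 16: "adds r2=16,r8" then "ld4 r14=[r2]"
};

static_assert(offsetof(JNIidLayout, next) == 8, "next is loaded from current+8");
static_assert(offsetof(JNIidLayout, offset) == 16, "offset is loaded from current+16");

// Recover the value of "current" (r8) from the faulting address of the ld4.
constexpr std::uint64_t current_from_fault(std::uint64_t si_addr) {
    return si_addr - offsetof(JNIidLayout, offset);
}
```

Applied to the reported si_addr=0x005f00320036006f, this gives
current = 0x005f00320036005f. That value is not 8-byte aligned, so it cannot
be the address of a properly allocated JNIid, which is consistent with
"current" having been loaded from a _next field (or list head) that did not
yet hold its initialized value.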