JDK-7017831

Crash occurs in JNIid::find because of no memory barrier in 1.4.2_11 (Itanium)


    • Type: Bug
    • Resolution: Won't Fix
    • Priority: P2
    • None
    • 1.4.2_11
    • hotspot
    • itanium
    • linux_redhat_4.0

      A customer (CU) reported a VM crash.

      A SIGSEGV occurs when GetMethodID() is called.

      CONFIGURATION:
        OS : RHEL 4 (Itanium)
        JDK: 1.4.2_11

      INVESTIGATION :

      The signal information and stack trace are as follows.

      -------
        Unexpected Signal : SIGSEGV [0xb] occurred at PC=0x2000000000a9a2e1, pid=28651, nid=2305843014325498496
       
        siginfo:si_signo=11, si_errno=0, si_code=1, si_addr=0x005f00320036006f
       
        #0 0xa000000000010640 in __kernel_syscall_via_break ()
        #1 0x20000000000f7630 in raise ()
        #2 0x20000000000fa010 in abort ()
        #3 0x2000000000da9bb0 in os::abort ()
        #4 0x2000000000f7f5d0 in VMError::report_and_die ()
        #5 0x2000000000db3cc0 in JVM_handle_linux_signal ()
        #6 0x2000000000dadc20 in signalHandler ()
        #7 <signal handler called>
        #8 0x2000000000a9a2e0 in JNIid::find ()
        #9 0x2000000000a9a5b0 in instanceKlass::jni_id_for ()
        #10 0x2000000000d582d0 in methodOopDesc::jni_id ()
        #11 0x2000000000b1c670 in get_method_id ()
        #12 0x2000000000b1c940 in jni_GetMethodID ()
      --------

      The SIGSEGV occurs in JNIid::find() while it searches
      instanceKlass::_jni_ids for a JNIid.

      The pointer "current" holds an invalid address at line (a),
      so the read of JNIid::_offset raises SIGSEGV.

      -----

        JNIid* JNIid::find(int offset) {
          JNIid* current = this;
          while (current != NULL) {
            if (current->offset() == offset) return current;  // (a)
            current = current->next();
          }
          return NULL;
        }
      ------


      JNIid is created in instanceKlass::jni_id_for_impl().

      ----
      ....
        JNIid* instanceKlass::jni_id_for_impl(instanceKlassHandle this_oop, int offset) {
          MutexLocker ml(JNIIdentifier_lock);
          // Retry lookup after we got the lock
          JNIid* probe = this_oop->jni_ids() == NULL ? NULL : this_oop->jni_ids()->find(offset);
          if (probe == NULL) {
            // Slow case, allocate new static field identifier
            probe = new JNIid(this_oop->as_klassOop(), offset, this_oop->jni_ids());  // (b)
            this_oop->set_jni_ids(probe);  // (c)
          }
          return probe;
        }
      ------

      This function constructs a new JNIid at line (b) and stores the pointer in
      instanceKlass::_jni_ids at line (c).

      As a result, other threads may observe incorrect values in the new JNIid
      (e.g. JNIid::_next, JNIid::_offset). The details are given below.

      On the other hand, instanceKlass::jni_id_for(), which triggers the access violation,
      checks jni_ids() without taking JNIIdentifier_lock.
      If jni_ids() is NULL, jni_id_for_impl() is called.
      If jni_ids() is not NULL, JNIid::find() is called at line (d).

      ----
        JNIid* instanceKlass::jni_id_for(int offset) {
          JNIid* probe = jni_ids() == NULL ? NULL : jni_ids()->find(offset);  // (d)
          if (probe == NULL) {
            probe = jni_id_for_impl(this->as_klassOop(), offset);
          }
          return probe;
        }
      ----


      When several threads call jni_id_for() concurrently,
      the following race condition appears to be possible.

        Thread-A                                 Thread-B
      ---------------------------------------- ------------------------------
        (d) jni_ids() == NULL              (1)
        (b) probe = new JNIid()            (2)
        (c) this_oop->set_jni_ids(probe)   (3)
                                                 (d) jni_ids() != NULL         (4)
                                                 (a) Access to JNIid::_offset  (5)


      When thread-B reaches (d) and sees the address (3) that thread-A stored
      at (c), thread-B reads the contents of the JNIid
      through JNIid::find().

      However, nothing guarantees that thread-B also sees the contents (2)
      of the JNIid that thread-A initialized at (b), because there is no memory barrier
      between (b) and (c).

      In other words, once the check "jni_ids() == NULL" at (d) fails,
      the program may access the JNIid before its initialized contents
      are visible to that thread.

      The SIGSEGV appears to occur in this scenario.
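
      The ordering that the scenario above lacks can be sketched in portable C++11.
      This is only an illustration under stated assumptions, not the HotSpot code and
      not a proposed patch: IdNode, IdList, id_for() and demo_lookup() are hypothetical
      stand-ins for JNIid, instanceKlass, jni_id_for() and a caller, and std::atomic
      replaces whatever barrier primitive the VM would actually use. The release store
      at (c) paired with the acquire load at (d) guarantees that a thread which sees
      the new head pointer also sees the fields written at (b).

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical stand-in for JNIid: an immutable node in a singly linked list.
struct IdNode {
  const int offset;
  IdNode* const next;
  IdNode(int off, IdNode* nxt) : offset(off), next(nxt) {}

  // Same traversal as JNIid::find().
  IdNode* find(int off) {
    for (IdNode* cur = this; cur != nullptr; cur = cur->next) {
      if (cur->offset == off) return cur;
    }
    return nullptr;
  }
};

// Hypothetical stand-in for instanceKlass and its _jni_ids list head.
class IdList {
  std::atomic<IdNode*> head_{nullptr};
  std::mutex lock_;  // plays the role of JNIIdentifier_lock

 public:
  ~IdList() {
    IdNode* n = head_.load(std::memory_order_relaxed);
    while (n != nullptr) {
      IdNode* nxt = n->next;
      delete n;
      n = nxt;
    }
  }

  IdNode* id_for(int off) {
    // (d) Lock-free fast path. The acquire load pairs with the release
    // store at (c), so the node's fields are visible before find() reads them.
    IdNode* h = head_.load(std::memory_order_acquire);
    IdNode* probe = (h == nullptr) ? nullptr : h->find(off);
    if (probe != nullptr) return probe;

    // Slow path: retry the lookup under the lock, then publish a new node.
    std::lock_guard<std::mutex> guard(lock_);
    h = head_.load(std::memory_order_relaxed);  // writers are serialized by lock_
    probe = (h == nullptr) ? nullptr : h->find(off);
    if (probe == nullptr) {
      probe = new IdNode(off, h);                     // (b) initialize the node
      head_.store(probe, std::memory_order_release);  // (c) publish after (b)
    }
    return probe;
  }
};

// Single-threaded usage check; the ordering itself only matters across threads.
inline void demo_lookup() {
  IdList ids;
  IdNode* first = ids.id_for(8);
  IdNode* second = ids.id_for(16);
  assert(first->offset == 8);
  assert(second->next == first);
  assert(ids.id_for(8) == first);
}
```

      With a plain pointer store at (c), as in the code quoted in this report, a weakly
      ordered processor such as Itanium may make the pointer visible before the node's
      fields, which matches the failure described above.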



      NOTE :
      The following shows where the reported SIGSEGV occurred.
      The SIGSEGV is raised by the ld4 instruction at 0x2000000000a9a2e1,
      which means the variable "current" held an incorrect value.

        0x2000000000a9a2b0 <_ZN5JNIid4findEi>: [MII] nop.m 0x0
        0x2000000000a9a2b1 <_ZN5JNIid4findEi+1>: mov r8=r32;;
        0x2000000000a9a2b2 <_ZN5JNIid4findEi+2>: nop.i 0x0
        0x2000000000a9a2c0 <_ZN5JNIid4findEi+16>: [MFB] cmp.eq p6,p7=0,r8
        0x2000000000a9a2c1 <_ZN5JNIid4findEi+17>: nop.f 0x0
        0x2000000000a9a2c2 <_ZN5JNIid4findEi+18>: (p06) br.cond.dpnt.few 0x2000000000a9a320<_ZN5JNIid4findEi+112>
        0x2000000000a9a2d0 <_ZN5JNIid4findEi+32>: [MFI] adds r2=16,r8
        0x2000000000a9a2d1 <_ZN5JNIid4findEi+33>: nop.f 0x0
        0x2000000000a9a2d2 <_ZN5JNIid4findEi+34>: adds r15=8,r8;;
        0x2000000000a9a2e0 <_ZN5JNIid4findEi+48>: [MMB] nop.m 0x0
        0x2000000000a9a2e1 <_ZN5JNIid4findEi+49>: ld4 r14=[r2] <==== offset
        0x2000000000a9a2e2 <_ZN5JNIid4findEi+50>: nop.b 0x0;;
        0x2000000000a9a2f0 <_ZN5JNIid4findEi+64>: [MIB] nop.m 0x0
        0x2000000000a9a2f1 <_ZN5JNIid4findEi+65>: cmp4.eq p9,p8=r33,r14
        0x2000000000a9a2f2 <_ZN5JNIid4findEi+66>: (p09) br.ret.dpnt.many b0
        0x2000000000a9a300 <_ZN5JNIid4findEi+80>: [MMB] nop.m 0x0
        0x2000000000a9a301 <_ZN5JNIid4findEi+81>: ld8 r8=[r15]
        0x2000000000a9a302 <_ZN5JNIid4findEi+82>: nop.b 0x0;;
        0x2000000000a9a310 <_ZN5JNIid4findEi+96>: [MIB] nop.m 0x0
        0x2000000000a9a311 <_ZN5JNIid4findEi+97>: cmp.eq p6,p7=0,r8
        0x2000000000a9a312 <_ZN5JNIid4findEi+98>: (p07) br.cond.dptk.few 0x2000000000a9a2d0<_ZN5JNIid4findEi+32>
        0x2000000000a9a320 <_ZN5JNIid4findEi+112>: [MIB] nop.m 0x0
        0x2000000000a9a321 <_ZN5JNIid4findEi+113>: mov r8=r0
        0x2000000000a9a322 <_ZN5JNIid4findEi+114>: br.ret.sptk.many b0;;
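
      The disassembly above can be tied back to the siginfo with a little arithmetic.
      The sketch below is a reading aid only: JNIidLayout and current_from_fault() are
      hypothetical names, the field order is inferred from the two adds instructions
      (next at +8, offset at +16) and is not copied from the HotSpot headers, and it
      assumes that si_addr is the address the faulting ld4 tried to read.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the JNIid fields, ordered as the disassembly suggests.
struct JNIidLayout {
  void*        holder;  // +0  on LP64 (assumed: the owning class)
  JNIidLayout* next;    // +8  on LP64: adds r15=8,r8 ; ld8 r8=[r15]
  int          offset;  // +16 on LP64: adds r2=16,r8 ; ld4 r14=[r2]
};

// On a 64-bit target these evaluate to 8 and 16, matching the adds instructions.
static_assert(offsetof(JNIidLayout, next) == sizeof(void*),
              "next follows the first pointer-sized field");
static_assert(offsetof(JNIidLayout, offset) == 2 * sizeof(void*),
              "offset follows two pointer-sized fields");

// The faulting ld4 reads [current + 16], so current = si_addr - 16.
constexpr std::uint64_t current_from_fault(std::uint64_t si_addr) {
  return si_addr - 16;
}

// si_addr is taken from the siginfo line in this report.
static_assert(current_from_fault(0x005f00320036006fULL) == 0x005f00320036005fULL,
              "value of 'current' implied by the reported si_addr");
```

      Under those assumptions, "current" was 0x005f00320036005f when the load faulted,
      which is not a plausible heap address and is consistent with the note that an
      incorrect value had been set in "current".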

            kevinw Kevin Walls
            tbaba Tadayuki Baba (Inactive)