JDK-4328233

b09 Assert: "result->blob_contains((address)start)" in CodeCache::find_blob

    • Type: Bug
    • Resolution: Duplicate
    • Priority: P3
    • Affects Version: 1.3.0
    • Component: hotspot
    • CPU: sparc
    • OS: generic
      The bug is difficult to reproduce, but has been seen, for instance,
      running Volano with a very high connection count on a large MP
      machine. In the specific case of this report, we were running
      with only 1100 connections on a 10-processor SPARC (the assert
      fired in the client; we were running with Compiler2, but I believe
      that may just be a coincidence):

      The VM thread is in the process of bringing the system to a safepoint,
      and asserts while trying to examine the state of a (suspended) thread
      that is executing compiled code.

      Here's the VM thread:

      current thread: t@4
        [1] __sigprocmask(0x0, 0xfee0abf0, 0x0, 0xffffffff, 0xffffffff, 0x0), at 0xff379da0
        [2] _resetsig(0xff38ca04, 0x0, 0xfee0bdc8, 0x0, 0x0, 0xfee0be44), at 0xff36f3f8
        [3] _sigon(0xff394358, 0xff3942d8, 0xfee0be3c, 0xfee0acc4, 0x6, 0xff2cd00c), at 0xff36eb44
        [4] _thrp_kill(0x0, 0x4, 0x6, 0xff38ca04, 0xfee0bdc8, 0xff393c5c), at 0xff371a20
        [5] abort(0xff333968, 0xfee0adc0, 0x0, 0xfffffff8, 0x0, 0xfee0ade1), at 0xff2b9468
        [6] os::abort(), line 1150 in "os_solaris.cpp"
        [7] report_error(is_vm_internal_error = 1, file_name = 0xfe5190c4 "/export/sail1/jdk1.3_b09/hotsparc-1.3-b09/src/share/vm/code/codeCache.cpp", line_no = 193, title = 0xfe421c6c "assertion failure", format = 0xfe421c80 "assert(%s, "%s")", ...), line 322 in "debug.cpp"
        [8] report_assertion_failure(code_str = 0xfe51909c "result->blob_contains((address)start)", file_name = 0xfe5190c4 "/export/sail1/jdk1.3_b09/hotsparc-1.3-b09/src/share/vm/code/codeCache.cpp", line_no = 193, message = 0xfe519110 "found wrong CodeBlob"), line 145 in "debug.cpp"
        [9] CodeCache::find_blob(start = 0xfa003300), line 193 in "codeCache.cpp"
      =>[10] ThreadSafepointState::examine_state_of_thread(this = 0x94c1f8, iterations = 0), line 502 in "safepoint.cpp"
        [11] SafepointSynchronize::begin(), line 75 in "safepoint.cpp"
        [12] VMThread::loop(this = 0x155ca0), line 269 in "vmThread.cpp"
        [13] VMThread::vm_thread_main(notify_lock = 0x28f78), line 162 in "vmThread.cpp"
        [14] _start(osthread = 0x155f38), line 514 in "os_solaris.cpp"


      And here's the target thread (the one whose thread state is being "examined"):

      current thread: t@2189
        [1] __lwp_sema_wait(0xc2cd1e78, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xff317aec
        [2] _park(0xc2cd1dc8, 0xc2cd1e78, 0x0, 0xf, 0xff38d798, 0xc4311dc8), at 0xff36b1e4
        [3] _swtch(0x5, 0xff38ca04, 0xc2cd1e58, 0xc2cd1e54, 0xc2cd1e50, 0xc2cd1e4c), at 0xff36aed8
        [4] _dopreempt(0x0, 0xff38ca04, 0x1, 0x0, 0x0, 0x0), at 0xff36ce10
        [5] _sigon(0x1, 0xc2cd1e2c, 0xc2cd0f88, 0x0, 0x0, 0x0), at 0xff36eaf0
        [6] thr_sigsetmask(0xc2cd1e54, 0xc2cd1030, 0x8000, 0xff38ca04, 0xc2cd1e4c, 0xc2cd1e2c), at 0xff371244
        [7] __libthread_segvhdlr(0xb, 0xc2cd14a0, 0xc2cd11e8, 0xff38ca04, 0x0, 0x0), at 0xff379344
        [8] __sighndlr(0xb, 0xc2cd14a0, 0xc2cd11e8, 0xff379278, 0xc2cd1e4c, 0xc2cd1e2c), at 0xff37be00
        [9] sigacthandler(0xb, 0xc2cd14a0, 0xc2cd11e8, 0xff38ca04, 0x2c, 0xc2cd1e5c), at 0xff378708
        ---- called from signal handler with signal 11 (SIGSEGV) ------
        [10] 0xfa0032fc(0xda0310e0, 0xe, 0x94b750, 0xf7ce5830, 0xc, 0x8), at 0xfa0032fb
        [11] 0xfa0156dc(0xda030f48, 0xf7cd10b8, 0xd8532120, 0xda030fe0, 0x0, 0x0), at 0xfa0156db
        [12] 0xfa016794(0xf7c38d00, 0x1, 0x0, 0x0, 0x0, 0x0), at 0xfa016793
        [13] 0xfa014a3c(0xd7c6cbe0, 0xf7c16d58, 0xc2cd17dc, 0x4, 0xbfc00, 0x0), at 0xfa014a3b
        [14] 0xc93c8(0xc2cd17dc, 0x94b750, 0x8, 0xe3da8, 0x4, 0xc2cd1708), at 0xc93c7
        [15] 0xfe5d8974(0xc2cd1874, 0xc2cd1bc0, 0xa, 0xf7c176d8, 0x4, 0xc2cd1780), at 0xfe5d8973
        [16] JavaCalls::call_helper(result = 0xc2cd1bb8, m = 0xc2cd1a50, args = 0xc2cd1b04, __the_thread__ = 0x94b750), line 347 in "javaCalls.cpp"
        [17] os::os_exception_wrapper(f = 0xfdbd0c98 = &JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*), value = 0xc2cd1bb8, method = 0xc2cd1a50, args = 0xc2cd1b04, thread = 0x94b750), line 1854 in "os_solaris.cpp"
        [18] JavaCalls::call(result = 0xc2cd1bb8, method = CLASS, args = 0xc2cd1b04, __the_thread__ = 0x94b750), line 284 in "javaCalls.cpp"
        [19] JavaCalls::call_virtual(result = 0xc2cd1bb8, spec_klass = CLASS, name = CLASS, signature = CLASS, args = 0xc2cd1b04, __the_thread__ = 0x94b750), line 180 in "javaCalls.cpp"
        [20] JavaCalls::call_virtual(result = 0xc2cd1bb8, receiver = CLASS, spec_klass = CLASS, name = CLASS, signature = CLASS, __the_thread__ = 0x94b750), line 186 in "javaCalls.cpp"
        [21] thread_entry(thread = 0x94b750, __the_thread__ = 0x94b750), line 1678 in "jvm.cpp"
        [22] JavaThread::thread_main_inner(thread = 0x94b750), line 790 in "thread.cpp"
        [23] JavaThread::thread_main(thread = 0x94b750), line 774 in "thread.cpp"
        [24] _start(osthread = 0x94c290), line 514 in "os_solaris.cpp"


      Here's what the "this" pointer in frame 10 of t@4 looks like:

      *this = {
          _type = _running
          _thread = 0x94b750
          _addr = {
              _pc = 0xfa0032fc "\xc0"
              _npc = 0xfa003300 ""
          }
          _handle = 0x94c250
          _code_buffer = (nil)
          _stop_state = _thread_in_Java
          _stop_pc = 0xfa003300 ""
      }
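      Two facts can be read straight off this dump: _npc is contiguous with
      _pc (npc == pc + 4, the SPARC instruction size, so the thread was not
      in the middle of a taken delayed control transfer), and _stop_pc, the
      address later handed to CodeCache::find_blob, is _npc rather than _pc.
      A minimal check of those two observations, using the values above:

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <cstdio>

      int main() {
          // Values lifted from the ThreadSafepointState dump above.
          const uintptr_t pc      = 0xfa0032fcu;  // _addr._pc
          const uintptr_t npc     = 0xfa003300u;  // _addr._npc
          const uintptr_t stop_pc = 0xfa003300u;  // _stop_pc

          // _npc is contiguous with _pc (npc == pc + 4 on SPARC), so no
          // taken delayed control transfer was in flight.
          assert(npc == pc + 4);

          // The address examined by CodeCache::find_blob is _npc, not _pc.
          assert(stop_pc == npc);

          printf("npc is contiguous with pc; find_blob sees npc\n");
          return 0;
      }
      ```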

      Note that _pc and _npc indicate the right values for frame 10 of t@2189:

      current thread: t@2189
      current frame: [10]
      g0-g3 0x00000000 0x000397ac 0x0094b750 0x00000134
      g4-g7 0xc2cd1518 0x00000000 0x00000000 0xc2cd1dc8
      o0-o3 0xf7cca770 0x0000000e 0x00000000 0x0000000e
      o4-o7 0x0094b750 0x0000000e 0xc2cd1520 0xfa006a3c
      l0-l3 0xf7c004d0 0xd8532380 0xf7c98120 0x00000001
      l4-l7 0xf7c98120 0xd85323a8 0xd85323a8 0xfffffff8
      i0-i3 0xda0310e0 0x0000000e 0x0094b750 0xf7ce5830
      i4-i7 0x0000000c 0x00000008 0xc2cd15c0 0xfa0156dc
      y 0x00000000
      ccr 0x00000004
      pc 0xfa0032fc:0xfa0032fc ld [0x0], %g0
      npc 0xfa003300:0xfa003300 illtrap 0x2

      And here's what the code for frame 10 of t@2189 looks like:

      0xfa003290: save %sp, -0x60, %sp
      0xfa003294: st %sp, [%g2 + 0x8c]
      0xfa003298: st %g0, [%g2 + 0x118]
      0xfa00329c: mov %g2, %o0
      0xfa0032a0: mov %g2, %l7
      0xfa0032a4: call handle_illegal_instruction_exception
      0xfa0032a8: nop
      0xfa0032ac: mov %l7, %g2
      0xfa0032b0: st %l2, [%sp + 0x8]
      0xfa0032b4: save %sp, -0x60, %sp
      0xfa0032b8: ld [%fp + 0x8], %l2
      0xfa0032bc: ld [%g2 + 0x8c], %l0
      0xfa0032c0: tst %l0
      0xfa0032c4: te %icc,%g0 + 16
      0xfa0032c8: restore
      0xfa0032cc: st %g0, [%g2 + 0x8c]
      0xfa0032d0: st %g0, [%g2 + 0x118]
      0xfa0032d4: ld [%g2 + 0x4], %g5
      0xfa0032d8: tst %g5
      0xfa0032dc: be,pt %icc,0xfa0032f4
      0xfa0032e0: nop
      0xfa0032e4: restore
      0xfa0032e8: sethi %hi(0xfe5d8400), %o7
      0xfa0032ec: add %o7, 0xc0, %o7
      0xfa0032f0: ld [0x0], %g0
      0xfa0032f4: restore
      0xfa0032f8: add %o7, 0x8, %o7
      0xfa0032fc: ld [0x0], %g0
      0xfa003300: illtrap 0x2
      0xfa003304: illtrap 0x1
      0xfa003308: illtrap 0x0
      0xfa00330c: illtrap 0x0
      0xfa003310: illtrap 0x4c
      0xfa003314: illtrap 0x34
      0xfa003318: illtrap 0x4
      0xfa00331c: illtrap 0x40
      0xfa003320: illtrap 0x4c
      0xfa003324: illtrap 0x4c

      Note that it appears that _npc is pointing _beyond_ the end of the
      CodeBlob, at the header of the next blob.
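      The failure mode can be sketched as follows. This is a hypothetical,
      simplified model (not the actual HotSpot sources): the code-heap
      lookup maps an address to the blob in whose heap segment it falls,
      but the one-past-the-end address of one blob lands in the segment of
      the *next* blob, ahead of that blob's own start, so the found blob's
      contains check fails and the assert fires. The addresses and sizes
      below are taken from the dumps in this report.

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <cstdio>

      // Hypothetical simplified stand-in for CodeBlob's containment check.
      struct Blob {
          uintptr_t start;  // start of the CodeBlob in the code heap
          int size;         // total size: header + instructions + data

          bool blob_contains(uintptr_t addr) const {
              return addr >= start && addr < start + size;
          }
      };

      int main() {
          // From the report: _heap.find_start(_npc) returns the blob at
          // 0xfa003310 (_size = 76); _npc itself is 0xfa003300, one past
          // the end of the previous blob and inside the next heap segment
          // but *before* the next CodeBlob's start.
          Blob found = { 0xfa003310u, 76 };
          const uintptr_t npc = 0xfa003300u;

          // find_blob's assert is result->blob_contains((address)start);
          // here the check is false, which is exactly the failure seen.
          assert(!found.blob_contains(npc));
          printf("assert would fire: npc not inside the found blob\n");
          return 0;
      }
      ```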

      Methinks that we should use _pc if _pc and _npc are contiguous, to
      avoid such problems. Conversely, if _pc and _npc are not contiguous,
      we should continue to use _npc, to avoid letting the target
      thread escape into a method that we haven't patched. [Or, even better,
      we could let the thread escape, look at it again later, and
      patch it then -- taking care to avoid the measure-zero case of no
      progress, where we always catch the target in a delayed control transfer.]
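      The proposed heuristic can be sketched as below. This is an
      illustration under the stated assumptions, not actual HotSpot code:
      when npc is contiguous with pc (npc == pc + 4 on SPARC), examining
      pc sidesteps the one-past-the-end problem; when npc is not
      contiguous (a taken delayed control transfer), npc is the real
      continuation point and must still be used so the thread cannot
      escape into unpatched code.

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <cstdio>

      typedef uintptr_t address;

      // Hypothetical helper illustrating the proposed heuristic.
      static address addr_to_examine(address pc, address npc) {
          const int kInstrSize = 4;   // SPARC instruction size
          if (npc == pc + kInstrSize) {
              return pc;              // straight-line code: pc is safe
          }
          return npc;                 // branch taken: must still use npc
      }

      int main() {
          // Contiguous case from this report: examine pc, which lies
          // inside the correct CodeBlob.
          assert(addr_to_examine(0xfa0032fcu, 0xfa003300u) == 0xfa0032fcu);
          // Non-contiguous case (thread caught in the delay slot of a
          // taken branch, per the disassembly above): keep using npc.
          assert(addr_to_examine(0xfa0032e0u, 0xfa0032f4u) == 0xfa0032f4u);
          printf("heuristic ok\n");
          return 0;
      }
      ```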

      As evidence of the fact that _npc is pointing just beyond a valid
      CodeBlob, we find that:

      in frame 9 of t@4, we have:

      result = 0xfa003310

      and *result = {
          _size = 76
          _header_size = 52
          _relocation_size = 4
          _instructions_offset = 64
          _data_offset = 76
          _oops_offset = 76
          _oops_length = 0
          _frame_size = 16
          _return_address_offset = 15
          _link_offset = -2
          _oop_maps = 0x155008
      }

      This corresponds to _heap.find_start(_npc). Whereas _heap.find_start(_pc)
      yields: `libjvm_g.so`codeCache.cpp`_heap.find_start((void *) 4194317052U) = 0xfa003250

      where *(class CodeBlob *) 4194316880U = {
          _size = 176
          _header_size = 48
          _relocation_size = 8
          _instructions_offset = 64
          _data_offset = 176
          _oops_offset = 176
          _oops_length = 0
          _frame_size = 24
          _return_address_offset = 15
          _link_offset = -2
          _oop_maps = (nil)
      }

      Not surprisingly, the end of that code blob (start of the code blob
      for _pc + 176) equals _npc; i.e., _npc points just beyond the end of
      the actual CodeBlob that we want!
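      That address arithmetic can be checked directly from the values above:

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <cstdio>

      int main() {
          // From frame 9 of t@4: _heap.find_start(_pc) yields the blob at
          // 0xfa003250, whose _size is 176; _npc is 0xfa003300.
          const uintptr_t pc_blob_start = 0xfa003250u;
          const int       pc_blob_size  = 176;
          const uintptr_t npc           = 0xfa003300u;

          // The end of the blob containing _pc coincides exactly with _npc.
          assert(pc_blob_start + pc_blob_size == npc);
          printf("blob end == npc\n");
          return 0;
      }
      ```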

      I will need the active tutoring and assistance of a Compiler Safepoints
      guru to try and fix (or even understand) this one. Clearly, the naive
      solution of always using the _pc will not work, because that can cause
      looping of a compiled thread at a safepoint: we can end up patching
      the caller rather than the callee (and if the callee has an
      infinite loop, we'll keep looping till the cows come home, and beyond).

      y.s.ramakrishna@eng 2000-04-05

      ==============================================

            ysr Y. Ramakrishna