Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-4681417

running mtrt exposes race condition in SharedRuntime::reresolve_call_site

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • P3
    • 1.4.2
    • 1.4.1
    • hotspot
    • None
    • mantis
    • x86
    • solaris_8

    Description

      I made about 73 SpecApplication runs, over about 50 CPU hours.
      The runs were 3 at a time on Krajy, a 500 MHz 4-way machine.
      Two runs ended with the "not a call imm32" assert.

      (NOTE: There were also 8 assertion failures with the
      message "GC active during No_GC_Verifier"; this is an
      unrelated bug!)

      Here is a partial transcript analyzing one of the two failures.

      --------
      date
      Tue May 7 04:54:10 PDT 2002
      --------
      ./gamma_gfast -XX:+ShowMessageBoxOnError -Xmx32m SpecApplication -a -s100 -m5 -M5 _227_mtrt _202_jess _201_compress _209_db _222_mpegaudio _228_jack _213_javac
      # Executing gamma in fastdebug directory...
      VM option '+ShowMessageBoxOnError'
      VM option '+ShowMessageBoxOnError'
      Speed will be 100
      Will run each benchmark at least 5 times
      Will run each benchmark at most 5 times
      Caching Off Speed = 100 Auto run mode

      ======= _227_mtrt Starting =======
      Run 0 start. Total memory=2031616 free memory=1841448
      +100 to 199 by 200
      +0 to 99 by 200
      Addr: 0xda0e2482 Code: 0xeb >>>> JMP not CALL
      #
      # HotSpot Virtual Machine Error, Internal Error
      # Please report this error at
      # http://java.sun.com/cgi-bin/bugreport.cgi
      #
      # Java VM: Java HotSpot(TM) Server VM (1.4.1-internal-debug mixed mode)
      #
      # Fatal: not a call imm32
      #
      # Error ID: /net/lukasiewicz/export/lukasiewicz1/jrose/hopper/src/cpu/i486/vm/nativeInst_i486.cpp, 21
      #
      # Problematic Thread: prio=5 tid=0x81c1620 nid=0xe runnable
      #
      Internal Error
      Fatal: not a call imm32

      Do you want to debug the problem?
      # SafepointSynchronize::begin: Fatal error:
      # SafepointSynchronize::begin: Timed out while attempting to reach a safepoint.
      # SafepointSynchronize::begin: Threads which did not reach the safepoint:
      # Thread: 0x81c1620 [0x e] State: _call_back pc: 0x0
      nid=0xe runnable
      # SafepointSynchronize::begin: (End of list)
      ...
      --------
      dbx - 7169
      ... >>>> See FAIL4.threads for a full set of backtraces.
      (dbx) w 4
      current thread: t@14
      =>[6] report_fatal(file_name = ???, line_no = ???, format = ???, ...) (optimized), at 0xdebcea99 (line ~139) in "jrose/hopper/src/share/vm/utilities/debug.cpp"
        [7] NativeCall::verify(this = ???) (optimized), at 0xdf054270 (line ~21) in "jrose/hopper/src/cpu/i486/vm/nativeInst_i486.cpp"
        [8] SharedRuntime::reresolve_call_site(caller = CLASS) (optimized), at 0xdf170715 (line ~622) in "jrose/hopper/src/share/vm/runtime/sharedRuntime.cpp"
        [9] OptoRuntime::handle_wrong_method(thread = ???) (optimized), at 0xdf14af91 (line ~489) in "jrose/hopper/src/share/vm/opto/runtime.cpp"

      >>>> handle_wrong_method is called for both missed monomorphic V-calls
           and also for zombie methods. Since the call site in question is
           calling the VEP of 0xda0d558c, the trip through handle_wrong_method
           was probably handling a zombie method, the previous version of
           0xda0d558c.
      <<<<

      (dbx) p -fx *(unsigned char*)0xda0e2482
      *((unsigned char *) 0xda0e2482U) = 0xe8
      (dbx) x 0xda0e2482/i
      0xda0e2482: call 0xda0d558c <0xda0d558c>
      >>>> Instruction is a call, again => A RACE CONDITION.
      >>>> Note that the instruction displacement lies across a word boundary.

      (dbx) x 0xda0d558c-12/60i
      >>>> Examine callee Java method.
      >>>UEP:
      0xda0d5580: cmpl 4(%ecx),%eax
      0xda0d5583: jne 0xda0a97d0 <0xda0a97d0> >>>WRONG METHOD
      0xda0d5589: nop
      0xda0d558a: nop
      0xda0d558b: nop
      >>>VEP:
      0xda0d558c: movl %eax,0xffffc000(%esp,1) >>>BANG STACK
      0xda0d5593: subl $0xc,%esp >>> MAKE FRAME
      0xda0d5599: flds 8(%edx)
      0xda0d559c: fstp %st(2)
      0xda0d559e: flds 16(%edx)
      0xda0d55a1: fstp %st(1)
      0xda0d55a3: flds 8(%ecx)
      0xda0d55a6: fstp %st(6)
      0xda0d55a8: flds 16(%esp,1)
      0xda0d55ac: fmul %st(1),%st
      0xda0d55ae: fstps 0(%esp,1)
      ... (rest of code is in FAIL4.callee)

      >>>> Examine caller Java method.
      (dbx) x 0xda0e1e00/772i
      ... (rest of code is in FAIL4.caller)
      0xda0e246c: flds 24(%ebx)
      0xda0e246f: fmul %st(2),%st
      0xda0e2471: fstps 104(%esp,1)
      0xda0e2475: movl %ebp,%ecx
      0xda0e2477: flds 104(%esp,1)
      0xda0e247b: fstp %st(2)
      0xda0e247d: fld %st(1)
      0xda0e247f: fstps (%esp,1)
      0xda0e2482: call 0xda0d558c <0xda0d558c> >>>> CALL VEP OF CALLEE
      0xda0e2487: movl 8(%edi),%ecx
      0xda0e248a: movl 8(%ecx),%ebp
      ...

      <<<<

      The only places where 0xEB is patched in over a call are:
       NativeCall::replace_mt_safe
        (apparently unused)
       NativeCall::set_destination_mt_safe
        used only in compiledIC.cpp, several times
        1. CompiledIC::set_ic_destination (virtual calls only)
        2. All other occurrences are in CompiledStaticCall methods.

      Since the callee method is virtual, set_ic_destination is
      the likely culprit. These are the calls to set_ic_destination:
        1. Two in icBuffer.cpp, for transition stubs.
        2. CompiledIC::set_to_monomorphic is used on first call.
        3. CompiledIC::set_to_clean is used to reinitialize.
      Since the call site is non-virtual, only 2 or 3 are likely.

      The most likely scenario is a method becoming a zombie,
      which is then called simultaneously by two processors
      from the same call site, leading to a race between
      two threads running reresolve_call_site.

      Attachments

        Issue Links

          Activity

            People

              coleenp Coleen Phillimore
              jrose John Rose
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:
                Imported:
                Indexed: