JDK-8168914

Crash in ClassLoaderData/JNIHandleBlock::oops_do during concurrent marking

Details

    • gc
    • b161
    • generic

      Description

        Rafael Winterhalter reported a crash in JNIHandleBlock::oops_do() during concurrent marking. The crash occurred only when massive class redefinition was performed in parallel. He supplied two hs_err files with similar stack traces (see the attached hs_err files for full details):

        # SIGSEGV (0xb) at pc=0x00007f80940887eb, pid=3809, tid=0x00007f8090870700
        #
        # JRE version: Java(TM) SE Runtime Environment (8.0_102-b14) (build 1.8.0_102-b14)
        # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.102-b14 mixed mode linux-amd64 compressed oops)
        # Problematic frame:
        # V [libjvm.so+0x4c87eb] oopDesc::size()+0x2b

        V [libjvm.so+0x4c87eb] oopDesc::size()+0x2b
        V [libjvm.so+0x4ca770] CMTask::make_reference_grey(oopDesc*, HeapRegion*)+0xc0
        V [libjvm.so+0x70cf6a] JNIHandleBlock::oops_do(OopClosure*)+0x6a
        V [libjvm.so+0x468b50] ClassLoaderData::oops_do(OopClosure*, KlassClosure*, bool)+0x70
        V [libjvm.so+0x656ad0] InstanceMirrorKlass::oop_oop_iterate_nv(oopDesc*, G1CMOopClosure*)+0x40
        V [libjvm.so+0x4cbb44] CMBitMapClosure::do_bit(unsigned long)+0x74
        V [libjvm.so+0x4c522e] CMTask::do_marking_step(double, bool, bool)+0xa9e
        V [libjvm.so+0x4cc483] CMConcurrentMarkingTask::work(unsigned int)+0x293
        V [libjvm.so+0xae6e0f] GangWorker::loop()+0xcf
        V [libjvm.so+0x9249c8] java_start(Thread*)+0x108

        # SIGSEGV (0xb) at pc=0x00007f370686df20, pid=12573, tid=0x00007f370411e700
        #
        # JRE version: Java(TM) SE Runtime Environment (8.0_102-b14) (build 1.8.0_102-b14)
        # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.102-b14 mixed mode linux-amd64 compressed oops)
        # Problematic frame:
        # V [libjvm.so+0x70cf20] JNIHandleBlock::oops_do(OopClosure*)+0x20

        V [libjvm.so+0x70cf20] JNIHandleBlock::oops_do(OopClosure*)+0x20
        V [libjvm.so+0x468b50] ClassLoaderData::oops_do(OopClosure*, KlassClosure*, bool)+0x70
        V [libjvm.so+0x63d4cf] InstanceClassLoaderKlass::oop_oop_iterate_nv(oopDesc*, G1CMOopClosure*)+0x3f
        V [libjvm.so+0x4c421b] CMTask::drain_local_queue(bool)+0x14b
        V [libjvm.so+0x4c49ac] CMTask::do_marking_step(double, bool, bool)+0x21c
        V [libjvm.so+0x4cc483] CMConcurrentMarkingTask::work(unsigned int)+0x293
        V [libjvm.so+0xae6e0f] GangWorker::loop()+0xcf
        V [libjvm.so+0x9249c8] java_start(Thread*)+0x108

        From a quick look at the code in classLoaderData.cpp I found that ClassLoaderData::oops_do() iterates over the class loader's handles:

          if (_handles != NULL) {
            _handles->oops_do(f);
          }

        while they can be concurrently updated by:

        jobject ClassLoaderData::add_handle(Handle h) {
          MutexLockerEx ml(metaspace_lock(), Mutex::_no_safepoint_check_flag);
          if (handles() == NULL) {
            set_handles(JNIHandleBlock::allocate_block());
          }
          return handles()->allocate_handle(h());
        }

        Notice that updating the handles in ClassLoaderData::add_handle() is done under the 'metaspace_lock' while iterating over the handles in ClassLoaderData::oops_do() isn't synchronized at all.
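The asymmetry described above can be modeled outside the VM. The following is a minimal sketch, not HotSpot code: `ToyClassLoaderData`, `Oop`, and `CountingClosure` are hypothetical stand-ins, and a `std::vector` is only an analogy for the JNIHandleBlock chain. It assumes the hazard is the usual one for a locked writer paired with an unlocked reader.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-ins for HotSpot types; these are not the real declarations.
using Oop = const void*;

struct OopClosure {
    virtual void do_oop(Oop* p) = 0;
    virtual ~OopClosure() = default;
};

// Example closure that only counts the handles it is shown.
struct CountingClosure : OopClosure {
    std::size_t visited = 0;
    void do_oop(Oop*) override { ++visited; }
};

class ToyClassLoaderData {
    std::mutex _metaspace_lock;  // stands in for metaspace_lock()
    std::vector<Oop> _handles;   // analogy for the JNIHandleBlock chain

public:
    // Writer path, modeled on add_handle(): mutates the handles under the lock.
    std::size_t add_handle(Oop obj) {
        std::lock_guard<std::mutex> ml(_metaspace_lock);
        _handles.push_back(obj);  // may reallocate the underlying storage
        return _handles.size() - 1;
    }

    // Reader path, modeled on oops_do(): walks the handles WITHOUT the lock.
    // If another thread is inside add_handle() at the same time, this loop
    // can observe a half-updated container. That is a data race.
    void oops_do(OopClosure* f) {
        assert(f != nullptr);
        for (Oop& h : _handles) {
            f->do_oop(&h);
        }
    }
};
```

Called from a single thread, the model is well behaved. The problem only appears when `oops_do()` runs on one thread (the marking worker in the report) while `add_handle()` runs on another (class redefinition).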

        Incidentally, ClassLoaderData::add_handle() is called during class redefinition:

        ClassLoaderData::add_handle(Handle h)
          ConstantPool::initialize_resolved_references
            Rewriter::make_constant_pool_cache
              Rewriter::Rewriter
                Rewriter::rewrite(instanceKlassHandle klass
                  VM_RedefineClasses::load_new_class_versions
                    VM_RedefineClasses::doit_prologue()

        which, according to Rafael, is what triggers the bug at his customer's site.

        I've tried to build a simple test case that reproduces the crash, but haven't succeeded so far. With such a test case we could verify whether taking the same 'metaspace_lock' that ClassLoaderData::add_handle() already uses while iterating in ClassLoaderData::oops_do() would actually solve the problem.

        Then, of course, we'd also have to evaluate the performance cost of such an additional lock and maybe come up with another, cheaper solution (e.g. an additional lock just to serialize ClassLoaderData::add_handle()/ClassLoaderData::oops_do() instead of the more heavyweight 'metaspace_lock').
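The dedicated-lock idea can be sketched in isolation as follows. This is an illustration under stated assumptions, not a proposed patch: `LockedClassLoaderData` and `_handles_lock` are hypothetical names, a `std::vector` stands in for the JNIHandleBlock chain, and `std::mutex` stands in for a HotSpot `Mutex`. The real fix may look quite different.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a HotSpot oop; not the real declaration.
using Oop = const void*;

class LockedClassLoaderData {
    std::mutex _handles_lock;   // hypothetical lock guarding only the handles
    std::vector<Oop> _handles;  // analogy for the JNIHandleBlock chain

public:
    // Writer path: same shape as add_handle(), but under the dedicated lock.
    void add_handle(Oop obj) {
        assert(obj != nullptr);
        std::lock_guard<std::mutex> ml(_handles_lock);
        _handles.push_back(obj);
    }

    // Reader path: takes the same lock, so the walk never overlaps an update.
    // The closure runs while the lock is held, which is the cost to measure.
    template <typename Closure>
    void oops_do(Closure f) {
        std::lock_guard<std::mutex> ml(_handles_lock);
        for (Oop& h : _handles) {
            f(&h);
        }
    }
};

// Helper for illustration: counts the handles currently visible to oops_do().
inline std::size_t count_handles(LockedClassLoaderData& cld) {
    std::size_t visited = 0;
    cld.oops_do([&visited](Oop*) { ++visited; });
    return visited;
}
```

With this arrangement the iteration in `oops_do()` is serialized against `add_handle()` without involving the lock used for metaspace allocation. The marking thread can now block behind a writer, so the performance evaluation mentioned above still applies.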

        Attachments

          1. hs_err_pid12573.log
            462 kB
            Volker Simonis
          2. hs_err_pid3809.log
            733 kB
            Volker Simonis

              People

                ehelin Erik Helin
                simonis Volker Simonis