JDK-8248851

CMS: Missing memory fences between free chunk check and klass read

Details

    • gc
    • b01
    • aarch64
    • generic

      Description

        We have been seeing random JVM crashes in one of our production environments.
        We are running an aarch64 jdk8u release build with -XX:+UseConcMarkSweepGC.
        We have seen three different crash logs [1][2][3].

        Debugging shows that this is caused by missing memory fences on systems with a weak memory model, such as aarch64.
        For the first crash log, we found that on aarch64 the klass load may be scheduled before the free chunk check in CompactibleFreeListSpace::block_size().
        The reader can then pair a "not free" result with a stale, invalid non-null klass, which leads to the crash.
        The same issue exists in CompactibleFreeListSpace::block_is_obj(), which leads to crash logs [2] and [3].
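
The reordering described above can be sketched outside HotSpot with standard C++ atomics. This is a simplified model under stated assumptions, not the CMS code: the `Block` layout, the constants, and the functions `classify`, `allocate`, and `run_once` are all hypothetical names invented for illustration. It assumes the word that holds the klass doubles as a free-list link while the block is free, so a klass load that is satisfied too early returns a pointer that is not a klass.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical stand-in for a heap block; NOT HotSpot's real layout.
// While the block is free, klass_or_link holds a free-list link. Once the
// block is an object it holds 0 (uninitialized) and later the klass.
struct Block {
  std::atomic<uintptr_t> mark{1};           // low bit set: free chunk
  std::atomic<uintptr_t> klass_or_link{0};
};

constexpr uintptr_t kFreeBit   = 1;
constexpr uintptr_t kStaleLink = 0xdeadbeef;  // free-list pointer, not a klass
constexpr uintptr_t kKlass     = 0x1000;      // pretend klass address

enum class Kind { Free, Uninitialized, Object, Corrupt };

// Reader: the check-then-read pattern (free chunk check, then klass read).
Kind classify(const Block& b) {
  if (b.mark.load(std::memory_order_relaxed) & kFreeBit) return Kind::Free;
  // Without this fence a weakly ordered CPU may satisfy the load below
  // before the load above, pairing "not free" with a stale link.
  std::atomic_thread_fence(std::memory_order_acquire);
  uintptr_t k = b.klass_or_link.load(std::memory_order_relaxed);
  if (k == 0) return Kind::Uninitialized;     // caller would retry
  return k == kKlass ? Kind::Object : Kind::Corrupt;
}

// Writer: turns a free chunk into an object.
void allocate(Block& b) {
  b.klass_or_link.store(0, std::memory_order_relaxed);  // drop the link
  b.mark.store(0, std::memory_order_release);           // no longer free
  b.klass_or_link.store(kKlass, std::memory_order_release);
}

// Races one writer against one reader. Returns true if the reader never
// saw a non-free block that still carried the stale link.
bool run_once() {
  Block b;
  b.klass_or_link.store(kStaleLink, std::memory_order_relaxed);
  bool ok = true;
  std::thread reader([&] {
    Kind k;
    do {
      k = classify(b);
      if (k == Kind::Corrupt) ok = false;
    } while (k != Kind::Object && k != Kind::Corrupt);
  });
  allocate(b);
  reader.join();
  return ok;
}
```

In this model the acquire fence pairs with the writer's release store to `mark`, so a reader that sees the block as not free is also guaranteed to see the link already cleared. Calling `run_once()` in a loop on a weakly ordered machine with the fence removed is one way to try to reproduce the stale read, though a reproduction is not guaranteed.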
        Proposed fix for jdk8u:

        diff -r 93cfec0cf417 src/share/vm/gc_implementation/concurrentMarkSweep/compactibleFreeListSpace.cpp
        --- a/src/share/vm/gc_implementation/concurrentMarkSweep/compactibleFreeListSpace.cpp Sat Jul 04 00:02:00 2020 +0200
        +++ b/src/share/vm/gc_implementation/concurrentMarkSweep/compactibleFreeListSpace.cpp Mon Jul 06 21:36:06 2020 +0800
        @@ -994,6 +994,11 @@
                 return res;
               }
             } else {
        +      // Bugfix for systems with weak memory model (AARCH64).
        +      // Acquire to make sure that the klass read happens after the free
        +      // chunk check.
        +      OrderAccess::acquire();
        +
               // must read from what 'p' points to in each loop.
               Klass* k = ((volatile oopDesc*)p)->klass_or_null();
               if (k != NULL) {
        @@ -1049,6 +1054,11 @@
                 return res;
               }
             } else {
        +      // Bugfix for systems with weak memory model (AARCH64).
        +      // Acquire to make sure that the klass read happens after the free
        +      // chunk check.
        +      OrderAccess::acquire();
        +
               // must read from what 'p' points to in each loop.
               Klass* k = ((volatile oopDesc*)p)->klass_or_null();
               // We trust the size of any object that has a non-NULL
        @@ -1111,6 +1121,12 @@
           // assert(CollectedHeap::use_parallel_gc_threads() || _bt.block_start(p) == p,
           // "Should be a block boundary");
           if (FreeChunk::indicatesFreeChunk(p)) return false;
        +
        +  // Bugfix for systems with weak memory model (AARCH64).
        +  // Acquire to make sure that the klass read happens after the free
        +  // chunk check.
        +  OrderAccess::acquire();
        +
           Klass* k = oop(p)->klass_or_null();
           if (k != NULL) {
             // Ignore mark word because it may have been used to

         [1].
        #
        # A fatal error has been detected by the Java Runtime Environment:
        #
        # SIGSEGV (0xb) at pc=0x0000ffffb2f320e8, pid=49265, tid=0x0000ffffb16a41e0
        #
        # JRE version: OpenJDK Runtime Environment (8.0_222-b10) (build 1.8.0_222)
        # Java VM: OpenJDK 64-Bit Server VM (25.222-b10 mixed mode linux-aarch64 compressed oops)
        # Problematic frame:
        # V [libjvm.so+0x4650e8] CompactibleFreeListSpace::block_size(HeapWord const*) const+0x110
        #
        # Core dump written. Default location: /home/vsbo/vsbo_container/modules/vsbo/logs/core or core.49265
        #
        # If you would like to submit a bug report, please visit:
        # http://bugreport.java.com/bugreport/crash.jsp
        #
        Stack: [0x0000ffffb14a5000,0x0000ffffb16a5000], sp=0x0000ffffb16a3450, free space=2041k
        Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
        V [libjvm.so+0x4650e8] CompactibleFreeListSpace::block_size(HeapWord const*) const+0x110
        V [libjvm.so+0x2e6984] BlockOffsetArrayNonContigSpace::block_start_unsafe(void const*) const+0xcc
        V [libjvm.so+0x959494] CardTableModRefBS::process_stride(Space*, MemRegion, int, int, OopsInGenClosure*, CardTableRS*, signed char**, unsigned long, unsigned long)+0x214
        V [libjvm.so+0x95991c] CardTableModRefBS::non_clean_card_iterate_parallel_work(Space*, MemRegion, OopsInGenClosure*, CardTableRS*, int)+0xbc
        V [libjvm.so+0x3bff44] CardTableModRefBS::non_clean_card_iterate_possibly_parallel(Space*, MemRegion, OopsInGenClosure*, CardTableRS*)+0x54
        V [libjvm.so+0x3c01c0] CardTableRS::younger_refs_in_space_iterate(Space*, OopsInGenClosure*)+0x70
        V [libjvm.so+0x4afb68] ConcurrentMarkSweepGeneration::younger_refs_iterate(OopsInGenClosure*)+0x58
        V [libjvm.so+0x5df528] GenCollectedHeap::gen_process_roots(int, bool, bool, GenCollectedHeap::ScanningOption, bool, OopsInGenClosure*, OopsInGenClosure*, CLDClosure*)+0xf8
        V [libjvm.so+0x95c54c] ParNewGenTask::work(unsigned int)+0x144
        V [libjvm.so+0xba78b8] GangWorker::loop()+0xe8
        V [libjvm.so+0x937414] java_start(Thread*)+0x11c
        C [libpthread.so.0+0x78bc] start_thread+0x19c

        [2].
        # A fatal error has been detected by the Java Runtime Environment:
        #
        # SIGSEGV (0xb) at pc=0x0000ffff7fd607a8, pid=28547, tid=0x0000ffff0c7e31e0
        #
        # JRE version: OpenJDK Runtime Environment (8.0_242-b08) (build 1.8.0_242)
        # Java VM: OpenJDK 64-Bit Server VM (25.242-b08 mixed mode linux-aarch64 compressed oops)
        # Problematic frame:
        # V [libjvm.so+0x4707a8] FreeListSpace_DCTOC::walk_mem_region_with_cl_par(MemRegion, HeapWord*, HeapWord*, FilteringClosure*)+0x170
        #
        Stack: [0x0000ffff0c5e4000,0x0000ffff0c7e4000], sp=0x0000ffff0c7e2250, free space=2040k
        Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
        V [libjvm.so+0x4707a8] FreeListSpace_DCTOC::walk_mem_region_with_cl_par(MemRegion, HeapWord*, HeapWord*, FilteringClosure*)+0x170
        V [libjvm.so+0x470d64] FreeListSpace_DCTOC::walk_mem_region_with_cl(MemRegion, HeapWord*, HeapWord*, FilteringClosure*)+0x6c
        V [libjvm.so+0xab5d5c] Filtering_DCTOC::walk_mem_region(MemRegion, HeapWord*, HeapWord*)+0x2f4
        V [libjvm.so+0xab31fc] DirtyCardToOopClosure::do_MemRegion(MemRegion)+0x104
        V [libjvm.so+0x3c9860] ClearNoncleanCardWrapper::do_MemRegion(MemRegion)+0xe8
        V [libjvm.so+0x960fcc] CardTableModRefBS::process_stride(Space*, MemRegion, int, int, OopsInGenClosure*, CardTableRS*, signed char**, unsigned long, unsigned long)+0x1bc
        V [libjvm.so+0x9614ac] CardTableModRefBS::non_clean_card_iterate_parallel_work(Space*, MemRegion, OopsInGenClosure*, CardTableRS*, int)+0xbc
        V [libjvm.so+0x3c943c] CardTableModRefBS::non_clean_card_iterate_possibly_parallel(Space*, MemRegion, OopsInGenClosure*, CardTableRS*)+0x54
        V [libjvm.so+0x3c96b8] CardTableRS::younger_refs_in_space_iterate(Space*, OopsInGenClosure*)+0x70
        V [libjvm.so+0x4b8858] ConcurrentMarkSweepGeneration::younger_refs_iterate(OopsInGenClosure*)+0x58
        V [libjvm.so+0x5e94d0] GenCollectedHeap::gen_process_roots(int, bool, bool, GenCollectedHeap::ScanningOption, bool, OopsInGenClosure*, OopsInGenClosure*, CLDClosure*)+0xf8
        V [libjvm.so+0x96422c] ParNewGenTask::work(unsigned int)+0x144
        V [libjvm.so+0xbc7af0] GangWorker::loop()+0xe8
        V [libjvm.so+0x93e6ec] java_start(Thread*)+0x11c
        C [libpthread.so.0+0x78bc] start_thread+0x19c

        [3].
        # A fatal error has been detected by the Java Runtime Environment:
        #
        # SIGSEGV (0xb) at pc=0x0000ffffa07c82bc, pid=34768, tid=0x0000ffff2aff71e0
        #
        # JRE version: OpenJDK Runtime Environment (8.0_252-b09) (build 1.8.0_252)
        # Java VM: OpenJDK 64-Bit Server VM (25.252-b09 mixed mode linux-aarch64 compressed oops)
        # Problematic frame:
        # V [libjvm.so+0xab92bc] DirtyCardToOopClosure::get_actual_top(HeapWord*, HeapWord*)+0xcc
        #
        Stack: [0x0000ffff2adf8000,0x0000ffff2aff8000], sp=0x0000ffff2aff63a0, free space=2040k
        Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
        V [libjvm.so+0xab92bc] DirtyCardToOopClosure::get_actual_top(HeapWord*, HeapWord*)+0xcc
        V [libjvm.so+0xab97ac] DirtyCardToOopClosure::do_MemRegion(MemRegion)+0xb4
        V [libjvm.so+0x3cba90] ClearNoncleanCardWrapper::do_MemRegion(MemRegion)+0xe8
        V [libjvm.so+0x96615c] CardTableModRefBS::process_stride(Space*, MemRegion, int, int, OopsInGenClosure*, CardTableRS*, signed char**, unsigned long, unsigned long)+0x1bc
        V [libjvm.so+0x96663c] CardTableModRefBS::non_clean_card_iterate_parallel_work(Space*, MemRegion, OopsInGenClosure*, CardTableRS*, int)+0xbc
        V [libjvm.so+0x3cb66c] CardTableModRefBS::non_clean_card_iterate_possibly_parallel(Space*, MemRegion, OopsInGenClosure*, CardTableRS*)+0x54
        V [libjvm.so+0x3cb8e8] CardTableRS::younger_refs_in_space_iterate(Space*, OopsInGenClosure*)+0x70
        V [libjvm.so+0x4ba980] ConcurrentMarkSweepGeneration::younger_refs_iterate(OopsInGenClosure*)+0x58
        V [libjvm.so+0x5ec2d0] GenCollectedHeap::gen_process_roots(int, bool, bool, GenCollectedHeap::ScanningOption, bool, OopsInGenClosure*, OopsInGenClosure*, CLDClosure*)+0xf8
        V [libjvm.so+0x9693ac] ParNewGenTask::work(unsigned int)+0x144
        V [libjvm.so+0xbcd698] GangWorker::loop()+0xe8
        V [libjvm.so+0x9438c4] java_start(Thread*)+0x11c
        C [libpthread.so.0+0x78bc] start_thread+0x19c

              People

                fyang Fei Yang
                fyang Fei Yang