Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-8210706

G1 may deadlock when starting a concurrent cycle at shutdown

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Duplicate
    • Icon: P2 P2
    • 12
    • 12
    • hotspot
    • gc

      tl;dr: there's a race condition between the VM shutdown and G1, in that the
      VM shutdown sequence may have just terminated all the concurrent GC
      threads, and at the same time G1 has just started an incremental collection
      cycle and is waiting for the concurrent marker to finish. Since there's no
      concurrent GC thread alive to update the status anymore, G1 waits
      indefinitely while holding a safepoint, causing the VM to deadlock.

      Details:

      3 threads are involved in the actual deadlock. Detailed stack traces are at
      the end of this email.

      1. A Java application thread at an allocation site triggering a G1
      incremental collection
      2. A thread that called System.exit(), initiating the VM shutdown sequence.
      It's in VM's native code so it doesn't block a safepoint.
      3. VM thread, already inside of a safepoint and started running G1's
      incremental collection.
      (4. "the world" is at a safepoint so all other Java threads are just
      waiting)

      The problem is, Thread 2 has already run half way into before_exit(), and
      one of the steps in there is:

      hotspot/src/share/vm/runtime/java.cpp

      503 // Stop concurrent GC threads
      504 Universe::heap()->stop();

      But G1 is waiting for the concurrent marker to finish scanning the root
      regions:

      hotspot/src/share/vm/gc_implementation/g1/concurrentMark.cpp

       506 bool CMRootRegions::wait_until_scan_finished() {
       507 if (!scan_in_progress()) return false;
       508
       509 {
       510 MutexLockerEx x(RootRegionScan_lock,
      Mutex::_no_safepoint_check_flag);
       511 while (scan_in_progress()) {
       512 RootRegionScan_lock->wait(Mutex::_no_safepoint_check_flag);
       513 }
       514 }
       515 return true;
       516 }

      But scan_in_process is only updated in a few places:

      hotspot/src/share/vm/gc_implementation/g1/concurrentMark.cpp

       449 void CMRootRegions::prepare_for_scan() {
       450 assert(!scan_in_progress(), "pre-condition");
       451
       452 // Currently, only survivors can be root regions.
       453 assert(_next_survivor == NULL, "pre-condition");
       454 _next_survivor = _young_list->first_survivor_region();
       455 _scan_in_progress = (_next_survivor != NULL);
       456 _should_abort = false;
       457 }

       490 void CMRootRegions::scan_finished() {
       491 assert(scan_in_progress(), "pre-condition");
       492
       493 // Currently, only survivors can be root regions.
       494 if (!_should_abort) {
       495 assert(_next_survivor == NULL, "we should have claimed all
      survivors");
       496 }
       497 _next_survivor = NULL;
       498
       499 {
       500 MutexLockerEx x(RootRegionScan_lock,
      Mutex::_no_safepoint_check_flag);
       501 _scan_in_progress = false;
       502 RootRegionScan_lock->notify_all();
       503 }
       504 }

      And with the current GC threads gone, nobody ever gets to set
      scan_in_process back to false, which leads to an infinite wait.

      One way I'd imagine fixing this is by adding a check in G1's
      CMRootRegions::wait_until_scan_finished() to see if the concurrent marker
      thread is still there, and if it isn't, there's no point in waiting anymore.
      Obviously similar fixes would have to be done separately for other
      concurrent GCs in HotSpot.

      What do you think?

      Thanks,
      Kris

      Detail thread stacks for in the example:

      1. A Java application thread at an allocation site triggering a G1
      incremental collection:

      Thread 2164 (Thread 0x7fcfc4e11700 (LWP 25218)):
      #0 0x00007fd29387b360 in pthread_cond_wait@@GLIBC_2.3.2 () at
      ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
      #1 0x00007fd2931e9a1b in os::PlatformEvent::park()
      (this=this@entry=0x7fd05c029c00)
      at
      /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/os/linux/vm/os_linux.cpp:5842
      #2 0x00007fd2931a256f in Monitor::IWait(Thread*, long) (timo=0,
      ev=0x7fd05c029c00) at
      /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/runtime/mutex.cpp:424
      #3 0x00007fd2931a256f in Monitor::IWait(Thread*, long)
      (this=this@entry=0x7fd28c012800,
      Self=Self@entry=0x7fd1ac0f2000, timo=timo@entry=0) at
      /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/runtime/mutex.cpp:802
      #4 0x00007fd2931a382e in Monitor::wait(bool, long, bool)
      (this=0x7fd28c012800, no_safepoint_check=<optimized out>,
      timeout=timeout@entry=0,
      as_suspend_equivalent=as_suspend_equivalent@entry=false)
      at
      /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/runtime/mutex.cpp:1126
      #5 0x00007fd29339d5fc in VMThread::execute(VM_Operation*)
      (op=op@entry=0x7fcfc4e0c430)
      at
      /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/runtime/vmThread.cpp:642
      #6 0x00007fd292e99fec in G1CollectedHeap::attempt_allocation_slow(unsigned
      long, unsigned char, unsigned int*, unsigned int*)
      (gc_cause=GCCause::_g1_inc_collection_pause, succeeded=<synthetic pointer>,
      gc_count_before=<optimized out>, word_size=1026, this=0x7fd28c033c40) at
      /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/gc_implementation/g1/g1CollectedHeap.cpp:3628
      #7 0x00007fd292e99fec in G1CollectedHeap::attempt_allocation_slow(unsigned
      long, unsigned char, unsigned int*, unsigned int*)
      (this=this@entry=0x7fd28c033c40,
      word_size=word_size@entry=1026, context=context@entry=0 '\000',
      gc_count_before_ret=gc_count_before_ret@entry=0x7fcfc4e0c50c,
      gclocker_retry_count_ret=gclocker_retry_count_ret@entry=0x7fcfc4e0c508) at
      /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/gc_implementation/g1/g1CollectedHeap.cpp:956
      #8 0x00007fd292e9b03b in G1CollectedHeap::mem_allocate(unsigned long,
      bool*) (gclocker_retry_count_ret=0x7fcfc4e0c508,
      gc_count_before_ret=0x7fcfc4e0c50c, word_size=1026, this=0x7fd28c033c40) at
      /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/gc_implementation/g1/g1CollectedHeap.inline.hpp:147
      #9 0x00007fd292e9b03b in G1CollectedHeap::mem_allocate(unsigned long,
      bool*) (this=0x7fd28c033c40, word_size=1026,
      gc_overhead_limit_was_exceeded=<optimized out>) at
      /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/gc_implementation/g1/g1CollectedHeap.cpp:846
      #10 0x00007fd293356080 in TypeArrayKlass::allocate_common(int, bool,
      Thread*) (__the_thread__=0x7fd1ac0f2000, size=1026, klass=...) at
      /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/gc_interface/collectedHeap.inline.hpp:135
      #11 0x00007fd293356080 in TypeArrayKlass::allocate_common(int, bool,
      Thread*) (__the_thread__=0x7fd1ac0f2000, size=1026, klass=...) at
      /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/gc_interface/collectedHeap.inline.hpp:175
      #12 0x00007fd293356080 in TypeArrayKlass::allocate_common(int, bool,
      Thread*) (__the_thread__=0x7fd1ac0f2000, length=8192, size=1026, klass=...)
      at
      /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/gc_interface/collectedHeap.inline.hpp:216
      #13 0x00007fd293356080 in TypeArrayKlass::allocate_common(int, bool,
      Thread*) (this=0x7c0000768, length=length@entry=8192,
      do_zero=do_zero@entry=true,
      __the_thread__=__the_thread__@entry=0x7fd1ac0f2000) at
      /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/oops/typeArrayKlass.cpp:107
      #14 0x00007fd2931d3550 in oopFactory::new_typeArray(BasicType, int,
      Thread*) (__the_thread__=__the_thread__@entry=0x7fd1ac0f2000,
      length=length@entry=8192, this=<optimized out>) at
      /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/oops/typeArrayKlass.hpp:67
      #15 0x00007fd2931d3550 in oopFactory::new_typeArray(BasicType, int,
      Thread*) (type=<optimized out>, length=length@entry=8192,
      __the_thread__=__the_thread__@entry=0x7fd1ac0f2000) at
      /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/memory/oopFactory.cpp:56
      #16 0x00007fd293283457 in OptoRuntime::new_array_C(Klass*, int,
      JavaThread*) (array_type=<optimized out>, len=8192, thread=0x7fd1ac0f2000)
      at
      /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/opto/runtime.cpp:279
      #17 0x00007fd27d0871c7 in ()

      2. A thread that called System.exit(), initiating the VM shutdown sequence.
      It's in native code so it doesn't block a safepoint:

      Thread 563 (Thread 0x7fd017980700 (LWP 7959)):
      #0 0x00007fd29387b360 in pthread_cond_wait@@GLIBC_2.3.2 () at
      ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
      #1 0x00007fd2931e9a1b in os::PlatformEvent::park() (this=0x7fd0f4015000)
      at
      /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/os/linux/vm/os_linux.cpp:5842
      #2 0x00007fd293064485 in JvmtiRawMonitor::SimpleWait(Thread*, long)
      (this=this@entry=0x7fd28f22f020, Self=Self@entry=0x7fd28f0c9800,
      millis=millis@entry=-1) at
      /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/prims/jvmtiRawMonitor.cpp:194
      #3 0x00007fd293064c75 in JvmtiRawMonitor::raw_wait(long, bool, Thread*)
      (this=this@entry=0x7fd28f22f020, millis=millis@entry=-1,
      interruptible=interruptible@entry=true,
      __the_thread__=__the_thread__@entry=0x7fd28f0c9800)
      at
      /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/prims/jvmtiRawMonitor.cpp:383
      #4 0x00007fd293040c09 in JvmtiEnv::RawMonitorWait(JvmtiRawMonitor*, long)
      (this=<optimized out>, rmonitor=0x7fd28f22f020, millis=-1) at
      /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/prims/jvmtiEnv.cpp:3144
      #5 0x00007fd291620af8 in debugMonitorWait (monitor=<optimized out>) at
      /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/jdk/src/share/back/util.c:1075
      #6 0x00007fd29160e86c in cbVMDeath (jvmti_env=<optimized out>,
      env=0x7fd28f0c99e0) at
      /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/jdk/src/share/back/eventHandler.c:1273
      #7 0x00007fd29304f4b0 in JvmtiExport::post_vm_death() () at
      /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/prims/jvmtiExport.cpp:490
      #8 0x00007fd292f896dd in before_exit(JavaThread*) (thread=<optimized out>)
      at
      /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/runtime/java.cpp:532
      #9 0x00007fd292fd005b in JVM_Halt(jint) (code=1) at
      /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/prims/jvm.cpp:441
      #10 0x00007fd27d020868 in [native] java.lang.Shutdown.halt0(int) () at
      java/lang/Shutdown.java:0
      #11 0x00007fd27d00809d in [interpreted: bc = 7]
      java.lang.Shutdown.halt(int) () at java/lang/Shutdown.java:140
      #12 0x00007fd27d00809d in [interpreted: bc = 99]
      java.lang.Shutdown.exit(int) () at java/lang/Shutdown.java:214
      #13 0x00007fd27d00809d in [interpreted: bc = 14]
      java.lang.Runtime.exit(int) () at java/lang/Runtime.java:110
      #14 0x00007fd27d00809d in [interpreted: bc = 4] java.lang.System.exit(int)
      () at java/lang/System.java:972
      #15 0x00007fd27d00809d in [interpreted: bc = 1]
      scala.sys.package$.exit(int) () at scala/sys/package.java:41

      3. VM thread, already inside of a safepoint and started running G1's
      incremental collection:

      Thread 19 (Thread 0x7fd1d70ab700 (LWP 47)):
      #0 0x00007fd29387b360 in pthread_cond_wait@@GLIBC_2.3.2 () at
      ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
      #1 0x00007fd2931e9a1b in os::PlatformEvent::park()
      (this=this@entry=0x7fd28c498200)
      at
      /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/os/linux/vm/os_linux.cpp:5842
      #2 0x00007fd2931a256f in Monitor::IWait(Thread*, long) (timo=0,
      ev=0x7fd28c498200) at
      /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/runtime/mutex.cpp:424
      #3 0x00007fd2931a256f in Monitor::IWait(Thread*, long)
      (this=this@entry=0x7fd28c011700,
      Self=Self@entry=0x7fd28c497800, timo=timo@entry=0) at
      /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/runtime/mutex.cpp:802
      #4 0x00007fd2931a37fa in Monitor::wait(bool, long, bool)
      (this=0x7fd28c011700, no_safepoint_check=no_safepoint_check@entry=true,
      timeout=timeout@entry=0,
      as_suspend_equivalent=as_suspend_equivalent@entry=false)
      at
      /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/runtime/mutex.cpp:1111
      #5 0x00007fd292db1fe2 in CMRootRegions::wait_until_scan_finished()
      (this=0x7fd28c0826b8) at
      /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/gc_implementation/g1/concurrentMark.cpp:512
      #6 0x00007fd292ea59a4 in
      G1CollectedHeap::do_collection_pause_at_safepoint(double)
      (this=this@entry=0x7fd28c033c40,
      target_pause_time_ms=200) at
      /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/gc_implementation/g1/g1CollectedHeap.cpp:4095
      #7 0x00007fd29339f002 in VM_G1IncCollectionPause::doit()
      (this=0x7fcfc4e0c430) at
      /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/gc_implementation/g1/vm_operations_g1.cpp:148
      #8 0x00007fd29339dfd7 in VM_Operation::evaluate()
      (this=this@entry=0x7fcfc4e0c430)
      at
      /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/runtime/vm_operations.cpp:62
      #9 0x00007fd29339b6d7 in VMThread::evaluate_operation(VM_Operation*)
      (op=0x7fcfc4e0c430, this=<optimized out>) at
      /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/runtime/vmThread.cpp:377
      #10 0x00007fd29339cb5f in VMThread::loop() (this=this@entry=0x7fd28c497800)
      at
      /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/runtime/vmThread.cpp:502
      #11 0x00007fd29339d051 in VMThread::run() (this=0x7fd28c497800) at
      /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/runtime/vmThread.cpp:276
      #12 0x00007fd2931e1222 in java_start(Thread*) (thread=0x7fd28c497800) at
      /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/os/linux/vm/os_linux.cpp:790

      (reported by K. Mok, http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-September/034145.html)

            tschatzl Thomas Schatzl
            tschatzl Thomas Schatzl
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

              Created:
              Updated:
              Resolved: