JDK-8351464

Shenandoah: Hang in ShenandoahController::handle_alloc_failure when running test TestAllocHumongousFragment#generational


    • gc
    • Cause Known

      TestAllocHumongousFragment#generational may hang in ShenandoahController::handle_alloc_failure. It does not happen on every run, but with JTREG_REPEAT_COUNT=1000 the hang reproduces reliably in my testing:

      ```
      CONF=linux-aarch64-server-fastdebug JTREG_REPEAT_COUNT=1000 make images test TEST=gc/TestAllocHumongousFragment.java#generational
      ```

      While the test was hanging, I took a thread dump with `pstack`. The Java thread appears to be stuck in ShenandoahController::handle_alloc_failure:

      ```
      Thread 32 (Thread 0xffff2520d1f0 (LWP 14042)):
      #0 0x0000ffff8de77e38 in pthread_cond_wait@@GLIBC_2.17 () from /lib64/libpthread.so.0
      #1 0x0000ffff8cffc248 in PlatformMonitor::wait (this=this@entry=0xffff841632c8, millis=millis@entry=0) at /home/xlpeng/repos/jdk-xlpeng/src/hotspot/os/posix/mutex_posix.hpp:124
      #2 0x0000ffff8cf46988 in Monitor::wait (this=this@entry=0xffff841632c0, timeout=timeout@entry=0) at /home/xlpeng/repos/jdk-xlpeng/src/hotspot/share/runtime/mutex.cpp:254
      #3 0x0000ffff8d1d9c44 in MonitorLocker::wait (this=<synthetic pointer>, this=<synthetic pointer>, timeout=0) at /home/xlpeng/repos/jdk-xlpeng/src/hotspot/share/runtime/mutexLocker.hpp:279
      #4 ShenandoahController::handle_alloc_failure (this=0xffff84162e20, req=..., block=block@entry=true) at /home/xlpeng/repos/jdk-xlpeng/src/hotspot/share/gc/shenandoah/shenandoahController.cpp:65
      #5 0x0000ffff8d24c160 in ShenandoahHeap::allocate_memory (this=0xffff840d68b0, req=...) at /home/xlpeng/repos/jdk-xlpeng/src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp:532
      #6 0x0000ffff8d24c434 in ShenandoahHeap::mem_allocate (this=<optimized out>, size=<optimized out>, gc_overhead_limit_was_exceeded=<optimized out>) at /home/xlpeng/repos/jdk-xlpeng/src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp:1107
      #7 0x0000ffff8ce9149c in MemAllocator::mem_allocate_outside_tlab (allocation=..., this=0xffff2520be38) at /home/xlpeng/repos/jdk-xlpeng/src/hotspot/share/memory/universe.hpp:297
      #8 MemAllocator::mem_allocate (allocation=..., this=0xffff2520be38) at /home/xlpeng/repos/jdk-xlpeng/src/hotspot/share/gc/shared/memAllocator.cpp:340
      #9 MemAllocator::mem_allocate (this=0xffff2520be38, allocation=...) at /home/xlpeng/repos/jdk-xlpeng/src/hotspot/share/gc/shared/memAllocator.cpp:320
      #10 0x0000ffff8ce915a0 in MemAllocator::allocate (this=this@entry=0xffff2520be38) at /home/xlpeng/repos/jdk-xlpeng/src/hotspot/share/gc/shared/memAllocator.cpp:347
      #11 0x0000ffff8d4cb428 in CollectedHeap::array_allocate (__the_thread__=0xffff843f3060, do_zero=true, length=16732524, size=<optimized out>, klass=0x7001aafb0, this=<optimized out>) at /home/xlpeng/repos/jdk-xlpeng/src/hotspot/share/gc/shared/collectedHeap.inline.hpp:41
      #12 TypeArrayKlass::allocate_common (this=0x7001aafb0, length=length@entry=16732524, do_zero=do_zero@entry=true, __the_thread__=0xffff843f3060, __the_thread__@entry=0xffff2520bee8) at /home/xlpeng/repos/jdk-xlpeng/src/hotspot/share/oops/typeArrayKlass.cpp:96
      #13 0x0000ffff8cfc3b0c in TypeArrayKlass::allocate (__the_thread__=0xffff2520bee8, length=16732524, this=<optimized out>) at /home/xlpeng/repos/jdk-xlpeng/src/hotspot/share/oops/typeArrayKlass.hpp:70
      #14 oopFactory::new_typeArray (type=type@entry=T_INT, length=length@entry=16732524, __the_thread__=__the_thread__@entry=0xffff843f3060) at /home/xlpeng/repos/jdk-xlpeng/src/hotspot/share/memory/oopFactory.cpp:91
      #15 0x0000ffff8c985f64 in InterpreterRuntime::newarray (current=0xffff843f3060, type=T_INT, size=16732524) at /home/xlpeng/repos/jdk-xlpeng/src/hotspot/share/interpreter/interpreterRuntime.cpp:228
      #16 0x0000ffff73a2ba6c in ?? ()
      #17 0x00000000ed200000 in ?? ()
      ```
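
      For context on the blocked frames above: the mutator that fails the humongous allocation reports the failure to the GC control thread and then blocks on a monitor until a GC cycle has run. The sketch below models that wait pattern with standard C++ primitives; it is a simplified illustration only, and the names (ControllerModel, alloc_failure_pending, gc_cycle_complete) are hypothetical, not the actual HotSpot/ShenandoahController code.

      ```
      // Simplified model of the wait seen in frames #0-#4, using std::mutex /
      // std::condition_variable in place of HotSpot's Monitor/MonitorLocker.
      #include <chrono>
      #include <condition_variable>
      #include <iostream>
      #include <mutex>
      #include <thread>

      struct ControllerModel {
          std::mutex lock;                   // stands in for the waiters' Monitor
          std::condition_variable cv;
          bool alloc_failure_pending = false;
          bool gc_cycle_complete = false;

          // Mutator side: report the failed allocation and block until the
          // control thread has completed a GC cycle (frame #4 above).
          void handle_alloc_failure() {
              std::unique_lock<std::mutex> ml(lock);
              alloc_failure_pending = true;  // ask the control thread for a GC
              cv.notify_all();
              // If the control thread never acts on the request, this wait
              // blocks forever -- the hang reported in this issue.
              cv.wait(ml, [&] { return gc_cycle_complete; });
          }

          // Control-thread side: run a GC when a failure is pending, then
          // wake any blocked mutators.
          void control_loop_iteration() {
              std::unique_lock<std::mutex> ml(lock);
              if (alloc_failure_pending) {
                  // ... run a GC cycle here ...
                  gc_cycle_complete = true;
                  cv.notify_all();
              }
          }
      };

      int main() {
          ControllerModel c;
          std::thread mutator([&] { c.handle_alloc_failure(); });
          // Give the mutator time to raise the request (ordering is informal,
          // good enough for a sketch).
          std::this_thread::sleep_for(std::chrono::milliseconds(10));
          c.control_loop_iteration();        // without this, the mutator hangs
          mutator.join();
          std::cout << "mutator resumed after GC\n";
      }
      ```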

      Before the mutator thread hung, there was one old GC cycle:
      ```
      [0.736s][info][gc ] Trigger (Old): Old has overgrown, live at end of previous OLD marking: 1253K, current usage: 710M, percent growth: 57932.4%
      [0.736s][info][gc,free ] Free: 297M, Max: 512K regular, 100M humongous, Frag: 67% external, 0% internal; Used: 0B, Mutator Free: 594 Collector Reserve: 16384K, Max: 512K; Used: 0B Old Collector Reserve: 787K, Max: 511K; Used: 1260K
      [0.736s][info][gc,start ] GC(9) Concurrent reset (Young)
      ...
      [0.740s][info][gc,start ] GC(9) Concurrent marking (Old)
      [0.740s][info][gc,task ] GC(9) Using 4 of 8 workers for concurrent marking
      (Old was successful)
      ...
      [300.740s][info][gc ] Trigger (Young): Time since last GC (300000 ms) is larger than guaranteed interval (300000 ms)
      [300.740s][info][gc,free ] Free: 335M, Max: 512K regular, 100M humongous, Frag: 71% external, 0% internal; Used: 0B, Mutator Free: 671 Collector Reserve: 16384K, Max: 512K; Used: 0B Old Collector Reserve: 1299K, Max: 512K; Used: 1260K
      [300.740s][info][gc,ergo ] GC(10) Start GC cycle (Young)
      ...
      ```
       
      This looks like a bug in ShenandoahGenerationalControlThread/ShenandoahRegulatorThread: in some particular case it fails to trigger a GC cycle to satisfy the allocation failure, leaving the mutator blocked until an unrelated trigger fires.
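
      To make the conjecture concrete, the sketch below shows one generic shape such a bug can take: a control loop that goes to sleep past an already-raised allocation-failure request (a classic lost wakeup), so no GC starts until some later trigger. This is only an illustration of the suspected failure mode, not the actual ShenandoahGenerationalControlThread/ShenandoahRegulatorThread logic, and all names are hypothetical.

      ```
      // Generic lost-wakeup illustration, not the HotSpot code.
      #include <condition_variable>
      #include <iostream>
      #include <mutex>

      struct ControlThreadModel {
          std::mutex lock;
          std::condition_variable cv;
          bool gc_requested = false;

          // Buggy shape: waits without re-checking the request flag, so a
          // notification delivered before the wait starts is lost and no GC
          // is triggered until something else wakes the thread.
          void run_service_buggy() {
              std::unique_lock<std::mutex> ml(lock);
              cv.wait(ml);                   // may sleep past a pending request
              if (gc_requested) { std::cout << "GC cycle triggered\n"; }
          }

          // Correct shape: the predicate makes the wait immune to the race.
          void run_service_fixed() {
              std::unique_lock<std::mutex> ml(lock);
              cv.wait(ml, [&] { return gc_requested; });
              std::cout << "GC cycle triggered\n";
          }
      };

      int main() {
          ControlThreadModel m;
          {
              std::lock_guard<std::mutex> g(m.lock);
              m.gc_requested = true;         // request raised before the service waits
              m.cv.notify_all();             // notification reaches no one
          }
          // m.run_service_buggy() would now block indefinitely: the notify is
          // gone and nothing re-checks gc_requested.
          m.run_service_fixed();             // returns immediately; predicate is true
      }
      ```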


            Assignee: William Kemper (wkemper)
            Reporter: Xiaolong Peng (xpeng)