-
Bug
-
Resolution: Unresolved
-
P3
-
None
-
25
-
Cause Known
TestAllocHumongousFragment#generational may hang on ShenandoahController::handle_alloc_failure, it doesn't always happen, but when I set JTREG_REPEAT_COUNT=1000, the issue can be always re-produced in my test.
```
CONF=linux-aarch64-server-fastdebug JTREG_REPEAT_COUNT=1000 make images test TEST=gc/TestAllocHumongousFragment.java#generational
```
When the test was hanging, I took the thread dump with `pstack`, looks like the Java thread was stuck at method ShenandoahController::handle_alloc_failure:
```
Thread 32 (Thread 0xffff2520d1f0 (LWP 14042)):
#0 0x0000ffff8de77e38 in pthread_cond_wait@@GLIBC_2.17 () from /lib64/libpthread.so.0
#1 0x0000ffff8cffc248 in PlatformMonitor::wait (this=this@entry=0xffff841632c8, millis=millis@entry=0) at /home/xlpeng/repos/jdk-xlpeng/src/hotspot/os/posix/mutex_posix.hpp:124
#2 0x0000ffff8cf46988 in Monitor::wait (this=this@entry=0xffff841632c0, timeout=timeout@entry=0) at /home/xlpeng/repos/jdk-xlpeng/src/hotspot/share/runtime/mutex.cpp:254
#3 0x0000ffff8d1d9c44 in MonitorLocker::wait (this=<synthetic pointer>, this=<synthetic pointer>, timeout=0) at /home/xlpeng/repos/jdk-xlpeng/src/hotspot/share/runtime/mutexLocker.hpp:279
#4 ShenandoahController::handle_alloc_failure (this=0xffff84162e20, req=..., block=block@entry=true) at /home/xlpeng/repos/jdk-xlpeng/src/hotspot/share/gc/shenandoah/shenandoahController.cpp:65
#5 0x0000ffff8d24c160 in ShenandoahHeap::allocate_memory (this=0xffff840d68b0, req=...) at /home/xlpeng/repos/jdk-xlpeng/src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp:532
#6 0x0000ffff8d24c434 in ShenandoahHeap::mem_allocate (this=<optimized out>, size=<optimized out>, gc_overhead_limit_was_exceeded=<optimized out>) at /home/xlpeng/repos/jdk-xlpeng/src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp:1107
#7 0x0000ffff8ce9149c in MemAllocator::mem_allocate_outside_tlab (allocation=..., this=0xffff2520be38) at /home/xlpeng/repos/jdk-xlpeng/src/hotspot/share/memory/universe.hpp:297
#8 MemAllocator::mem_allocate (allocation=..., this=0xffff2520be38) at /home/xlpeng/repos/jdk-xlpeng/src/hotspot/share/gc/shared/memAllocator.cpp:340
#9 MemAllocator::mem_allocate (this=0xffff2520be38, allocation=...) at /home/xlpeng/repos/jdk-xlpeng/src/hotspot/share/gc/shared/memAllocator.cpp:320
#10 0x0000ffff8ce915a0 in MemAllocator::allocate (this=this@entry=0xffff2520be38) at /home/xlpeng/repos/jdk-xlpeng/src/hotspot/share/gc/shared/memAllocator.cpp:347
#11 0x0000ffff8d4cb428 in CollectedHeap::array_allocate (__the_thread__=0xffff843f3060, do_zero=true, length=16732524, size=<optimized out>, klass=0x7001aafb0, this=<optimized out>) at /home/xlpeng/repos/jdk-xlpeng/src/hotspot/share/gc/shared/collectedHeap.inline.hpp:41
#12 TypeArrayKlass::allocate_common (this=0x7001aafb0, length=length@entry=16732524, do_zero=do_zero@entry=true, __the_thread__=0xffff843f3060, __the_thread__@entry=0xffff2520bee8) at /home/xlpeng/repos/jdk-xlpeng/src/hotspot/share/oops/typeArrayKlass.cpp:96
#13 0x0000ffff8cfc3b0c in TypeArrayKlass::allocate (__the_thread__=0xffff2520bee8, length=16732524, this=<optimized out>) at /home/xlpeng/repos/jdk-xlpeng/src/hotspot/share/oops/typeArrayKlass.hpp:70
#14 oopFactory::new_typeArray (type=type@entry=T_INT, length=length@entry=16732524, __the_thread__=__the_thread__@entry=0xffff843f3060) at /home/xlpeng/repos/jdk-xlpeng/src/hotspot/share/memory/oopFactory.cpp:91
#15 0x0000ffff8c985f64 in InterpreterRuntime::newarray (current=0xffff843f3060, type=T_INT, size=16732524) at /home/xlpeng/repos/jdk-xlpeng/src/hotspot/share/interpreter/interpreterRuntime.cpp:228
#16 0x0000ffff73a2ba6c in ?? ()
#17 0x00000000ed200000 in ?? ()
```
Before mutator thread was hung up, there was one old GC:
```
[0.736s][info][gc ] Trigger (Old): Old has overgrown, live at end of previous OLD marking: 1253K, current usage: 710M, percent growth: 57932.4%
[0.736s][info][gc,free ] Free: 297M, Max: 512K regular, 100M humongous, Frag: 67% external, 0% internal; Used: 0B, Mutator Free: 594 Collector Reserve: 16384K, Max: 512K; Used: 0B Old Collector Reserve: 787K, Max: 511K; Used: 1260K
[0.736s][info][gc,start ] GC(9) Concurrent reset (Young)
...
[0.740s][info][gc,start ] GC(9) Concurrent marking (Old)
[0.740s][info][gc,task ] GC(9) Using 4 of 8 workers for concurrent marking
(Old was successful)
...
[300.740s][info][gc ] Trigger (Young): Time since last GC (300000 ms) is larger than guaranteed interval (300000 ms)
[300.740s][info][gc,free ] Free: 335M, Max: 512K regular, 100M humongous, Frag: 71% external, 0% internal; Used: 0B, Mutator Free: 671 Collector Reserve: 16384K, Max: 512K; Used: 0B Old Collector Reserve: 1299K, Max: 512K; Used: 1260K
[300.740s][info][gc,ergo ] GC(10) Start GC cycle (Young)
...
```
It seems a bug in ShenandoahGenerationalControlThread/ShenandoahRegulatorThread, which was not able to trigger GC to satisfy allocation failure in some particular case.
```
CONF=linux-aarch64-server-fastdebug JTREG_REPEAT_COUNT=1000 make images test TEST=gc/TestAllocHumongousFragment.java#generational
```
When the test was hanging, I took the thread dump with `pstack`, looks like the Java thread was stuck at method ShenandoahController::handle_alloc_failure:
```
Thread 32 (Thread 0xffff2520d1f0 (LWP 14042)):
#0 0x0000ffff8de77e38 in pthread_cond_wait@@GLIBC_2.17 () from /lib64/libpthread.so.0
#1 0x0000ffff8cffc248 in PlatformMonitor::wait (this=this@entry=0xffff841632c8, millis=millis@entry=0) at /home/xlpeng/repos/jdk-xlpeng/src/hotspot/os/posix/mutex_posix.hpp:124
#2 0x0000ffff8cf46988 in Monitor::wait (this=this@entry=0xffff841632c0, timeout=timeout@entry=0) at /home/xlpeng/repos/jdk-xlpeng/src/hotspot/share/runtime/mutex.cpp:254
#3 0x0000ffff8d1d9c44 in MonitorLocker::wait (this=<synthetic pointer>, this=<synthetic pointer>, timeout=0) at /home/xlpeng/repos/jdk-xlpeng/src/hotspot/share/runtime/mutexLocker.hpp:279
#4 ShenandoahController::handle_alloc_failure (this=0xffff84162e20, req=..., block=block@entry=true) at /home/xlpeng/repos/jdk-xlpeng/src/hotspot/share/gc/shenandoah/shenandoahController.cpp:65
#5 0x0000ffff8d24c160 in ShenandoahHeap::allocate_memory (this=0xffff840d68b0, req=...) at /home/xlpeng/repos/jdk-xlpeng/src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp:532
#6 0x0000ffff8d24c434 in ShenandoahHeap::mem_allocate (this=<optimized out>, size=<optimized out>, gc_overhead_limit_was_exceeded=<optimized out>) at /home/xlpeng/repos/jdk-xlpeng/src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp:1107
#7 0x0000ffff8ce9149c in MemAllocator::mem_allocate_outside_tlab (allocation=..., this=0xffff2520be38) at /home/xlpeng/repos/jdk-xlpeng/src/hotspot/share/memory/universe.hpp:297
#8 MemAllocator::mem_allocate (allocation=..., this=0xffff2520be38) at /home/xlpeng/repos/jdk-xlpeng/src/hotspot/share/gc/shared/memAllocator.cpp:340
#9 MemAllocator::mem_allocate (this=0xffff2520be38, allocation=...) at /home/xlpeng/repos/jdk-xlpeng/src/hotspot/share/gc/shared/memAllocator.cpp:320
#10 0x0000ffff8ce915a0 in MemAllocator::allocate (this=this@entry=0xffff2520be38) at /home/xlpeng/repos/jdk-xlpeng/src/hotspot/share/gc/shared/memAllocator.cpp:347
#11 0x0000ffff8d4cb428 in CollectedHeap::array_allocate (__the_thread__=0xffff843f3060, do_zero=true, length=16732524, size=<optimized out>, klass=0x7001aafb0, this=<optimized out>) at /home/xlpeng/repos/jdk-xlpeng/src/hotspot/share/gc/shared/collectedHeap.inline.hpp:41
#12 TypeArrayKlass::allocate_common (this=0x7001aafb0, length=length@entry=16732524, do_zero=do_zero@entry=true, __the_thread__=0xffff843f3060, __the_thread__@entry=0xffff2520bee8) at /home/xlpeng/repos/jdk-xlpeng/src/hotspot/share/oops/typeArrayKlass.cpp:96
#13 0x0000ffff8cfc3b0c in TypeArrayKlass::allocate (__the_thread__=0xffff2520bee8, length=16732524, this=<optimized out>) at /home/xlpeng/repos/jdk-xlpeng/src/hotspot/share/oops/typeArrayKlass.hpp:70
#14 oopFactory::new_typeArray (type=type@entry=T_INT, length=length@entry=16732524, __the_thread__=__the_thread__@entry=0xffff843f3060) at /home/xlpeng/repos/jdk-xlpeng/src/hotspot/share/memory/oopFactory.cpp:91
#15 0x0000ffff8c985f64 in InterpreterRuntime::newarray (current=0xffff843f3060, type=T_INT, size=16732524) at /home/xlpeng/repos/jdk-xlpeng/src/hotspot/share/interpreter/interpreterRuntime.cpp:228
#16 0x0000ffff73a2ba6c in ?? ()
#17 0x00000000ed200000 in ?? ()
```
Before mutator thread was hung up, there was one old GC:
```
[0.736s][info][gc ] Trigger (Old): Old has overgrown, live at end of previous OLD marking: 1253K, current usage: 710M, percent growth: 57932.4%
[0.736s][info][gc,free ] Free: 297M, Max: 512K regular, 100M humongous, Frag: 67% external, 0% internal; Used: 0B, Mutator Free: 594 Collector Reserve: 16384K, Max: 512K; Used: 0B Old Collector Reserve: 787K, Max: 511K; Used: 1260K
[0.736s][info][gc,start ] GC(9) Concurrent reset (Young)
...
[0.740s][info][gc,start ] GC(9) Concurrent marking (Old)
[0.740s][info][gc,task ] GC(9) Using 4 of 8 workers for concurrent marking
(Old was successful)
...
[300.740s][info][gc ] Trigger (Young): Time since last GC (300000 ms) is larger than guaranteed interval (300000 ms)
[300.740s][info][gc,free ] Free: 335M, Max: 512K regular, 100M humongous, Frag: 71% external, 0% internal; Used: 0B, Mutator Free: 671 Collector Reserve: 16384K, Max: 512K; Used: 0B Old Collector Reserve: 1299K, Max: 512K; Used: 1260K
[300.740s][info][gc,ergo ] GC(10) Start GC cycle (Young)
...
```
It seems a bug in ShenandoahGenerationalControlThread/ShenandoahRegulatorThread, which was not able to trigger GC to satisfy allocation failure in some particular case.
- relates to
-
JDK-8349094 GenShen: Race between control and regulator threads may violate assertions
-
- Resolved
-
- links to
-
Review(master) openjdk/jdk/23997
There are no Sub-Tasks for this issue.