- Bug
- Resolution: Fixed
- P3
- 9
- b01
We regularly see OutOfMemoryErrors with G1 in our stress tests. Running the same tests with the same heap size under ParallelGC or CMS does not show the problem.
The stress tests are based on real world application code with a lot of threads.
Scenario:
The application has a lot of threads that spend time in critical native sections.
1. An evacuation failure happens during a GC.
2. After clean-up work, the safepoint is left.
3. Another thread can't allocate and triggers a new incremental GC.
4. A thread that can't allocate after an incremental GC triggers a full GC. However, the full GC doesn't start because another thread has started an incremental GC, the GC locker is active, or the GCLocker-initiated GC has not yet been performed.
If an incremental GC doesn't succeed due to the GC locker, and this happens more often than GCLockerRetryAllocationCount (=2), an OOME is thrown.
Without critical native code, we would keep trying to trigger a full GC until we succeed. In that case there is only a performance issue, not an OOME.
The reason is that only G1 splits the "upgrade" of a young GC to a full GC into multiple VM operations. Between those operations, the GC locker state can change and prevent the full GC.
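The retry limit described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the HotSpot implementation: the class name, the `allocate` method, and the idea of scripting each GC attempt as "blocked by the GC locker" or "not blocked" are hypothetical, and the model assumes that any GC that actually runs frees enough memory.

```java
// Toy model of the allocation retry policy described in this report.
// NOT HotSpot code: all names here are hypothetical and for illustration only.
class GcLockerRetryModel {
    // Mirrors the default quoted in the report (GCLockerRetryAllocationCount = 2).
    static final int GC_LOCKER_RETRY_ALLOCATION_COUNT = 2;

    /**
     * Replays a scripted sequence of GC attempts for one allocating thread.
     *
     * @param gcBlockedByLocker one entry per attempt; true means the GC could
     *                          not run because the GC locker was active
     * @return true if the allocation eventually succeeds, false if the model
     *         gives up (which corresponds to throwing an OOME)
     */
    static boolean allocate(boolean[] gcBlockedByLocker) {
        int blockedCount = 0;
        for (boolean blocked : gcBlockedByLocker) {
            if (!blocked) {
                // Assumption: a GC that actually runs frees enough memory.
                return true;
            }
            blockedCount++;
            if (blockedCount > GC_LOCKER_RETRY_ALLOCATION_COUNT) {
                // Blocked more often than the retry limit: give up.
                return false;
            }
        }
        throw new IllegalArgumentException("script ended before an outcome was reached");
    }

    public static void main(String[] args) {
        // Blocked twice, then a GC runs: still within the retry limit.
        System.out.println(allocate(new boolean[] {true, true, false}));       // true
        // Blocked three times: the limit is exceeded before any GC runs.
        System.out.println(allocate(new boolean[] {true, true, true, false})); // false
    }
}
```

In this model the second call fails even though a GC that would have freed memory was available on the next attempt, which is the behaviour we observe with G1 when the GC locker state changes between the separate VM operations.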
The problem can be reproduced with the attached program.
The parameters might vary depending on the system.
java -Xmx64m -XX:+UseG1GC -XX:+PrintGC -XX:MaxGCPauseMillis=10 -XX:+UnlockExperimentalVMOptions -XX:-G1ForceFullGCAfterEvacuationFailure -XX:-PrintAdaptiveSizePolicy TestEvacFailureThreaded 10 10000000 10000 10000 10000 10 0.7
A snippet of the output:
#2539: [GC pause (G1 Evacuation Pause) (young) 62M->62M(64M), 0.0062519 secs]
#2540: [GC pause (G1 Evacuation Pause) (young) 62M->62M(64M), 0.0050967 secs]
#2538: [GC concurrent-mark-end, 0.0193436 secs]
#2538: [GC remark, 0.0048717 secs]
#2538: [GC cleanup 62M->62M(64M), 0.0016663 secs]
#2541: [GC pause (GCLocker Initiated GC) (young) 62M->62M(64M), 0.0061165 secs]
#2542: [GC pause (G1 Evacuation Pause) (mixed)-- 62M->62M(64M), 0.0063998 secs]
#2543: [GC pause (G1 Evacuation Pause) (mixed)-- 62M->62M(64M), 0.0066795 secs]
#2544: [GC pause (GCLocker Initiated GC) (mixed)-- 62M->62M(64M), 0.0082145 secs]
#2545: [GC pause (G1 Evacuation Pause) (mixed)-- 62M->62M(64M), 0.0102476 secs]
#2546: [GC pause (GCLocker Initiated GC) (mixed)-- 62M->62M(64M), 0.0142916 secs]
#2547: [GC pause (G1 Evacuation Pause) (mixed)-- 62M->62M(64M), 0.0108066 secs]
#2548: [GC pause (G1 Evacuation Pause) (young) 62M->62M(64M), 0.0065968 secs]
#2549: [Full GC (Allocation Failure) 62M->23M(64M), 0.0483837 secs]
java.lang.OutOfMemoryError: Java heap space
at TestEvacFailureThreaded.runTest(TestEvacFailureThreaded.java:75)
at TestEvacFailureThreaded$2.run(TestEvacFailureThreaded.java:138)
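To make a long log like this easier to read, the pauses can be summarized with a small helper. The following is a minimal sketch: the class `G1LogSummary` and its method are hypothetical, the regular expression only targets the `-XX:+PrintGC` line format shown in the excerpt above, and it assumes that the `--` suffix after `(young)` or `(mixed)` marks a pause with an evacuation failure.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for summarizing G1 pause lines in the format shown above.
class G1LogSummary {
    // Matches e.g. "[GC pause (G1 Evacuation Pause) (mixed)-- 62M->62M(64M)".
    private static final Pattern PAUSE = Pattern.compile(
            "\\[GC pause \\((.+?)\\) \\((young|mixed)\\)(--)? (\\d+)M->(\\d+)M\\((\\d+)M\\)");

    /**
     * @return {number of pauses,
     *          pauses marked with "--" (assumed: evacuation failure),
     *          pauses where heap usage before and after is identical}
     */
    static int[] summarize(List<String> lines) {
        int pauses = 0;
        int markedFailed = 0;
        int nothingReclaimed = 0;
        for (String line : lines) {
            Matcher m = PAUSE.matcher(line);
            if (!m.find()) {
                continue; // concurrent phases, full GCs, stack traces, ...
            }
            pauses++;
            if (m.group(3) != null) {
                markedFailed++;
            }
            if (m.group(4).equals(m.group(5))) {
                nothingReclaimed++;
            }
        }
        return new int[] {pauses, markedFailed, nothingReclaimed};
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "#2541: [GC pause (GCLocker Initiated GC) (young) 62M->62M(64M), 0.0061165 secs]",
                "#2542: [GC pause (G1 Evacuation Pause) (mixed)-- 62M->62M(64M), 0.0063998 secs]",
                "#2549: [Full GC (Allocation Failure) 62M->23M(64M), 0.0483837 secs]");
        // Two pause lines match, one carries "--", and neither reclaimed memory.
        System.out.println(Arrays.toString(summarize(sample))); // [2, 1, 2]
    }
}
```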
blocks:
- JDK-8194877 Clean up code in G1CollectedHeap::attempt_allocation_slow (Closed)
duplicates:
- JDK-8165150 G1 sometimes performs one or more young gcs with zero sized eden after evacuation failure before issuing a full gc (Closed)
relates to:
- JDK-8192647 GClocker induced GCs can starve threads requiring memory leading to OOME (Open)
- JDK-8179226 gc/stress/gclocker/TestGCLockerWithG1.java: fails with OOME Java heap space (Closed)