Developers reported that their application sometimes experienced extremely long young GC evacuation pauses after running for about 5 to 30 minutes, starting with JDK 21.0.4 (last year). When a long pause occurred, it appeared to be associated with a spike in GCU usage. In some extreme cases, the evacuation pauses exceeded 90s.
The extremely long young G1 evacuation pauses appeared to be associated with specific usages of the application, which made them difficult to reproduce. I investigated the issue with 'gc+phases=debug' using some test runs that reproduced long evacuation pauses (at a lower magnitude, <1s). According to the GC logs, the long pauses were due to long Code Root Scan operations. That connected the issue to JDK-8315503, which was backported to JDK 21.0.4. I was not able to reproduce the extremely long young G1 evacuation pauses with the CCStress.java from JDK-8315503. I also experimented with increasing the number of classes generated by CCStress.java, without being able to reproduce the issue.
JDK-8315503 switched to using ConcurrentHashTable to store code roots. The initial table size was 2^2 (set with 'Log2DefaultNumBuckets = 2'), which was small. I added a command-line option that let the developers run with a larger initial hash table for the code roots. The idea was to avoid the operations for growing the table and copying the table entries. With the initial table size set to 2^15, the developers reported that the extremely long young G1 evacuation pauses no longer occurred in their non-testing runs.
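To illustrate why a larger initial size avoids growth work, here is a minimal toy model. It is not HotSpot's ConcurrentHashTable: the doubling policy, the load factor, and the class and method names are all assumptions made for illustration. It only counts how many times a table that starts at 2^2 versus 2^15 buckets would double, and how many entries would be redistributed, while a hypothetical 10000 code roots are inserted.

```java
public class TableGrowthSketch {
    // Toy model (assumption): the table doubles whenever the entry count
    // exceeds buckets * loadFactor, and every existing entry is redistributed
    // on each doubling. Returns {doublings, totalEntriesMoved}.
    static long[] simulate(int log2Initial, int entries, double loadFactor) {
        long buckets = 1L << log2Initial;
        long doublings = 0;
        long moved = 0;
        for (int n = 1; n <= entries; n++) {
            if (n > buckets * loadFactor) {
                moved += n - 1;   // entries already in the table get moved
                buckets <<= 1;    // double the bucket count
                doublings++;
            }
        }
        return new long[] {doublings, moved};
    }

    public static void main(String[] args) {
        int entries = 10_000; // hypothetical number of code roots in one region
        for (int log2 : new int[] {2, 15}) {
            long[] r = simulate(log2, entries, 1.0);
            System.out.println("initial 2^" + log2 + ": doublings=" + r[0]
                    + ", entries moved=" + r[1]);
        }
    }
}
```

Under these assumptions the 2^2 table doubles 12 times and moves 16380 entries in total, while the 2^15 table never grows. The model says nothing about the actual cost of each grow operation inside a GC pause.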
Using a very large code root hash table increased memory usage, since there is one code root table per region. The memory overhead became problematic when a very large Java heap was used.
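As a rough back-of-the-envelope sketch of that overhead, the following estimates the memory taken by the bucket arrays alone. The assumptions are not measurements: each bucket is taken to cost 8 bytes (one pointer on a 64-bit JVM), every region is assumed to allocate its table eagerly, the region size is assumed to be 32 MiB, and the heap sizes are hypothetical. The class and method names are made up for this example.

```java
public class CodeRootTableOverhead {
    // Estimated bytes used by the (empty) bucket arrays across all regions.
    // Assumes one table per region, allocated up front.
    static long bucketArrayBytes(long heapBytes, long regionBytes,
                                 int log2Buckets, int bytesPerBucket) {
        long regions = heapBytes / regionBytes;
        return regions * (1L << log2Buckets) * bytesPerBucket;
    }

    public static void main(String[] args) {
        final long MiB = 1L << 20;
        final long GiB = 1L << 30;
        long regionBytes = 32 * MiB;  // assumed region size
        int bytesPerBucket = 8;       // assumed: one pointer per bucket

        for (long heapGiB : new long[] {32, 256, 1024}) {
            long small = bucketArrayBytes(heapGiB * GiB, regionBytes, 2, bytesPerBucket);
            long large = bucketArrayBytes(heapGiB * GiB, regionBytes, 15, bytesPerBucket);
            System.out.println(heapGiB + " GiB heap: 2^2 buckets -> " + small / 1024
                    + " KiB, 2^15 buckets -> " + large / MiB + " MiB");
        }
    }
}
```

With these assumed numbers, a 1024 GiB heap has 32768 regions, so 2^15 buckets per region comes to 8 GiB of bucket arrays, compared with 1 MiB for 2^2 buckets.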
Reporting the issue for more thoughts. We were not able to construct a specific test case to demonstrate the issue (sorry about that part).
- caused by JDK-8315503 G1: Code root scan causes long GC pauses due to imbalanced iteration (Resolved)