JDK-8137099

G1 needs to "upgrade" GC within the safepoint if it can't allocate during that safepoint to avoid OoME


    • gc
    • b01

      We regularly see OutOfMemoryErrors with G1 in our stress tests. The same tests, run with the same heap size under ParallelGC and CMS, do not show the problem.

      The stress tests are based on real-world application code with a lot of threads.

      Scenario:
      We have an application with a lot of threads that spend time in critical native sections.

      1. An evacuation failure happens during a GC.
      2. After clean-up work, the safepoint is left.
      3. Another thread can't allocate and triggers a new incremental GC.
      4. A thread that can't allocate after an incremental GC triggers a full GC. However, the full GC doesn't start because another thread
          has started an incremental GC, the GC locker is active, or the GCLocker-initiated GC has not yet been performed.
          If an incremental GC doesn't succeed because of the GC locker, and this happens more often than GCLockerRetryAllocationCount (=2), an OOME is thrown.
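
The retry limit in step 4 can be illustrated with a small model. This is only a sketch of the behaviour described above, not HotSpot source: the class `GcLockerRetryModel`, the `Outcome` and `Result` enums, and the `allocate` method are hypothetical names, and the only value taken from the report is the limit of 2.

```java
import java.util.List;

/** Simplified, hypothetical model of the retry logic in step 4; not HotSpot source. */
final class GcLockerRetryModel {
    // Value quoted in the report for GCLockerRetryAllocationCount.
    static final int GC_LOCKER_RETRY_ALLOCATION_COUNT = 2;

    /** What happened to one allocation attempt (illustrative names). */
    enum Outcome { GC_BLOCKED_BY_LOCKER, INCREMENTAL_GC_NO_SPACE, FULL_GC_FREED_SPACE }

    enum Result { ALLOCATED, OUT_OF_MEMORY, STILL_RETRYING }

    /** Replays a sequence of attempt outcomes for a single allocating thread. */
    static Result allocate(List<Outcome> attempts) {
        int lockerStalls = 0;
        for (Outcome outcome : attempts) {
            switch (outcome) {
                case GC_BLOCKED_BY_LOCKER:
                    // The GC could not run because of the GC locker.
                    lockerStalls++;
                    if (lockerStalls > GC_LOCKER_RETRY_ALLOCATION_COUNT) {
                        return Result.OUT_OF_MEMORY; // modelled OOME
                    }
                    break;
                case INCREMENTAL_GC_NO_SPACE:
                    // The GC ran but freed nothing; the thread simply tries again.
                    break;
                case FULL_GC_FREED_SPACE:
                    return Result.ALLOCATED;
            }
        }
        return Result.STILL_RETRYING;
    }
}
```

In this model, a sequence of fruitless incremental GCs followed by a full GC ends in `ALLOCATED`, while three locker stalls before the full GC end in `OUT_OF_MEMORY` even though the full GC would have freed enough space.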

      Without critical native code, we would keep trying to trigger a full GC until we succeed. In that case there is only a performance issue, not an OOME.

      The reason is that only G1 splits the "upgrade" of a young GC to a full GC into multiple VM operations. Between those operations, the GC locker state can change and prevent the full GC.
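
The difference between a split upgrade and an upgrade within one safepoint can be sketched as follows. This is a toy model under stated assumptions, not the G1 implementation: `UpgradeModel`, `splitUpgrade`, and `upgradeInSafepoint` are hypothetical names, and the `BooleanSupplier` stands in for "is the GC locker active when this VM operation starts?".

```java
import java.util.function.BooleanSupplier;

/** Toy model of the two upgrade strategies; hypothetical, not HotSpot source. */
final class UpgradeModel {

    /**
     * Split upgrade: the young GC and the full GC are separate VM operations,
     * so the locker state is sampled once per operation.
     * Returns true if the full GC actually ran.
     */
    static boolean splitUpgrade(BooleanSupplier lockerActive) {
        if (lockerActive.getAsBoolean()) {
            return false; // young GC blocked by the GC locker
        }
        // Young GC ran but freed nothing. The safepoint is left here, so
        // mutator threads may enter critical native sections before the
        // next VM operation starts.
        return !lockerActive.getAsBoolean(); // full GC runs only if still unlocked
    }

    /**
     * Upgrade within the safepoint: the locker state is sampled once, and the
     * full GC follows the failed young GC without leaving the safepoint.
     */
    static boolean upgradeInSafepoint(BooleanSupplier lockerActive) {
        if (lockerActive.getAsBoolean()) {
            return false; // the whole operation is blocked
        }
        // Young GC freed nothing; continue with the full GC right away.
        return true;
    }
}
```

If the supplier reports "inactive" on the first call and "active" on the second, `splitUpgrade` returns `false` while `upgradeInSafepoint` returns `true`, which is the window this issue is about.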

      The problem can be reproduced with the attached program.
      The parameters might vary depending on the system.

      java -Xmx64m -XX:+UseG1GC -XX:+PrintGC -XX:MaxGCPauseMillis=10 -XX:+UnlockExperimentalVMOptions -XX:-G1ForceFullGCAfterEvacuationFailure -XX:-PrintAdaptiveSizePolicy TestEvacFailureThreaded 10 10000000 10000 10000 10000 10 0.7

      A snippet of the output:

      #2539: [GC pause (G1 Evacuation Pause) (young) 62M->62M(64M), 0.0062519 secs]
      #2540: [GC pause (G1 Evacuation Pause) (young) 62M->62M(64M), 0.0050967 secs]
      #2538: [GC concurrent-mark-end, 0.0193436 secs]
      #2538: [GC remark, 0.0048717 secs]
      #2538: [GC cleanup 62M->62M(64M), 0.0016663 secs]
      #2541: [GC pause (GCLocker Initiated GC) (young) 62M->62M(64M), 0.0061165 secs]
      #2542: [GC pause (G1 Evacuation Pause) (mixed)-- 62M->62M(64M), 0.0063998 secs]
      #2543: [GC pause (G1 Evacuation Pause) (mixed)-- 62M->62M(64M), 0.0066795 secs]
      #2544: [GC pause (GCLocker Initiated GC) (mixed)-- 62M->62M(64M), 0.0082145 secs]
      #2545: [GC pause (G1 Evacuation Pause) (mixed)-- 62M->62M(64M), 0.0102476 secs]
      #2546: [GC pause (GCLocker Initiated GC) (mixed)-- 62M->62M(64M), 0.0142916 secs]
      #2547: [GC pause (G1 Evacuation Pause) (mixed)-- 62M->62M(64M), 0.0108066 secs]
      #2548: [GC pause (G1 Evacuation Pause) (young) 62M->62M(64M), 0.0065968 secs]
      #2549: [Full GC (Allocation Failure) 62M->23M(64M), 0.0483837 secs]
      java.lang.OutOfMemoryError: Java heap space
              at TestEvacFailureThreaded.runTest(TestEvacFailureThreaded.java:75)
              at TestEvacFailureThreaded$2.run(TestEvacFailureThreaded.java:138)



            tschatzl Thomas Schatzl
            asiebenborn Axel Siebenborn
