Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-8347833

CrashOnOutOfMemory should stop GC threads before HeapDumpOnOutOfMemoryError

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Unresolved
    • Icon: P4 P4
    • None
    • 25
    • hotspot
    • gc
    • generic
    • generic

      ADDITIONAL SYSTEM INFORMATION :
      17.0.13

      A DESCRIPTION OF THE PROBLEM :
      We run a JVM with -XX:+HeapDumpOnOutOfMemoryError -XX:+CrashOnOutOfMemoryError.
      The intention is to stop the application and generate a heap dump, immediately when the OOME is detected.

      However, what we notice is that the process of dumping the heap in some cases takes insane amount of time. The reason is that during the heap dump, the GC pauses are still running are being triggered.
      My suggestion is to stop the GC threads, when the combination of -XX:+HeapDumpOnOutOfMemoryError -XX:+CrashOnOutOfMemoryError startup parameters is detected. CrashOnOutOfMemoryError basically means that the user doesn't want the JVM to perform any action after OOME.

      To see the issue in the logs, run the below program with -XX:+HeapDumpOnOutOfMemoryError -XX:+CrashOnOutOfMemoryError -Xmx3G -verbose:gc

      public class OutOfMemoryErrorTest {
      public static void main(String[] args) {
      // trigger OOM
      var list = new LinkedList<String>();
      while (true) {
      list.add("aaa");
      if (list.size() % 1_000_000 == 0) {
      System.out.println("List size: " + list.size());
      }
      }
      }
      }

      You will see logs similar to this.
      What is interesting is that between the first occurence of OutOfMemoryError and the file completing, JVM is logging a lot of GC pauses.

      java.lang.OutOfMemoryError: Java heap space
      Dumping heap to java_pid4489.hprof ...
      [85.541s][info][gc] GC(111) Pause Young (Concurrent Start) (G1 Evacuation Pause) 3068M->3068M(3072M) 2.225ms
      [85.541s][info][gc] GC(113) Concurrent Mark Cycle
      [89.094s][info][gc] GC(112) Pause Full (G1 Compaction Pause) 3068M->3068M(3072M) 3552.943ms
      [92.866s][info][gc] GC(114) Pause Full (G1 Compaction Pause) 3068M->3068M(3072M) 3771.683ms
      [92.866s][info][gc] GC(113) Concurrent Mark Cycle 7324.984ms
      [92.869s][info][gc] GC(115) Pause Young (Normal) (G1 Evacuation Pause) 3068M->3068M(3072M) 2.843ms
      [97.040s][info][gc] GC(116) Pause Full (G1 Compaction Pause) 3068M->3068M(3072M) 4171.539ms
      [101.603s][info][gc] GC(117) Pause Full (G1 Compaction Pause) 3068M->3068M(3072M) 4562.222ms
      Heap dump file created [6543452335 bytes in 138.619 secs]
      Aborting due to java.lang.OutOfMemoryError: Java heap space


      I think this is because, in this code (debug.cpp), the system doesn't stop the GC on the first detection of OOME.

        if (Atomic::cmpxchg(&out_of_memory_reported, 0, 1) == 0) {
          // create heap dump before OnOutOfMemoryError commands are executed
          if (HeapDumpOnOutOfMemoryError) {
            tty->print_cr("java.lang.OutOfMemoryError: %s", message);
            HeapDumper::dump_heap_from_oome();
          }

          if (OnOutOfMemoryError && OnOutOfMemoryError[0]) {
            VMError::report_java_out_of_memory(message);
          }

          if (CrashOnOutOfMemoryError) {
            tty->print_cr("Aborting due to java.lang.OutOfMemoryError: %s", message);
            report_fatal(OOM_JAVA_HEAP_FATAL, __FILE__, __LINE__, "OutOfMemory encountered: %s", message);
          }

      This could be handled better by checking CrashOnOutOfMemoryError at the beginning of the branch, but before dump_heap_from_oome(), suspending whatever needs to be suspended there.


            Unassigned Unassigned
            webbuggrp Webbug Group
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

              Created:
              Updated: