Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-8282947

JFR: Dump on shutdown live-locks in some conditions

    XMLWordPrintable

    Details

    • Subcomponent:
      jfr
    • Resolved In Build:
      b25

      Description

      One of our tests checks if JFR-enabled builds work. We don't need a recording, so it is dumped to /dev/null. In one of our cases we ended up starting two recordings to /dev/null, and that scenario reliably hangs.

      This is most vividly manifests with just:

      $ java -XX:StartFlightRecording=filename=/dev/null,dumponexit=true -version
      [0.888s][info][jfr,startup] Started recording 1. No limit specified, using maxsize=250MB as default.
      [0.888s][info][jfr,startup]
      [0.888s][info][jfr,startup] Use jcmd 2139696 JFR.dump name=1 to copy recording data to file.
      openjdk version "19-internal" 2022-09-20
      OpenJDK Runtime Environment (fastdebug build 19-internal-adhoc.shade.jdk)
      OpenJDK 64-Bit Server VM (fastdebug build 19-internal-adhoc.shade.jdk, mixed mode, sharing)
      [0.957s][error][jfr ] Unable to complete I/O operation when dumping recording "1" (1)

      <exits>

      $ java -XX:StartFlightRecording=filename=/dev/null,dumponexit=true -XX:StartFlightRecording=filename=/dev/null,dumponexit=true -version
      [0.891s][info][jfr,startup] Started recording 1. No limit specified, using maxsize=250MB as default.
      [0.891s][info][jfr,startup]
      [0.891s][info][jfr,startup] Use jcmd 2139732 JFR.dump name=1 to copy recording data to file.
      [0.922s][info][jfr,startup] Started recording 2. No limit specified, using maxsize=250MB as default.
      [0.922s][info][jfr,startup]
      [0.922s][info][jfr,startup] Use jcmd 2139732 JFR.dump name=2 to copy recording data to file.
      openjdk version "19-internal" 2022-09-20
      OpenJDK Runtime Environment (fastdebug build 19-internal-adhoc.shade.jdk)
      OpenJDK 64-Bit Server VM (fastdebug build 19-internal-adhoc.shade.jdk, mixed mode, sharing)

      <100% CPU load>

      I debugged it a little to ChunksChannel.transferTo, where the call to:

      long w = out.transferFrom(channel, pos, n);

      ...always returns "w == 0" in the live-locking case, thus it gets stuck in the busy loop. It looks to me that ChunksChannel.transferTo completes one channel correctly, but then it moves to nextChannel(), and there "w == 0" always. Which probably explains why this only manifests with double /dev/null dump?

      FWIW, I don't see any guarantee that FileChannel.transferFrom should eventually succeed with non-zero transferred bytes, so this might be as well the generic problem with JFR code.

      Adding this block makes the reproducer above pass:

      diff --git a/src/jdk.jfr/share/classes/jdk/jfr/internal/ChunksChannel.java b/src/jdk.jfr/share/classes/jdk/jfr/internal/ChunksChannel.java
      index b606105d29e..4ac25b1bc93 100644
      --- a/src/jdk.jfr/share/classes/jdk/jfr/internal/ChunksChannel.java
      +++ b/src/jdk.jfr/share/classes/jdk/jfr/internal/ChunksChannel.java
      @@ -100,6 +100,9 @@ final class ChunksChannel implements ReadableByteChannel {
                       while (rem > 0) {
                           long n = Math.min(rem, 1024 * 1024);
                           long w = out.transferFrom(channel, pos, n);
      + if (w == 0) {
      + throw new IOException("Transfer failed");
      + }
                           pos += w;
                           rem -= w;
                       }

      Perhaps a similar case with Dacapo:

      $ java -Xlog:jfr -XX:StartFlightRecording=filename=/dev/null,dumponexit=true -jar dacapo-9.12-MR1-bach.jar -C pmd
      [0.830s][info][jfr] Flight Recorder initialized
      [0.830s][info][jfr] Repository base directory: /tmp
      [0.907s][info][jfr] Started recording "1" (1) {dumponexit=true, filename=/dev/null}
      [0.908s][info][jfr,startup] Started recording 1. No limit specified, using maxsize=250MB as default.
      [0.908s][info][jfr,startup]
      [0.908s][info][jfr,startup] Use jcmd 2194769 JFR.dump name=1 to copy recording data to file.
      ===== DaCapo 9.12-MR1 pmd starting warmup 1 =====
      ===== DaCapo 9.12-MR1 pmd completed warmup 1 in 5533 msec =====
      ===== DaCapo 9.12-MR1 pmd starting warmup 2 =====
      ===== DaCapo 9.12-MR1 pmd completed warmup 2 in 2273 msec =====
      ===== DaCapo 9.12-MR1 pmd starting warmup 3 =====
      ===== DaCapo 9.12-MR1 pmd completed warmup 3 in 2253 msec =====
      ===== DaCapo 9.12-MR1 pmd starting warmup 4 =====
      ===== DaCapo 9.12-MR1 pmd completed warmup 4 in 2317 msec =====
      ===== DaCapo 9.12-MR1 pmd starting =====
      ===== DaCapo 9.12-MR1 pmd PASSED in 2187 msec =====
      [17.229s][info][jfr ] Stopped recording "1" (1). Reason "Dump on exit".

      <live-locks>

      ...and with the patch above it passes well.

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              egahlin Erik Gahlin
              Reporter:
              shade Aleksey Shipilev
              Votes:
              1 Vote for this issue
              Watchers:
              7 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved: