Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-8282947

JFR: Dump on shutdown live-locks in some conditions

    XMLWordPrintable

Details

    • jfr
    • b25

    Backports

      Description

        One of our tests checks if JFR-enabled builds work. We don't need a recording, so it is dumped to /dev/null. In one of our cases we ended up starting two recordings to /dev/null, and that scenario reliably hangs.

        This is most vividly manifests with just:

        $ java -XX:StartFlightRecording=filename=/dev/null,dumponexit=true -version
        [0.888s][info][jfr,startup] Started recording 1. No limit specified, using maxsize=250MB as default.
        [0.888s][info][jfr,startup]
        [0.888s][info][jfr,startup] Use jcmd 2139696 JFR.dump name=1 to copy recording data to file.
        openjdk version "19-internal" 2022-09-20
        OpenJDK Runtime Environment (fastdebug build 19-internal-adhoc.shade.jdk)
        OpenJDK 64-Bit Server VM (fastdebug build 19-internal-adhoc.shade.jdk, mixed mode, sharing)
        [0.957s][error][jfr ] Unable to complete I/O operation when dumping recording "1" (1)

        <exits>

        $ java -XX:StartFlightRecording=filename=/dev/null,dumponexit=true -XX:StartFlightRecording=filename=/dev/null,dumponexit=true -version
        [0.891s][info][jfr,startup] Started recording 1. No limit specified, using maxsize=250MB as default.
        [0.891s][info][jfr,startup]
        [0.891s][info][jfr,startup] Use jcmd 2139732 JFR.dump name=1 to copy recording data to file.
        [0.922s][info][jfr,startup] Started recording 2. No limit specified, using maxsize=250MB as default.
        [0.922s][info][jfr,startup]
        [0.922s][info][jfr,startup] Use jcmd 2139732 JFR.dump name=2 to copy recording data to file.
        openjdk version "19-internal" 2022-09-20
        OpenJDK Runtime Environment (fastdebug build 19-internal-adhoc.shade.jdk)
        OpenJDK 64-Bit Server VM (fastdebug build 19-internal-adhoc.shade.jdk, mixed mode, sharing)

        <100% CPU load>

        I debugged it a little to ChunksChannel.transferTo, where the call to:

        long w = out.transferFrom(channel, pos, n);

        ...always returns "w == 0" in the live-locking case, thus it gets stuck in the busy loop. It looks to me that ChunksChannel.transferTo completes one channel correctly, but then it moves to nextChannel(), and there "w == 0" always. Which probably explains why this only manifests with double /dev/null dump?

        FWIW, I don't see any guarantee that FileChannel.transferFrom should eventually succeed with non-zero transferred bytes, so this might be as well the generic problem with JFR code.

        Adding this block makes the reproducer above pass:

        diff --git a/src/jdk.jfr/share/classes/jdk/jfr/internal/ChunksChannel.java b/src/jdk.jfr/share/classes/jdk/jfr/internal/ChunksChannel.java
        index b606105d29e..4ac25b1bc93 100644
        --- a/src/jdk.jfr/share/classes/jdk/jfr/internal/ChunksChannel.java
        +++ b/src/jdk.jfr/share/classes/jdk/jfr/internal/ChunksChannel.java
        @@ -100,6 +100,9 @@ final class ChunksChannel implements ReadableByteChannel {
                         while (rem > 0) {
                             long n = Math.min(rem, 1024 * 1024);
                             long w = out.transferFrom(channel, pos, n);
        + if (w == 0) {
        + throw new IOException("Transfer failed");
        + }
                             pos += w;
                             rem -= w;
                         }

        Perhaps a similar case with Dacapo:

        $ java -Xlog:jfr -XX:StartFlightRecording=filename=/dev/null,dumponexit=true -jar dacapo-9.12-MR1-bach.jar -C pmd
        [0.830s][info][jfr] Flight Recorder initialized
        [0.830s][info][jfr] Repository base directory: /tmp
        [0.907s][info][jfr] Started recording "1" (1) {dumponexit=true, filename=/dev/null}
        [0.908s][info][jfr,startup] Started recording 1. No limit specified, using maxsize=250MB as default.
        [0.908s][info][jfr,startup]
        [0.908s][info][jfr,startup] Use jcmd 2194769 JFR.dump name=1 to copy recording data to file.
        ===== DaCapo 9.12-MR1 pmd starting warmup 1 =====
        ===== DaCapo 9.12-MR1 pmd completed warmup 1 in 5533 msec =====
        ===== DaCapo 9.12-MR1 pmd starting warmup 2 =====
        ===== DaCapo 9.12-MR1 pmd completed warmup 2 in 2273 msec =====
        ===== DaCapo 9.12-MR1 pmd starting warmup 3 =====
        ===== DaCapo 9.12-MR1 pmd completed warmup 3 in 2253 msec =====
        ===== DaCapo 9.12-MR1 pmd starting warmup 4 =====
        ===== DaCapo 9.12-MR1 pmd completed warmup 4 in 2317 msec =====
        ===== DaCapo 9.12-MR1 pmd starting =====
        ===== DaCapo 9.12-MR1 pmd PASSED in 2187 msec =====
        [17.229s][info][jfr ] Stopped recording "1" (1). Reason "Dump on exit".

        <live-locks>

        ...and with the patch above it passes well.

        Attachments

          Issue Links

            Activity

              People

                egahlin Erik Gahlin
                shade Aleksey Shipilev
                Votes:
                1 Vote for this issue
                Watchers:
                10 Start watching this issue

                Dates

                  Created:
                  Updated:
                  Resolved: