JDK-8287432

C2: assert(tn->in(0) != __null) failed: must have live top node

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: P2
    • Resolution: Fixed
    • Affects Version/s: 11.0.15, 17.0.3, 18.0.1, 19
    • Fix Version/s: 19
    • Component/s: hotspot
    • Resolved In Build: b26
    • CPU: x86
    • OS: linux_ubuntu

        Description

        Running a specific test of the Deephaven project leads to the following segmentation fault:

        # A fatal error has been detected by the Java Runtime Environment:
        #
        # SIGSEGV (0xb) at pc=0x00007f9d10dc3d8a, pid=39978, tid=39998
        #
        # JRE version: Java(TM) SE Runtime Environment (19.0+24) (build 19-ea+24-1832)
        # Java VM: Java HotSpot(TM) 64-Bit Server VM (19-ea+24-1832, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
        # Problematic frame:
        # V [libjvm.so+0x593d8a] PhaseAggressiveCoalesce::coalesce(Block*)+0x6a

        HOW TO REPRODUCE ON JDK 19 (RELEASE BUILD)

        (Note: these instructions run gradle itself on JDK 11. This can be achieved by setting the JAVA_HOME environment variable and/or passing the option -Dorg.gradle.java.home=$JAVA_HOME to all ./gradlew commands.)

        1. git clone --depth 1 --branch nightly/phase-aggressive-sigsegv git@github.com:deephaven/deephaven-core.git
        2. cd deephaven-core
        3. printf 'org.gradle.java.installations.paths=%s\n' "$JDK19_RELEASE_HOME" >> gradle.properties
        (optionally, run $ ./gradlew -q javaToolchains to verify that the JDK 19 build is recognized by gradle)
        4. ./gradlew -PtestRuntimeVersion=18 -PforceTest=true engine-table:testOutOfBand --tests io.deephaven.engine.table.impl.QueryTableAggregationTest.testMedianByIncremental
        (..)
        #
        # A fatal error has been detected by the Java Runtime Environment:
        #
        # SIGSEGV (0xb) at pc=0x00007f9d10dc3d8a, pid=39978, tid=39998
        #
        # JRE version: Java(TM) SE Runtime Environment (19.0+24) (build 19-ea+24-1832)
        # Java VM: Java HotSpot(TM) 64-Bit Server VM (19-ea+24-1832, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
        # Problematic frame:
        # V [libjvm.so+0x593d8a] PhaseAggressiveCoalesce::coalesce(Block*)+0x6a
        (..)
        (if step 4 succeeds, re-run it a few times until the crash is triggered)
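
Since step 4 may have to be repeated before the crash shows up, the re-runs can be automated. The following is a minimal sketch, not part of the original report: `run_until_failure` is a hypothetical helper name, and it assumes that the gradle invocation exits with a non-zero status when the forked test JVM crashes. Any other test failure would stop the loop as well, so the presence of an hs_err_pid*.log file should still be checked afterwards.

```shell
# Re-run a command until it fails (for example because the JVM crashed) or
# until the attempt limit is reached.
# Usage: run_until_failure MAX_ATTEMPTS COMMAND [ARGS...]
# Returns 0 if a failure was observed, 1 if every attempt succeeded.
run_until_failure() {
  max_attempts=$1
  shift
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    echo "attempt $attempt/$max_attempts"
    if ! "$@"; then
      echo "command failed on attempt $attempt"
      return 0
    fi
    attempt=$((attempt + 1))
  done
  echo "no failure after $max_attempts attempts"
  return 1
}
```

For example, `run_until_failure 20 ./gradlew -PtestRuntimeVersion=18 -PforceTest=true engine-table:testOutOfBand --tests io.deephaven.engine.table.impl.QueryTableAggregationTest.testMedianByIncremental` repeats step 4 up to 20 times (the limit of 20 is arbitrary).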

        The error log and replay files are attached (hs_err_pid21380.log, replay_pid21380.log).

        HOW TO REPLAY IT ON JDK 19 (DEBUG BUILD)

        The issue seems to be hard to reproduce directly on a debug JDK build. Luckily, it can be replayed on a debug build from the replay file generated from the release build crash:

        1. run steps 1-3 above
        2. download the attached replay file (replay_pid21380.log)
        3. build the classpath required to replay the crash, e.g. by extracting it from the gradle debug information:
        3.1. ./gradlew --info --debug -PtestRuntimeVersion=18 -PforceTest=true engine-table:testOutOfBand --tests io.deephaven.engine.table.impl.QueryTableAggregationTest.testMedianByIncremental | grep "Using application classpath" | tail -1 > tmp
        3.2. REPLAY_CLASSPATH=$(cat tmp | cut -d "[" -f 4- | cut -d "]" -f 1 | sed 's/, /:/g')
        4. $JDK19_DEBUG_HOME/bin/java -XX:+ReplayCompiles -XX:+ReplayIgnoreInitErrors -XX:ReplayDataFile=replay_pid21380.log -cp "$REPLAY_CLASSPATH"
        (..)
        # To suppress the following error report, specify this argument
        # after -XX: or in .hotspotrc: SuppressErrorAt=/compile.cpp:1214
        #
        # A fatal error has been detected by the Java Runtime Environment:
        #
        # Internal Error (/opt/mach5/mesos/work_dir/slaves/779adf21-f3e5-4e6a-a889-8cc0f9bc6fbb-S66914/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/f438564a-997a-4f93-8215-28dc0c0bef6d/runs/f41da18c-9f23-49a7-ab67-ad61fa19003a/workspace/open/src/hotspot/share/opto/compile.cpp:1214), pid=42338, tid=42351
        # assert(tn->in(0) != __null) failed: must have live top node
        #
        # JRE version: Java(TM) SE Runtime Environment (19.0+24) (fastdebug build 19-ea+24-1832)
        # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 19-ea+24-1832, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
        # Problematic frame:
        # V [libjvm.so+0xaa884c] Compile::verify_top(Node*) const+0x17c

        The error log file is attached (hs_err_pid36556.log).
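
To make the classpath extraction in step 3.2 easier to follow, here is a small sketch that applies the same cut/sed pipeline to a single log line. The sample line is hypothetical and much shorter than a real one. Its layout (timestamp, bracketed log level, bracketed logger name, then the bracketed classpath) is an assumption about the gradle debug output, inferred from the field index used in step 3.2. `extract_classpath` is a name introduced here for illustration only.

```shell
# Turn a gradle "Using application classpath [...]" debug line (read from
# stdin) into a colon-separated classpath.
extract_classpath() {
  # Field 4 onward (split on "[") starts right after the bracket that opens
  # the classpath list; the second cut drops everything from the closing "]".
  cut -d "[" -f 4- | cut -d "]" -f 1 | sed 's/, /:/g'
}

# Hypothetical sample line, for illustration only.
sample='2022-05-27T10:00:00.000+0200 [DEBUG] [org.example.Logger] Using application classpath [/tmp/a.jar, /tmp/b.jar]'
printf '%s\n' "$sample" | extract_classpath
```

With the real log, `REPLAY_CLASSPATH=$(extract_classpath < tmp)` would be equivalent to step 3.2.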

        ORIGINAL REPORT

        Originally posted at: https://github.com/adoptium/adoptium-support/issues/516

        The issue is exhibited by multiple methods, potentially involving array / vectorization optimizations. So far we have worked around it by setting up a compiler directives file with excludes, but that is rather fragile, and we keep finding more places that eventually hit this error.
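
The report does not include the directives file itself. As an illustration of the kind of workaround described, the sketch below generates a compiler directives file that excludes a list of methods from C2 compilation. The helper name `write_c2_excludes`, the output file name and the method patterns in the usage note are hypothetical; the actual methods excluded by the reporters are not listed in this issue.

```shell
# Write a compiler directives file that excludes the given method patterns
# from C2 compilation.
# Usage: write_c2_excludes FILE PATTERN [PATTERN...]
write_c2_excludes() {
  out_file=$1
  shift
  {
    echo "["
    sep=""
    for pattern in "$@"; do
      # One directive per pattern; entries are separated by ",<newline>".
      printf '%s  { match: "%s", c2: { Exclude: true } }' "$sep" "$pattern"
      sep=",
"
    done
    printf '\n]\n'
  } > "$out_file"
}
```

A call such as `write_c2_excludes c2-excludes.json 'com/example/Foo.bar' 'com/example/Baz.*'` produces a file that can be passed to the JVM with `-XX:+UnlockDiagnosticVMOptions -XX:CompilerDirectivesFile=c2-excludes.json`. Every newly discovered crashing method needs another pattern, which is why the reporters call this approach fragile.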

        Steps to reproduce
        Currently, we are only able to reproduce using our junit test suite. We've also seen it in our running application, but we don't currently have a framework to easily reproduce that setup. I'm working on creating a more minimal reproduction. Some of our developers are able to reproduce the issue frequently, some are able to reproduce it infrequently, and others appear to not be able to reproduce it. I'm guessing there may be hardware or environmental issues at play. The issue is reproducible within the standard Github Actions runner environment.

        Here's the branch that is meant to reproduce the issue - https://github.com/deephaven/deephaven-core/tree/nightly/phase-aggressive-sigsegv.

        ./gradlew -PtestRuntimeVersion=17 -PtestRuntimeVendor=adoptopenjdk -PforceTest=true engine-table:testOutOfBand --tests io.deephaven.engine.table.impl.QueryTableAggregationTest.testMedianByIncremental
        The above command may need to be run multiple times (10+) to get the SIGSEGV. By default, it runs against Java 11 (the specific version depends on OS and gradle). On my local machine, I can reproduce much more consistently with Java 17 by setting -PtestRuntimeVersion=17. The nightly/phase-aggressive-sigsegv branch is also set up with a GH workflow that runs these specific tests.

        Triaging info
        The issue is reproducible on the latest versions of OpenJDK 11 and 17 (and has also been reproduced on earlier versions of 11 and 17).

        # JRE version: OpenJDK Runtime Environment Temurin-11.0.15+10 (11.0.15+10) (build 11.0.15+10)
        # Java VM: OpenJDK 64-Bit Server VM Temurin-11.0.15+10 (11.0.15+10, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
        # Problematic frame:
        # V [libjvm.so+0x62619c] PhaseAggressiveCoalesce::coalesce(Block*)+0x50c

        # JRE version: OpenJDK Runtime Environment Temurin-17.0.3+7 (17.0.3+7) (build 17.0.3+7)
        # Java VM: OpenJDK 64-Bit Server VM Temurin-17.0.3+7 (17.0.3+7, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
        # Problematic frame:
        # V [libjvm.so+0x597885] PhaseAggressiveCoalesce::coalesce(Block*)+0x65

        In GH CI, the environments seen so far are:

        Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz, 2 cores, 6G, Ubuntu 20.04.4 LTS
        Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz, 2 cores, 6G, Ubuntu 20.04.4 LTS
        Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz, 2 cores, 6G, Ubuntu 20.04.4 LTS
        Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz, 2 cores, 6G, Ubuntu 20.04.4 LTS
        Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz, 2 cores, 6G, Ubuntu 20.04.3 LTS
        Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz, 2 cores, 6G, Ubuntu 20.04.3 LTS
        Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz, 2 cores, 6G, Ubuntu 20.04.3 LTS
        I'm currently in the process of collecting more detailed information on our developers' machines.

        Cross-posting our issue: deephaven/deephaven-core#2038

                People

                Assignee: Christian Hagedorn (chagedorn)
                Reporter: Martijn Verburg (karianna)