- Bug
- Resolution: Fixed
- P2
- 11.0.15, 17.0.3, 18.0.1, 19
- Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz, 2 cores, 6G, Ubuntu 20.04.4 LTS
  Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz, 2 cores, 6G, Ubuntu 20.04.4 LTS
  Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz, 2 cores, 6G, Ubuntu 20.04.4 LTS
  Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz, 2 cores, 6G, Ubuntu 20.04.4 LTS
  Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz, 2 cores, 6G, Ubuntu 20.04.3 LTS
  Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz, 2 cores, 6G, Ubuntu 20.04.3 LTS
  Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz, 2 cores, 6G, Ubuntu 20.04.3 LTS
- b26
- x86
- linux_ubuntu
- Verified

Issue | Fix Version | Assignee | Priority | Status | Resolution | Resolved In Build |
---|---|---|---|---|---|---|
JDK-8288802 | 17.0.5-oracle | Tobias Hartmann | P2 | Closed | Fixed | b01 |
JDK-8289637 | 17.0.5 | Goetz Lindenmaier | P2 | Resolved | Fixed | b01 |
JDK-8288805 | 11.0.17-oracle | Tobias Hartmann | P2 | Closed | Fixed | b01 |
JDK-8291717 | 11.0.17 | Andrew Haley | P2 | Resolved | Fixed | b01 |
JDK-8324935 | 8u421 | Daniel Skantz | P2 | Closed | Fixed | b01 |
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f9d10dc3d8a, pid=39978, tid=39998
#
# JRE version: Java(TM) SE Runtime Environment (19.0+24) (build 19-ea+24-1832)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (19-ea+24-1832, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V [libjvm.so+0x593d8a] PhaseAggressiveCoalesce::coalesce(Block*)+0x6a
HOW TO REPRODUCE ON JDK 19 (RELEASE BUILD)
(Note: these instructions run gradle itself on JDK 11. This can be achieved by setting the JAVA_HOME environment variable and/or passing the option -Dorg.gradle.java.home=$JAVA_HOME to all ./gradlew commands.)
1. git clone --depth 1 --branch nightly/phase-aggressive-sigsegv git@github.com:deephaven/deephaven-core.git
2. cd deephaven-core
3. printf 'org.gradle.java.installations.paths=%s\n' "$JDK19_RELEASE_HOME" >> gradle.properties
(optionally, run $ ./gradlew -q javaToolchains to verify that the JDK 19 build is recognized by gradle)
4. ./gradlew -PtestRuntimeVersion=18 -PforceTest=true engine-table:testOutOfBand --tests io.deephaven.engine.table.impl.QueryTableAggregationTest.testMedianByIncremental
(..)
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f9d10dc3d8a, pid=39978, tid=39998
#
# JRE version: Java(TM) SE Runtime Environment (19.0+24) (build 19-ea+24-1832)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (19-ea+24-1832, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V [libjvm.so+0x593d8a] PhaseAggressiveCoalesce::coalesce(Block*)+0x6a
(..)
(if step 4 succeeds, re-run it a few times until the crash is triggered)
The error log and replay files are attached (hs_err_pid21380.log, replay_pid21380.log).
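Since step 4 may have to be repeated before the crash shows up, the re-runs can be automated. The following is a minimal sketch, not part of the original report: `retry_until_crash` is a hypothetical helper name, and it assumes the JVM writes its `hs_err_pid*.log` somewhere below the current directory and that no older crash logs are already lying around there.

```shell
# Re-run a command until an hs_err_pid*.log file exists below the current
# directory, or until the attempt budget is used up.
# Usage: retry_until_crash MAX_ATTEMPTS command [args...]
# Assumption: no stale hs_err_pid*.log files exist before the first attempt.
retry_until_crash() {
  ruc_max="$1"; shift
  ruc_attempt=1
  while [ "$ruc_attempt" -le "$ruc_max" ]; do
    echo "attempt $ruc_attempt/$ruc_max" >&2
    # A crashing JVM makes the command fail; keep going either way.
    "$@" || true
    ruc_log=$(find . -name 'hs_err_pid*.log' | head -n 1)
    if [ -n "$ruc_log" ]; then
      echo "crash log found: $ruc_log" >&2
      return 0
    fi
    ruc_attempt=$((ruc_attempt + 1))
  done
  echo "no crash log after $ruc_max attempts" >&2
  return 1
}
```

For example, `retry_until_crash 20 ./gradlew -PtestRuntimeVersion=18 -PforceTest=true engine-table:testOutOfBand --tests io.deephaven.engine.table.impl.QueryTableAggregationTest.testMedianByIncremental` would stop at the first run that leaves a crash log behind; the budget of 20 is an arbitrary choice.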
HOW TO REPLAY IT ON JDK 19 (DEBUG BUILD)
The issue seems to be hard to reproduce directly on a debug JDK build. Luckily, it can be replayed on a debug build from the replay file generated from the release build crash:
1. run steps 1-3 above
2. download the attached replay file (replay_pid21380.log)
3. build the classpath required to replay the crash, e.g. by extracting it from the gradle debug information:
3.1. ./gradlew --info --debug -PtestRuntimeVersion=18 -PforceTest=true engine-table:testOutOfBand --tests io.deephaven.engine.table.impl.QueryTableAggregationTest.testMedianByIncremental | grep "Using application classpath" | tail -1 > tmp
3.2. REPLAY_CLASSPATH=$(cat tmp | cut -d "[" -f 4- | cut -d "]" -f 1 | sed 's/, /:/g')
4. $JDK19_DEBUG_HOME/bin/java -XX:+ReplayCompiles -XX:+ReplayIgnoreInitErrors -XX:ReplayDataFile=replay_pid21380.log -cp "$REPLAY_CLASSPATH"
(..)
# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc: SuppressErrorAt=/compile.cpp:1214
#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (/opt/mach5/mesos/work_dir/slaves/779adf21-f3e5-4e6a-a889-8cc0f9bc6fbb-S66914/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/f438564a-997a-4f93-8215-28dc0c0bef6d/runs/f41da18c-9f23-49a7-ab67-ad61fa19003a/workspace/open/src/hotspot/share/opto/compile.cpp:1214), pid=42338, tid=42351
# assert(tn->in(0) != __null) failed: must have live top node
#
# JRE version: Java(TM) SE Runtime Environment (19.0+24) (fastdebug build 19-ea+24-1832)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 19-ea+24-1832, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V [libjvm.so+0xaa884c] Compile::verify_top(Node*) const+0x17c
The error log file is attached (hs_err_pid36556.log).
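The `cut`/`sed` pipeline in step 3.2 of the replay instructions is easier to follow on a concrete input. The sketch below wraps it in a hypothetical helper, `extract_classpath`, and feeds it a made-up log line; the timestamp, logger name, and paths are illustrative assumptions, chosen only so that the classpath list sits after the third `[`, which is what `-f 4-` expects.

```shell
# Turn the bracketed, comma-separated list at the end of a gradle debug line
# into a ':'-separated classpath (same pipeline as step 3.2).
extract_classpath() {
  cut -d "[" -f 4- | cut -d "]" -f 1 | sed 's/, /:/g'
}

# Hypothetical log line, shaped like "<time> [LEVEL] [logger] message [list]".
sample='2022-06-01T00:00:00.000+0000 [DEBUG] [hypothetical.logger.Name] Using application classpath [/tmp/a.jar, /tmp/b.jar, /tmp/classes]'

printf '%s\n' "$sample" | extract_classpath
# prints /tmp/a.jar:/tmp/b.jar:/tmp/classes
```

If the real log line has a different number of bracketed prefixes, the field index passed to the first `cut` would need adjusting.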
ORIGINAL REPORT
Originally posted at: https://github.com/adoptium/adoptium-support/issues/516
The issue is exhibited from multiple methods, potentially involving array / vectorization optimizations. We've so far worked around it by setting up a compiler directives file with excludes, but that's rather fragile and we are finding more places that eventually hit this error.
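For readers unfamiliar with the workaround mentioned above, a compiler directives file with an exclude might look like the sketch below. The report does not show the actual file; the method pattern `com/example/Hypothetical.hotLoop`, the file name `c2-excludes.json`, and the helper `directive_jvm_args` are all placeholders introduced here for illustration.

```shell
# Write a minimal compiler directives file that keeps C2 from compiling one
# method. The match pattern is a hypothetical placeholder, not one of the
# methods affected in this report.
cat > c2-excludes.json <<'EOF'
[
  {
    match: "com/example/Hypothetical.hotLoop",
    c2: {
      Exclude: true
    }
  }
]
EOF

# Print the JVM options that load a directives file. CompilerDirectivesFile
# is a diagnostic option, so diagnostic options must be unlocked first.
directive_jvm_args() {
  printf '%s\n' "-XX:+UnlockDiagnosticVMOptions -XX:CompilerDirectivesFile=$1"
}

directive_jvm_args c2-excludes.json
```

Each newly discovered crashing method needs its own entry in such a file, which is why this approach is fragile as a long-term workaround.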
Steps to reproduce
Currently, we are only able to reproduce using our junit test suite. We've also seen it in our running application, but we don't currently have a framework to easily reproduce that setup. I'm working on creating a more minimal reproduction. Some of our developers are able to reproduce the issue frequently, some are able to reproduce it infrequently, and others appear to not be able to reproduce it. I'm guessing there may be hardware or environmental issues at play. The issue is reproducible within the standard GitHub Actions runner environment.
Here's the branch that is meant to reproduce the issue - https://github.com/deephaven/deephaven-core/tree/nightly/phase-aggressive-sigsegv.
./gradlew -PtestRuntimeVersion=17 -PtestRuntimeVendor=adoptopenjdk -PforceTest=true engine-table:testOutOfBand --tests io.deephaven.engine.table.impl.QueryTableAggregationTest.testMedianByIncremental
The above command may need to be run multiple times (10+) to get the SIGSEGV. By default, it runs against Java 11 (the specific version depends on the OS and gradle). On my local machine, I can reproduce much more consistently with Java 17 by setting -PtestRuntimeVersion=17. The nightly/phase-aggressive-sigsegv branch is also set up to run a GH workflow that runs these specific tests.
Triaging info
The issue is reproducible on the latest versions of OpenJDK 11 and 17 (and has also been reproduced on earlier versions of 11 and 17).
# JRE version: OpenJDK Runtime Environment Temurin-11.0.15+10 (11.0.15+10) (build 11.0.15+10)
# Java VM: OpenJDK 64-Bit Server VM Temurin-11.0.15+10 (11.0.15+10, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# V [libjvm.so+0x62619c] PhaseAggressiveCoalesce::coalesce(Block*)+0x50c
# JRE version: OpenJDK Runtime Environment Temurin-17.0.3+7 (17.0.3+7) (build 17.0.3+7)
# Java VM: OpenJDK 64-Bit Server VM Temurin-17.0.3+7 (17.0.3+7, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V [libjvm.so+0x597885] PhaseAggressiveCoalesce::coalesce(Block*)+0x65
In GH CI, the environment seen so far:
Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz, 2 cores, 6G, Ubuntu 20.04.4 LTS
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz, 2 cores, 6G, Ubuntu 20.04.4 LTS
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz, 2 cores, 6G, Ubuntu 20.04.4 LTS
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz, 2 cores, 6G, Ubuntu 20.04.4 LTS
Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz, 2 cores, 6G, Ubuntu 20.04.3 LTS
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz, 2 cores, 6G, Ubuntu 20.04.3 LTS
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz, 2 cores, 6G, Ubuntu 20.04.3 LTS
I'm currently in the process of collecting more detailed information on our developers' machines.
Cross-posting our issue: deephaven/deephaven-core#2038
- backported by
  - JDK-8289637 C2: assert(tn->in(0) != __null) failed: must have live top node (Resolved)
  - JDK-8291717 C2: assert(tn->in(0) != __null) failed: must have live top node (Resolved)
  - JDK-8288802 C2: assert(tn->in(0) != __null) failed: must have live top node (Closed)
  - JDK-8288805 C2: assert(tn->in(0) != __null) failed: must have live top node (Closed)
  - JDK-8324935 C2: assert(tn->in(0) != __null) failed: must have live top node (Closed)
- duplicates
  - JDK-8324853 assert(tn->in(0) != NULL) failed: must have live top node (Closed)
- relates to
  - JDK-8210389 C2: assert(n->outcnt() != 0 || C->top() == n || n->is_Proj()) failed: No dead instructions after post-alloc (Resolved)
  - JDK-8324853 assert(tn->in(0) != NULL) failed: must have live top node (Closed)
  - JDK-8317301 assert(false) failed: unexpected yanked node (Closed)
- links to
  - Commit openjdk/jdk11u-dev/954d57ae
  - Commit openjdk/jdk17u-dev/d3354af9
  - Commit openjdk/jdk/78d37126
  - Review openjdk/jdk11u-dev/1190
  - Review openjdk/jdk17u-dev/515
  - Review openjdk/jdk18u/165
  - Review openjdk/jdk/9060