Details
- Type: Bug
- Resolution: Fixed
- Priority: P3
- Fix Version: 19
- Resolved In Build: b07
Backports
Issue | Fix Version | Assignee | Priority | Status | Resolution | Resolved In Build |
---|---|---|---|---|---|---|
JDK-8298573 | 17.0.7-oracle | Calvin Cheung | P3 | Resolved | Fixed | b01 |
JDK-8302197 | 17.0.7 | Goetz Lindenmaier | P3 | Resolved | Fixed | b02 |
JDK-8299140 | 11.0.19-oracle | Calvin Cheung | P3 | Resolved | Fixed | b01 |
JDK-8302451 | 11.0.19 | Goetz Lindenmaier | P3 | Resolved | Fixed | b02 |
JDK-8303876 | 8u381 | Ivan Bereziuk | P3 | Resolved | Fixed | b01 |
Description
(original report) ========================
We've been seeing intermittent SIGBUS failures on Linux with jdk11. They all have this distinctive backtrace:
C [libc.so.6+0x12944d]
V [libjvm.so+0xcca542] perfMemory_init()+0x72
V [libjvm.so+0x8a3242] vm_init_globals()+0x22
V [libjvm.so+0xedc31d] Threads::create_vm(JavaVMInitArgs*, bool*)+0x1ed
V [libjvm.so+0x9615b2] JNI_CreateJavaVM+0x52
C [libjli.so+0x49af] JavaMain+0x8f
C [libjli.so+0x9149] ThreadJavaMain+0x9
Initially, we suspected that /tmp was full but that turned out to not be the case. After a few more instances of the crash and investigation, we believe we know the root cause.
The crashing applications are all running in a Kubernetes (K8s) pod, with each JVM in a separate container:
container_type: cgroupv1 (from the hs_err file)
/tmp is mounted such that it's shared by multiple containers. Since these JVMs are running in containers, we believe the namespaced (i.e. per-container) PIDs overlap between containers: two JVMs in separate containers can end up with the same namespaced PID. Since /tmp is shared, they can then "contend" on the same perfMemory file, because those file names are PID-based.
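A minimal sketch of the collision described above. The `hsperfdata_<user>/<pid>` layout, the user name `app`, and the helper `perf_data_path` are assumptions made for illustration; the report itself only states that the file names are PID-based.

```python
import os


def perf_data_path(tmp_dir: str, user: str, pid: int) -> str:
    """Build a PID-based perf data file path (assumed layout, for illustration)."""
    return os.path.join(tmp_dir, f"hsperfdata_{user}", str(pid))


# Two JVMs in different containers. Each one sees its own namespaced PID,
# and nothing prevents both namespaces from handing out the same number.
pid_in_container_a = 7  # hypothetical namespaced PID
pid_in_container_b = 7  # hypothetical namespaced PID

path_a = perf_data_path("/tmp", "app", pid_in_container_a)
path_b = perf_data_path("/tmp", "app", pid_in_container_b)

# With a shared /tmp, both JVMs resolve to the very same file.
collide = path_a == path_b
```

With a per-container /tmp the two paths would refer to different files even though the strings are equal; the problem only appears once the directory is shared.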
Once multiple JVMs can contend on the same file, a SIGBUS can arise if one JVM has mmap'd the file and another calls ftruncate() on it from under the first (e.g. https://github.com/openjdk/jdk11/blob/37115c8ea4aff13a8148ee2b8832b20888a5d880/src/hotspot/os/linux/perfMemory_linux.cpp#L909).
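The hazard can be sketched safely without triggering the crash. The example below is illustrative and is not the HotSpot code: it uses one process and two file descriptors standing in for two JVMs. It maps a temporary file, truncates it through a second descriptor, and only compares sizes. It deliberately never reads or writes the mapping, since touching a page past the new end of file is what would raise SIGBUS.

```python
import mmap
import os
import tempfile


def truncate_under_mapping(size: int = 4096):
    """Return (mapping length, file size) after another descriptor truncates the file."""
    fd_a, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd_a, size)            # "JVM A" sizes the file ...
        mapping = mmap.mmap(fd_a, size)     # ... and maps it (shared mapping)
        try:
            fd_b = os.open(path, os.O_RDWR)  # "JVM B" opens the same path
            try:
                os.ftruncate(fd_b, 0)        # ... and truncates it
            finally:
                os.close(fd_b)
            # The mapping still claims `size` bytes, but the file backs none of them.
            # Do NOT access mapping[...] here: that access is what would fault.
            return len(mapping), os.fstat(fd_a).st_size
        finally:
            mapping.close()
    finally:
        os.close(fd_a)
        os.unlink(path)
```

The mismatch between the mapping length and the file size is the state in which the first process can crash on its next access to the mapped region.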
As for possible solutions, would it be possible to use the global PID instead of the namespaced PID to "regain" the uniqueness invariant of the PID? Also, might it make sense to flock() the file to prevent another process from mucking with it?
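A rough sketch of the flock() idea raised above, assuming a local Linux filesystem. It is not taken from the actual fix; the helpers `try_lock` and `lock_demo` are hypothetical names. Two separate `open()` calls on the same file stand in for two JVMs: the second non-blocking exclusive lock attempt fails while the first holder keeps its lock.

```python
import fcntl
import os
import tempfile


def try_lock(fd: int) -> bool:
    """Try to take an exclusive, non-blocking flock(); report whether it succeeded."""
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except OSError:
        # The lock is held through another open file description.
        return False


def lock_demo():
    fd_a, path = tempfile.mkstemp()      # "JVM A" creates the file
    fd_b = os.open(path, os.O_RDWR)      # "JVM B" opens the same path
    try:
        a_got_lock = try_lock(fd_a)      # A locks the file first
        b_got_lock = try_lock(fd_b)      # B should back off instead of truncating
        fcntl.flock(fd_a, fcntl.LOCK_UN)  # A releases the lock
        b_after_release = try_lock(fd_b)  # now B may take it
        return a_got_lock, b_got_lock, b_after_release
    finally:
        os.close(fd_b)
        os.close(fd_a)
        os.unlink(path)
```

Note that flock() is advisory: it only helps if every process that might truncate or delete the file checks the lock first, and it does not by itself restore PID uniqueness.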
(Reported by Vitaly Davidovich --
https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2022-April/054921.html )
Manual reproducer:
https://github.com/openjdk/jdk/compare/master...iklam:jdk:8286030-test-case-for-jvm-crash-when-containers-share-tmp-dir?expand=1
Issue Links
- backported by
  - JDK-8298573 Avoid JVM crash when containers share the same /tmp dir (Resolved)
  - JDK-8299140 Avoid JVM crash when containers share the same /tmp dir (Resolved)
  - JDK-8302197 Avoid JVM crash when containers share the same /tmp dir (Resolved)
  - JDK-8302451 Avoid JVM crash when containers share the same /tmp dir (Resolved)
  - JDK-8303876 Avoid JVM crash when containers share the same /tmp dir (Resolved)
- relates to
  - JDK-8255008 Serviceability tools don't fully support Containers (Open)
- links to
  - Commit openjdk/jdk11u-dev/c58a0666
  - Commit openjdk/jdk17u-dev/d52e18c9
  - Commit openjdk/jdk/84f23149
  - Review openjdk/jdk11u-dev/1716
  - Review openjdk/jdk17u-dev/1150
  - Review openjdk/jdk/9406