JDK-8286030

Avoid JVM crash when containers share the same /tmp dir

Details

    • b07

      Description

        There are some Kubernetes setups that share the same /tmp directory across multiple containers. Such a scenario is currently not supported by the JDK and crashes may happen.

        (original report) ========================
        We've been seeing intermittent SIGBUS failures on Linux with JDK 11. They
        all have this distinctive backtrace:

        C [libc.so.6+0x12944d]
        V [libjvm.so+0xcca542] perfMemory_init()+0x72
        V [libjvm.so+0x8a3242] vm_init_globals()+0x22
        V [libjvm.so+0xedc31d] Threads::create_vm(JavaVMInitArgs*, bool*)+0x1ed
        V [libjvm.so+0x9615b2] JNI_CreateJavaVM+0x52
        C [libjli.so+0x49af] JavaMain+0x8f
        C [libjli.so+0x9149] ThreadJavaMain+0x9

        Initially, we suspected that /tmp was full but that turned out to not be the case. After a few more instances of the crash and investigation, we believe we know the root cause.

        The crashing applications all run in a Kubernetes (K8s) pod, with each JVM in a
        separate container:

        container_type: cgroupv1 (from the hs_err file)

        /tmp is mounted such that it's shared by multiple containers. Since these
        JVMs run in containers, we believe the namespaced (i.e. per-container) PIDs overlap between containers: two JVMs in separate containers can end up with the same namespaced PID. Since /tmp is shared, they can then "contend" on the same perfMemory file, because those file names are PID-based.
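
The collision described above can be illustrated with a small sketch. It assumes the conventional HotSpot layout of `<tmp>/hsperfdata_<user>/<pid>` for the perfMemory file; the user name `app` and PID `7` are hypothetical values chosen for illustration, not taken from the report.

```python
import os


def perf_file_path(tmp_dir: str, user: str, pid: int) -> str:
    """Build a PID-keyed path in the style of HotSpot's hsperfdata files.

    The layout is an assumption for illustration; only the PID-based
    naming is stated in the report.
    """
    return os.path.join(tmp_dir, f"hsperfdata_{user}", str(pid))


# Two JVMs in different containers. Each sees itself as PID 7 inside its
# own PID namespace, and both mount the same /tmp (hypothetical values).
container_a = perf_file_path("/tmp", "app", 7)
container_b = perf_file_path("/tmp", "app", 7)

# Same namespaced PID + same shared directory => same file.
paths_collide = container_a == container_b
print(container_a, paths_collide)
```

With private /tmp directories the identical names would be harmless; it is the shared mount that turns the PID overlap into a file conflict.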

        Once multiple JVMs can contend on the same file, a SIGBUS can arise if one JVM has mmap'd the file and another calls ftruncate() on it from under the first (e.g.
        https://github.com/openjdk/jdk11/blob/37115c8ea4aff13a8148ee2b8832b20888a5d880/src/hotspot/os/linux/perfMemory_linux.cpp#L909 ).
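
A minimal sketch of the mmap/ftruncate interaction, not the JVM's actual code: a child Python process maps one page of a temporary file, a second file descriptor on the same path (standing in for the other JVM) truncates the file to zero, and the child then touches the mapping. The names `CHILD` and `run_demo` are hypothetical. The sketch assumes Linux semantics, where touching a mapped page that lies beyond the end of the file raises SIGBUS.

```python
import signal
import subprocess
import sys
import tempfile
import textwrap

# Runs in a child process so that the fatal signal does not kill the caller.
CHILD = textwrap.dedent("""
    import mmap, os, sys
    path = sys.argv[1]
    fd = os.open(path, os.O_RDWR)
    os.ftruncate(fd, mmap.PAGESIZE)    # "JVM A" sizes the file
    m = mmap.mmap(fd, mmap.PAGESIZE)   # ... and maps it
    m[0] = 1                           # page is backed by the file: fine
    fd2 = os.open(path, os.O_RDWR)     # "JVM B" opens the same path
    os.ftruncate(fd2, 0)               # ... and truncates it to zero bytes
    m[0] = 2                           # page is now beyond end of file
""")


def run_demo() -> int:
    """Run the child and return its exit status (negative = killed by signal)."""
    with tempfile.NamedTemporaryFile() as f:
        proc = subprocess.run([sys.executable, "-c", CHILD, f.name])
        return proc.returncode


if __name__ == "__main__":
    status = run_demo()
    print("child killed by SIGBUS:", status == -signal.SIGBUS)
```

On Linux the child is expected to be terminated by SIGBUS at the final store, which mirrors the failure mode in the backtrace: the mapping itself stays valid, but the file behind it no longer covers the page.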

        As for possible solutions, would it be possible to use the global PID instead of the namespaced PID to "regain" the uniqueness invariant of the PID? Also, might it make sense to flock() the file to prevent another process from mucking with it?
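
The flock() idea can be sketched as follows. This is an illustration of advisory locking in general, not the change that was made in the JDK; `try_lock` is a hypothetical helper. It takes an exclusive, non-blocking lock, so a second opener of the same file learns immediately that the file is in use instead of truncating it.

```python
import fcntl
import os
import tempfile
from typing import Optional


def try_lock(path: str) -> Optional[int]:
    """Open path and try to take an exclusive, non-blocking flock on it.

    Returns the file descriptor on success, or None if another open file
    description already holds the lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return fd
    except BlockingIOError:
        os.close(fd)
        return None


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as f:
        owner = try_lock(f.name)     # first "JVM" takes the lock
        intruder = try_lock(f.name)  # second "JVM" is refused
        print(owner is not None, intruder is None)
        if owner is not None:
            os.close(owner)          # closing the descriptor releases the lock
```

flock() locks are advisory: they only help if every process that might truncate the file checks the lock first, which is worth keeping in mind when JVMs of different versions share the directory.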

        (Reported by Vitaly Davidovich --
        https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2022-April/054921.html )

        Manual reproducer:
        https://github.com/openjdk/jdk/compare/master...iklam:jdk:8286030-test-case-for-jvm-crash-when-containers-share-tmp-dir?expand=1

              People

                iklam Ioi Lam