JDK-8357090

Remove VFORK launch mechanism from Process implementation (linux)

    • CSR
    • Resolution: Unresolved
    • P4
    • 26
    • core-libs
    • None
    • behavioral
    • low
    • Low. See Description.
    • Other

      Summary

      The VFORK launch mechanism is very dangerous. It can lead to spurious, very hard-to-diagnose errors in the parent process JVM. It should therefore be removed.

      Problem

      On Linux, we historically supported three different ways to spawn off a child process via Runtime.exec() and friends:

      A) fork + exec (FORK)
      B) vfork + exec (VFORK)
      C) posix_spawn + jspawnhelper + additional exec (POSIX_SPAWN)

      These mechanisms can be chosen by the customer via -Djdk.lang.Process.launchMechanism=<mode>.
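      As an illustration of how the switch is passed, here is a minimal sketch. The class name LaunchMechanismDemo and its helper methods are hypothetical; only the property name comes from the text above. It prints the explicitly requested mechanism (if any) and spawns a trivial child process.

```java
import java.io.IOException;

public class LaunchMechanismDemo {
    // Property name as given in the text above.
    static final String PROP = "jdk.lang.Process.launchMechanism";

    /** Returns the explicitly requested mechanism, or "(default)" if the property is unset. */
    static String requestedMechanism() {
        return System.getProperty(PROP, "(default)");
    }

    /** Runs a command with inherited stdio and returns its exit code. */
    static int run(String... command) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command).inheritIO().start();
        return p.waitFor();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Requested launch mechanism: " + requestedMechanism());
        // "true" is a standard Linux utility that exits successfully.
        System.out.println("Exit code: " + run("true"));
    }
}
```

      It could be started with, for example, java -Djdk.lang.Process.launchMechanism=FORK LaunchMechanismDemo.java. The property should be given on the command line: to my understanding the JDK evaluates it once when the process implementation is initialized, so changing it later at runtime is unlikely to have any effect.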

      Between fork/vfork/posix_spawn and exec the forked child process runs preparatory code (closing file descriptors etc).

      The VFORK mode (B) is dangerous. In the time window between vfork and exec the child process runs on the memory image of the parent process. It may accidentally damage or kill the parent process. This can happen in many ways, for example by programming error (very easily done, since almost nobody is aware of this danger) but also by things that are outside the control of the programmer, e.g. certain asynchronous signals. See this mail from 2018 [1] describing some real-world cases.

      An additional problem is that these errors will look like random crashes or even just sudden deaths of the parent JVM (no hs-err file). So they will mostly not be attributed to vfork. Due to a lack of information, these issues are likely very underreported, so they could be more prevalent than we know.

      For these reasons we decided to support posix_spawn on Linux [2] and use it by default [3]; both changes happened with JDK 13.

      But this still leaves the dangerous VFORK mode around; it can be manually enabled with -Djdk.lang.Process.launchMechanism=VFORK.

      Solution

      We propose to remove the VFORK mechanism. If a user specifies -Djdk.lang.Process.launchMechanism=VFORK, we should write a warning message and recommend either removing this switch or using FORK if the user has problems with posix_spawn.
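      The proposed handling could look roughly like the following sketch. This is not the JDK implementation: the class LaunchMechanismResolver, the warning text, and the choice to fall back to the POSIX_SPAWN default after warning are all assumptions made for illustration, since the specification is still open.

```java
import java.util.Locale;

public class LaunchMechanismResolver {
    // Mechanisms that would remain after the removal of VFORK.
    enum Mechanism { FORK, POSIX_SPAWN }

    /**
     * Maps the value of -Djdk.lang.Process.launchMechanism to a mechanism.
     * Hypothetical helper; the behavior for VFORK and for unknown values is assumed.
     */
    static Mechanism resolve(String value) {
        if (value == null) {
            // posix_spawn is the default on Linux (see Problem section).
            return Mechanism.POSIX_SPAWN;
        }
        String v = value.toUpperCase(Locale.ROOT);
        if (v.equals("VFORK")) {
            // Warning wording is illustrative only.
            System.err.println("WARNING: launch mechanism VFORK is no longer supported. "
                    + "Remove -Djdk.lang.Process.launchMechanism=VFORK, or use FORK "
                    + "if posix_spawn causes problems.");
            // ASSUMPTION: continue with the default instead of failing.
            return Mechanism.POSIX_SPAWN;
        }
        try {
            return Mechanism.valueOf(v);
        } catch (IllegalArgumentException e) {
            // ASSUMPTION: unknown values are rejected.
            throw new IllegalArgumentException("Unsupported launch mechanism: " + value);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve(System.getProperty("jdk.lang.Process.launchMechanism")));
    }
}
```

      Falling back to the default keeps existing installations working while still making the obsolete switch visible in the logs; failing fast on VFORK would be the stricter variant.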

      Compatibility Risk

      The risk is low. JDK 13 was released in 2019, and two LTS releases (17 and 21) have since shipped with posix_spawn as the default.

      1) Risk: Bugs in libc

      I scanned GitHub for uses of -Djdk.lang.Process.launchMechanism=VFORK. I found few hits, and most of them seem to be copy-pasted from the same origin: an issue reported on Alpine [4]. None of the cases I found analyzed the root cause; everyone was content to just switch to vfork.

      I tried to reproduce this problem, but the default posix_spawn works just fine for me on Alpine. There, musl libc implements posix_spawn with clone( ... CLONE_VM | CLONE_VFORK ...), which is safe and fast. I therefore find it very likely that the problem lay neither in musl libc nor in the JVM but in the customer's local setup, e.g. wrong permissions on the jspawnhelper binary.

      Nevertheless, should a customer encounter problems with posix_spawn, the solution laid out above suffices. We recommend using fork instead. On modern kernels with today's copy-on-write mechanisms the difference between fork and vfork should be minimal.

      Alternative: We could also ease the transition by aliasing VFORK to FORK; existing installations passing -Djdk.lang.Process.launchMechanism=VFORK would then use fork instead. The disadvantage is that fork may be less efficient than posix_spawn, and outdated options would remain in installations for customers to copy from.

      Note that I also plan to do [5], which hopefully will simplify analyzing Runtime.exec problems.

      2) Risk: glibc version

      I mention this only for completeness. Not all glibc versions support posix_spawn well: versions before 2.4 (released 2006) have no good support for it. See [6] for details.

      I think this risk is theoretical only since this glibc version is 19 years old, and all modern distributions ship with far more modern glibc.

      Specification

      tbd

      Links

            stuefe Thomas Stuefe