Summary
The VFORK launch mechanism is very dangerous. It can lead to spurious, very hard-to-diagnose errors in the parent process JVM. It should therefore be removed.
Problem
On Linux, we historically supported three different ways to spawn off a child process via Runtime.exec() and friends:
A) fork + exec (FORK)
B) vfork + exec (VFORK)
C) posix_spawn + jspawnhelper + additional exec (POSIX_SPAWN)
These mechanisms can be chosen by the customer via -Djdk.lang.Process.launchMechanism=<mode>.
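As an illustration of how the switch is used from the customer side, here is a minimal sketch. The class name LaunchDemo and its helper methods are hypothetical and exist only for this example; the only thing taken from the text above is the property name. The sketch merely reports what was requested on the command line and then spawns a child through the normal Process API; it does not inspect which mechanism the JDK actually ended up using.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; run e.g. with:
//   java -Djdk.lang.Process.launchMechanism=FORK LaunchDemo
class LaunchDemo {

    // Returns the mode requested via the system property, or a placeholder if unset.
    static String requestedMechanism() {
        return System.getProperty("jdk.lang.Process.launchMechanism", "(JDK default)");
    }

    // Spawns a child process and returns its combined stdout/stderr, trimmed.
    static String runAndCapture(String... command) throws IOException, InterruptedException {
        Process child = new ProcessBuilder(command)
                .redirectErrorStream(true)   // merge stderr into stdout
                .start();
        String output = new String(child.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        child.waitFor();
        return output.trim();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Requested launch mechanism: " + requestedMechanism());
        System.out.println("Child said: " + runAndCapture("echo", "hello"));
    }
}
```

The application code is identical for all three modes; only the JVM command line differs.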
Between fork/vfork/posix_spawn and exec, the forked child process runs preparatory code (closing file descriptors etc.).
The VFORK mode (B) is dangerous. In the time window between vfork and exec, the child process runs on the memory image of the parent process. It may accidentally damage or kill the parent process. This can happen in many ways, for example by programming error (very easily done, since almost nobody is aware of this danger) but also by things that are outside the control of the programmer, e.g. certain asynchronous signals. See this mail from 2018 [1] describing some real-world cases.
An additional problem is that these errors will look like random crashes or even just sudden deaths of the parent JVM (no hs-err file). So they will mostly not be attributed to vfork. Due to a lack of information, these issues are likely very underreported, so they could be more prevalent than we know.
For these reasons we decided to support posix_spawn on Linux [2] and use it by default [3]; both changes happened with JDK 13.
But this still leaves the dangerous VFORK mode around; it can be manually enabled with -Djdk.lang.Process.launchMechanism=VFORK.
Solution
We propose to remove the VFORK mechanism. If a user specifies -Djdk.lang.Process.launchMechanism=VFORK, we should write a warning message and recommend either removing this switch or using FORK if the user has problems with posix_spawn.
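The proposed handling could look roughly like the following sketch. It is not the actual JDK implementation: the class LaunchMechanismResolver, the Mechanism enum, the warning wording, the case-insensitive parsing and the IllegalArgumentException for unknown values are all assumptions made for illustration. In particular, the text above does not say which mechanism is used after the warning; the sketch assumes the default (POSIX_SPAWN).

```java
import java.util.Locale;
import java.util.function.Consumer;

// Hypothetical helper, for illustration only.
class LaunchMechanismResolver {

    // Mechanisms that remain once VFORK is removed.
    enum Mechanism { FORK, POSIX_SPAWN }

    // Maps the value of -Djdk.lang.Process.launchMechanism to a mechanism.
    // 'warn' receives the warning text when the removed VFORK mode is requested.
    static Mechanism resolve(String requested, Consumer<String> warn) {
        if (requested == null) {
            return Mechanism.POSIX_SPAWN;            // default since JDK 13
        }
        switch (requested.toUpperCase(Locale.ROOT)) {
            case "FORK":
                return Mechanism.FORK;
            case "POSIX_SPAWN":
                return Mechanism.POSIX_SPAWN;
            case "VFORK":
                warn.accept("VFORK launch mechanism is no longer supported. "
                        + "Remove -Djdk.lang.Process.launchMechanism=VFORK, "
                        + "or use FORK if posix_spawn causes problems.");
                return Mechanism.POSIX_SPAWN;        // assumption: fall back to the default
            default:
                throw new IllegalArgumentException(
                        "Unknown launch mechanism: " + requested);
        }
    }

    public static void main(String[] args) {
        String requested = System.getProperty("jdk.lang.Process.launchMechanism");
        System.out.println("Using " + resolve(requested, System.err::println));
    }
}
```

Passing the warning sink as a parameter keeps the mapping itself free of side effects, which makes the fallback behaviour easy to check in isolation.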
Compatibility Risk
The risk is low. JDK 13 was released in 2019 and we had two LTS releases (17 and 21) using posix_spawn as the default.
1) Risk: Bugs in libc
I scanned GitHub for uses of -Djdk.lang.Process.launchMechanism=VFORK. Not many hits were found; most of them seem to be copy-pasted from the same origin: some issue reported on Alpine [4]. None of the cases I found analyzed the root problem; everyone was just content to switch to vfork.
I tried to reproduce this problem, but the default posix_spawn on Alpine works for me just fine. On Alpine, muslc implements posix_spawn with clone( ... CLONE_VM | CLONE_VFORK ...), which is safe and fast. I therefore find it very likely that this was not a problem with either muslc or the JVM but some customer-local problem, e.g. wrong permissions on the jspawnhelper binary.
Nevertheless, should a customer encounter problems with posix_spawn, the solution laid out above suffices. We recommend using fork instead. On modern kernels with today's copy-on-write mechanisms the difference between fork and vfork should be minimal.
Alternative: We could also ease the transition by aliasing vfork to fork; existing installations passing in -Djdk.lang.Process.launchMechanism=VFORK would then use fork instead. The disadvantage here is that fork may be less efficient than posix_spawn, and we leave outdated options around in installations for customers to copy from.
Note that I also plan to do [5], which hopefully will simplify analyzing Runtime.exec problems.
2) Risk: glibc version
Just mentioning this for completeness. Not all glibc versions support posix_spawn well. glibc versions before 2.4 (released 2006) have no good support for it. See [6] for details.
I think this risk is theoretical only since this glibc version is 19 years old, and all modern distributions ship with far more modern glibc.
Specification
tbd
Links
- [1] https://mail.openjdk.org/pipermail/core-libs-dev/2018-September/055333.html
- [2] https://bugs.openjdk.org/browse/JDK-8218805
- [3] https://bugs.openjdk.org/browse/JDK-8213192
- [4] https://stackoverflow.com/questions/61301818/failed-to-exec-spawn-helper-error-since-moving-to-java-14-on-linux
- [5] https://bugs.openjdk.org/browse/JDK-8357100
- [6] https://bugs.openjdk.org/browse/JDK-8213192?focusedId=14222261&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14222261
- CSR of: JDK-8357089 Remove VFORK launch mechanism from Process implementation (linux) (In Progress)