A customer reported a process-hanging issue. I think this is a problem with jspawnhelper.
Process A (a JVM) spawns child process B via ProcessBuilder.start. To do that, it creates a fail pipe on which it listens for error messages from the child. Child B:
- closes read end of pipe
- sets write end of pipe to O_CLOEXEC
- execve() the target binary.
Parent A:
- closes write end of pipe
- waits on read end.
If the execve() in child B succeeds, the kernel automatically closes the write end of the pipe in B; the parent reads EOF and knows the child's execve() succeeded.
If the execve() in child B fails, B sends an error message to the parent via the still-open write end of the pipe.
However, suppose that between the parent creating the pipe and the parent closing its write end, some native thread in the parent calls fork() directly, outside our control. The new child process C then also carries a copy of the write end of the pipe, and the fail pipe stays open for as long as C has not ended. That, in turn, causes the parent process to hang in forkAndExec(), waiting for an EOF on the fail pipe that does not arrive until C is gone.
This is a bog-standard UNIX programming error.
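The effect can be reproduced in isolation. The sketch below is an assumption-laden demo, not code from the JDK: the names `eof_within` and `stray_fork_blocks_eof` and the timeouts are made up for illustration, and the stray fork() is placed deterministically rather than racing with another thread.

```c
#include <poll.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if fd reports end-of-file within timeout_ms, 0 otherwise. */
static int eof_within(int fd, int timeout_ms) {
    struct pollfd p = { .fd = fd, .events = POLLIN };
    if (poll(&p, 1, timeout_ms) <= 0) return 0;  /* timeout or error: no EOF */
    char c;
    return read(fd, &c, 1) == 0;
}

/*
 * Hypothetical demo. Returns 1 if the read end sees no EOF while an
 * unrelated child holds the write end, and does see EOF once that child
 * is gone. Returns -1 if pipe() or fork() failed.
 */
static int stray_fork_blocks_eof(void) {
    int fail[2];
    if (pipe(fail) != 0) return -1;

    pid_t c = fork();              /* process C: inherits both pipe ends */
    if (c < 0) {
        close(fail[0]);
        close(fail[1]);
        return -1;
    }
    if (c == 0) {
        pause();                   /* C just lingers, never touching the pipe */
        _exit(0);
    }

    close(fail[1]);                /* parent A closes its own write end */
    int blocked = !eof_within(fail[0], 200);   /* C still holds a copy */

    kill(c, SIGKILL);
    waitpid(c, NULL, 0);
    int released = eof_within(fail[0], 2000);  /* last writer is gone */

    close(fail[0]);
    return blocked && released;
}
```

The parent has done everything right with its own descriptors, yet it cannot see EOF until process C goes away.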
One possible remedy is the pipe2() call, which was created to deal with exactly this sort of problem. pipe2() creates a pipe whose file descriptors have O_CLOEXEC set from the start.
There are two problems, though:
- in POSIX_SPAWN mode, we execve() *twice* - once to load the jspawnhelper, a second time to load the target binary. The fail pipe needs to survive the first execve() and close only at the second execve().
- pipe2() exists on neither macOS nor AIX.
---
For `-Djdk.lang.Process.launchMechanism=FORK`, we can provide a complete fix for Linux and the BSDs, and at least make the error much less likely on macOS and AIX. Linux and the BSDs would use pipe2(); macOS and AIX would keep using pipe() as now, but tag the file descriptors as CLOEXEC immediately after creating the pipe.
All of this is easy, so I will create a sub-issue for this partial solution and post a patch shortly.
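As a rough sketch of the FORK-mode approach (not the actual patch), the pipe creation could be wrapped as follows. The helper names `open_cloexec_pipe` and `is_cloexec`, the `USE_PIPE2` macro, and the exact platform list in the `#if` are assumptions made for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* glibc declares pipe2() only when this is defined */
#endif
#include <fcntl.h>
#include <unistd.h>

/* Assumption: these platforms provide pipe2(); the list is illustrative. */
#if defined(__linux__) || defined(__FreeBSD__) || \
    defined(__OpenBSD__) || defined(__NetBSD__)
#define USE_PIPE2 1
#endif

/*
 * Hypothetical helper: opens a pipe with both ends close-on-exec.
 * Returns 0 on success, -1 on error.
 */
static int open_cloexec_pipe(int fds[2]) {
#ifdef USE_PIPE2
    return pipe2(fds, O_CLOEXEC);   /* atomic: no window for a stray fork() */
#else
    if (pipe(fds) != 0) return -1;
    /* A fork() between pipe() and these fcntl() calls can still leak the
     * descriptors; the window is only much smaller than before. */
    for (int i = 0; i < 2; i++) {
        int flags = fcntl(fds[i], F_GETFD);
        if (flags < 0 || fcntl(fds[i], F_SETFD, flags | FD_CLOEXEC) < 0) {
            close(fds[0]);
            close(fds[1]);
            return -1;
        }
    }
    return 0;
#endif
}

/* Returns 1 if fd is close-on-exec, 0 if it is not, -1 on error. */
static int is_cloexec(int fd) {
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0) return -1;
    return (flags & FD_CLOEXEC) != 0;
}
```

With the pipe2() branch, a child created by an uncontrolled fork() still inherits the descriptors, but they close as soon as that child calls execve(). A stray child that forks and never execs keeps them open in either branch.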
For `-Djdk.lang.Process.launchMechanism=POSIX_SPAWN`, this is much more difficult.