[See Comments for simpler ways of reproducing this bug.]
The regression test
JDK1.4/test/java/rmi/activation/ActivationSystem/modifyDescriptor
is failing with some frequency on Solaris, leaving behind an orphaned JVM that appears to be running but does not respond to signals; it can only be killed with kill -9.
The test starts up a child JVM process running the RMI activation daemon (rmid).
The test then creates an activation group and an activatable object in that group, and makes an RMI call to the object, causing a JVM process to be created as a child of the rmid process. The test then makes an RMI call to rmid, asking rmid to shut down. This causes rmid to call Process.destroy on its child, and then rmid exits. (rmid does not currently call Process.waitFor on its child.)
The test then starts up a new child JVM process, again running rmid, which adopts the log file from the previous rmid. The test then makes another RMI call to the activatable object, expecting the call to cause a new activation group to be created as a new child JVM process of the new rmid.
Sometimes, however, the RMI call is received by the old activation group process, which causes the test to fail. In other words, Process.destroy did not cause the first activation group JVM to terminate, and that JVM was still functional enough afterwards to receive and execute an incoming RMI call, even though it is in a peculiar signal state that apparently causes signals other than SIGKILL to be ignored.
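For illustration only, a shutdown path that confirms the child has actually exited might look like the sketch below. This is not rmid's code: the class and method names are hypothetical, the `sleep` command merely stands in for the activation group JVM (so it assumes a Unix-like system), and Process.waitFor(long, TimeUnit) and Process.destroyForcibly were added in Java 8, so they are not available in 1.4.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of rmid or the regression test.
public class DestroyAndWait {

    /**
     * Asks the child to terminate, then waits to confirm it really exited.
     * Falls back to a forcible kill if the polite request is ignored.
     * Returns true once the child is confirmed to have exited.
     */
    public static boolean destroyAndWait(Process child, long timeoutSeconds)
            throws InterruptedException {
        // Polite termination request (typically SIGTERM on Unix JDKs).
        child.destroy();
        if (child.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            return true;
        }
        // The child ignored the request; escalate (typically SIGKILL on Unix).
        child.destroyForcibly();
        return child.waitFor(timeoutSeconds, TimeUnit.SECONDS);
    }

    public static void main(String[] args)
            throws IOException, InterruptedException {
        // "sleep 60" is a stand-in for a long-running child JVM.
        Process child = new ProcessBuilder("sleep", "60").start();
        boolean exited = destroyAndWait(child, 5);
        System.out.println("child exited: " + exited);
    }
}
```

Waiting after destroy would at least make a surviving child visible to the parent before it exits, rather than leaving the orphan to be discovered by a later RMI call.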
We've seen this only on Solaris, not on Windows. We've seen it on both 5.6 and 5.8, and with both b62 and b63. The problem seems to have appeared only recently, and the probability of it happening seems to vary.
Information extracted from an example orphaned process is attached.