The failure_handler configuration for Linux[1] and macOS[2] uses
kill -ABRT %p
to dump the core of a timed-out jtreg test. This command returns immediately, and the OS performs the coredump in the background, so the failure_handler cannot properly track the timeout of this action. Let's switch to a coredump method that returns only once the coredump has actually finished:
On Linux:
bash -c "kill -ABRT %p && tail --pid=%p -f /dev/null"
On Mac:
bash -c "kill -ABRT %p && lsof -p %p +r 1 &>/dev/null"
(credit: https://stackoverflow.com/a/41613532)
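The "signal, then block until the process is gone" idea can be tried outside jtreg with a small sketch. The helper name abort_and_wait is hypothetical and not part of failure_handler, a background sleep stands in for the timed-out test JVM, and the polling fallback for systems without GNU tail is an assumption for illustration rather than the lsof-based command proposed above.

```shell
# Sketch: send SIGABRT, then return only after the target process has exited.
# abort_and_wait is a hypothetical helper name used for illustration.
abort_and_wait() {
    pid="$1"
    kill -ABRT "$pid" || return 1
    if tail --version 2>/dev/null | grep -q GNU; then
        # GNU tail exits once the process given by --pid terminates.
        tail --pid="$pid" -f /dev/null
    else
        # Fallback without GNU tail: poll until the PID no longer exists.
        while kill -0 "$pid" 2>/dev/null; do sleep 1; done
    fi
}

# Demo: a background sleep stands in for the timed-out test process.
ulimit -c 0            # do not write a real core file during the demo
sleep 300 &
target=$!
abort_and_wait "$target"

# Record whether the target is still alive after the helper returned.
if kill -0 "$target" 2>/dev/null; then
    result="still running"
else
    result="exited"
fi
echo "$result"
```

With a real JVM and core dumps enabled, the helper would return only after the kernel has finished writing the core, which is the property the failure_handler needs in order to apply its action timeout meaningfully.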
Dumping a core can also take longer than the default action timeout of 20 seconds. In my own testing, coredumps for heaps of size 10-20G took roughly 1-2 minutes. Let's set a safe default of 10 minutes for this action.
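As a rough sketch, the Linux change might look like the following fragment. The key names (native.core.app, native.core.args, native.core.delimiter, native.core.params.timeout), the use of \0 as an argument delimiter, and the millisecond unit are assumptions about the properties format and should be checked against the actual linux.properties[1] before use.

```properties
# Assumed key names and delimiter syntax; verify against linux.properties.
native.core.app=bash
native.core.delimiter=\0
native.core.args=-c\0kill -ABRT %p && tail --pid=%p -f /dev/null
# 10 minutes, assuming the timeout is expressed in milliseconds
native.core.params.timeout=600000
```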
[1]:https://github.com/openjdk/jdk/blob/master/test/failure_handler/src/share/conf/linux.properties
[2]:https://github.com/openjdk/jdk/blob/master/test/failure_handler/src/share/conf/mac.properties