We're investigating task timeouts in Mach5 via:
MACH5-5002 getting a number of "timeouts in execution" for test tasks on macOS again
MACH5-5014 Post execution timeout on Linux aarch64
I've filed this JBS issue to coordinate the investigation into whether
specific failed tests are causing the Mach5 task timeouts.
For the macosx-x64 task timeouts, there are a couple of related
bugs that are tracking some failure modes:
JDK-8267433 Core dumps on OSX sometimes take a very long time
JDK-8265037 serviceability/sa/ClhsdbPmap.java#id1 failed with "RuntimeException: Process is still alive. Can't get its output."
The current working theory for the macosx-x64 failures is that
tests that core dump can somehow leave the test machine in a state
that slows down subsequent tests, both those that also core dump
and other tests in general. We have even seen kernel panics on the
macosx-x64 machines when we reboot them after the machine has
gotten slow. Here's an example:
panic(cpu 2 caller 0xffffff80207fe9fe): watchdog timeout: no checkins from watchdogd in 301 seconds (216979 totalcheckins since monitoring last enabled), shutdown in progress
Relates to:
- JDK-8265037 serviceability/sa/ClhsdbPmap.java#id1 failed with "RuntimeException: Process is still alive. Can't get its output." (Open)
- JDK-8267433 Core dumps on OSX sometimes take a very long time (Open)