Type: Bug
Resolution: Fixed
Priority: P3
Fix Version/s: 0.9
Affects Version/s: 0.9
Component/s: bots
Labels: None

When mlbridge is restarted after the scratch dirs have been cleared out, it gets stuck in a restart loop with the Watchdog. Today I observed a 20-minute restart cycle. The cycle gets longer with each restart, so it should eventually reach a steady state.

The Watchdog is hard-coded to restart the Java process if it has not been pinged within 10 minutes of the previous ping. Since SKARA-1012, cloning repos takes longer than before, so my guess is that it now takes long enough for all executors to be stuck for more than 10 minutes at the same time.

I think we need to make this timeout configurable so it can be adapted for different bot runner configurations.
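
To illustrate the idea, here is a minimal sketch of a watchdog whose timeout is passed in rather than hard-coded. It is not the actual Skara Watchdog: the class name `ConfigurableWatchdog`, its methods, and the ISO-8601 duration format for the configured value (e.g. `PT30M`) are all assumptions made for this example. The 10 minute default mirrors the current hard-coded value.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch, not Skara's real Watchdog implementation.
final class ConfigurableWatchdog {
    // Default matches the hard-coded value described in this issue.
    static final Duration DEFAULT_TIMEOUT = Duration.ofMinutes(10);

    private final Duration timeout;
    private Instant lastPing;

    ConfigurableWatchdog(Duration timeout, Instant start) {
        if (timeout.isZero() || timeout.isNegative()) {
            throw new IllegalArgumentException("timeout must be positive: " + timeout);
        }
        this.timeout = timeout;
        this.lastPing = start;
    }

    // Assumed config format: an ISO-8601 duration string such as "PT30M".
    // A missing value falls back to the default.
    static Duration parseTimeout(String configured) {
        return configured == null ? DEFAULT_TIMEOUT : Duration.parse(configured);
    }

    // Called by an executor whenever it makes progress.
    synchronized void ping(Instant now) {
        lastPing = now;
    }

    // True when more than the configured timeout has passed since the last ping.
    synchronized boolean shouldRestart(Instant now) {
        return Duration.between(lastPing, now).compareTo(timeout) > 0;
    }
}
```

With a 30 minute timeout, a 20 minute clone would no longer trigger a restart, while a bot runner with fast clones could keep the 10 minute default. Passing the current time in explicitly is only a choice for this sketch, to keep it easy to test.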

links to

Commit openjdk/skara/c1f86979

Review openjdk/skara/1157

Assignee: Erik Joelsson

Reporter: Erik Joelsson

Votes: 0

Watchers: 3

Created: 2021-05-13 14:22

Updated: 2021-05-21 15:28

Resolved: 2021-05-21 15:28
