-
Bug
-
Resolution: Fixed
-
P3
-
0.9
Today we had an outage in which the disk that some of the bots use for data stopped working. Everything ground to a halt, blocked on I/O, except for the health status endpoint, which happily kept returning 200 responses. After a while, the watchdog hit its timeout and called System.exit(1), which made no difference because the JVM process could not shut down.
I want to change the health status endpoint so that when the watchdog hits its timeout, it also flips the health status to unhealthy. This will let us react faster the next time this happens.
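A rough sketch of the idea, not the actual implementation: the class names `HealthState` and `WatchdogHandler`, the injectable exit action, and the 503 status code are all assumptions made for illustration. The point it shows is ordering: the watchdog flips an in-memory flag first, then attempts the exit, so the endpoint reports unhealthy even if the exit call never returns.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical shared health flag that the status endpoint would read. */
final class HealthState {
    private final AtomicBoolean healthy = new AtomicBoolean(true);

    /** Called by the watchdog; a pure in-memory write, so it cannot block on disk I/O. */
    void markUnhealthy() {
        healthy.set(false);
    }

    boolean isHealthy() {
        return healthy.get();
    }

    /** Status code the endpoint would return (503 is an assumed choice for "unhealthy"). */
    int statusCode() {
        return healthy.get() ? 200 : 503;
    }
}

/** Hypothetical handler invoked when the watchdog hits its timeout. */
final class WatchdogHandler {
    private final HealthState health;
    private final Runnable exitAction;

    /** In production the exit action would be something like () -> System.exit(1). */
    WatchdogHandler(HealthState health, Runnable exitAction) {
        this.health = health;
        this.exitAction = exitAction;
    }

    void onTimeout() {
        // Flip the flag before attempting to exit: the exit call may never return.
        health.markUnhealthy();
        exitAction.run();
    }
}
```

Passing the exit action in as a `Runnable` is only there so the handler can be exercised in a test without killing the JVM.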