Type: Bug
Resolution: Fixed
Priority: P4
Fix Version/s: 1.0
Affects Version/s: None
Component/s: bots
Labels: None

When a WorkItem fails with an Error (as was recently experienced in ~~SKARA-1825~~), the item is never removed from the active set in the BotRunner. This eventually causes us to log/ping admins about WorkItems running for too long. We also never try to log the Error, so we aren't notified about the actual problem.

I think we should add a try/catch of Error/Throwable at the top level of RunnableWorkItem::run, where we log and re-throw the Error.
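
A minimal sketch of the log-and-re-throw idea, to make the proposal concrete. The class name LoggingWorkItemRunner, the logger name and the use of java.util.logging are assumptions for illustration only; this is not the actual RunnableWorkItem code, and it catches only Error rather than Throwable to keep the example small.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for the runner's wrapper class; not Skara's real code.
final class LoggingWorkItemRunner implements Runnable {
    // Logger name is an assumption made for this sketch.
    private static final Logger log = Logger.getLogger("bots.sketch");

    private final Runnable item;

    LoggingWorkItemRunner(Runnable item) {
        this.item = item;
    }

    @Override
    public void run() {
        try {
            item.run();
        } catch (Error e) {
            // Log first so the real problem is visible, then re-throw the
            // same Error so callers still see the failure.
            log.log(Level.SEVERE, "WorkItem failed with Error: " + item, e);
            throw e;
        }
    }
}
```

The catch block does nothing except log and re-throw, so the thread that hit the Error takes on very little extra work.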

Handling removal from the active set can be trickier, but we should at least attempt it. The problem is that we have to synchronize access to that collection. Maybe it's enough to just log the Error. Leaving the WorkItem in the active set prevents us from trying to run it again, which may be a good thing if running it is triggering Errors, but it also prevents the bot runner from trying to fully recover. In the current synchronized block we are also attempting to schedule pending tasks. It's probably not a good idea to try to do more work than absolutely necessary in a thread that has just thrown an Error.
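
One possible shape for a best-effort removal, sketched under assumptions: ActiveSetSketch, start, isActive and runTracked are hypothetical names, and the real BotRunner bookkeeping is more involved. The point is only that the Error path takes the lock, removes the item and does nothing else, while scheduling of pending work stays on the normal path.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical miniature of the runner's bookkeeping; names are illustrative.
final class ActiveSetSketch {
    private final Set<Runnable> active = new HashSet<>();

    synchronized boolean start(Runnable item) {
        return active.add(item);
    }

    synchronized boolean isActive(Runnable item) {
        return active.contains(item);
    }

    void runTracked(Runnable item) {
        try {
            item.run();
        } catch (Error e) {
            // Bare minimum on a thread that has just thrown an Error:
            // drop the item from the active set and schedule nothing.
            synchronized (this) {
                active.remove(item);
            }
            throw e;
        }
        synchronized (this) {
            active.remove(item);
            // Normal path: pending items would be scheduled here.
        }
    }
}
```

Whether removing the item is desirable at all is the open question above, since leaving it in the active set also keeps a repeatedly failing item from being run again.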

links to

Commit openjdk/skara/eb6c0bd1

Review openjdk/skara/1490

Assignee: Erik Joelsson

Reporter: Erik Joelsson

Votes: 0

Watchers: 4

Created: 2023-02-17 11:09

Updated: 2023-06-13 13:25

Resolved: 2023-06-13 13:25