Uploaded image for project: 'Skara'
  1. Skara
  2. SKARA-1827

Better handling of Error

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Fixed
    • Icon: P4 P4
    • 1.0
    • None
    • bots
    • None

      When a WorkItem fails with Error (as was recently experienced in SKARA-1825), the item is never removed from the active set in the BotRunner. This causes us to eventually log/ping admins about WorkItems running for too long. We also never try to log the Error, so we aren't notified about the actual problem.

      I think we should add a try catch of Error/Throwable at the top level of RunnableWorkItem::run where we log and re-throw the Error.

      Handling removal from the active set can be trickier, but we should at least attempt it. The problem is that we have to synchronize access to that collection. Maybe it's enough to just log the Error. Leaving the WorkItem in the active set prevents us from trying to run it again, which may be a good thing if running it is triggering Errors, but it also prevents the bot runner from trying to fully recover. In the current synchronized block we are also attempting to schedule pending tasks. It's probably not a good idea to try to do more work than absolutely necessary in a thread that has just thrown an Error.

            erikj Erik Joelsson
            erikj Erik Joelsson
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

              Created:
              Updated:
              Resolved: