  1. Skara
  2. SKARA-1552

Log relevant timer numbers


    • Type: Enhancement
    • Resolution: Fixed
    • Priority: P2
    • 1.0
    • None
    • bots
    • None

      To get a better understanding of bot performance, and to be able to track and verify any improvements, we need to start measuring things and recording them in a way that is easy to track. The metrics gathering we have is nice, but it's missing some crucial features. It currently supports only "Gauge" and "Counter" as metric types. These are fairly limited and are not a good fit for timing measurements. For that you need something like a "Histogram", which tracks statistics for you and exports aggregates to the metrics gatherer. However, implementing that seems quite tedious and error-prone.

      As a simpler solution, I would like to just log relevant timing measurements in a structured way. By structured, I mean putting the measured numbers in a named field in the LogstashHandler (and not just as part of a string message). Doing this will give us easy access to the actual data points in whatever log analytics tool we choose.

      The things I want to measure are both lower-level operations (time for REST calls or for running external commands) and specific user-experience latencies, such as the time from /integrate to actual integration, or from the "rfr" label being posted to mlbridge sending the first email.

      I think I have figured out a reasonable way of implementing this in the current java.util.logging framework. I'm thinking we can (ab)use the Object parameters of a LogRecord (which are normally meant as arguments for the message formatter). We aren't using them for that, so we can instead let the BotLogstashHandler interpret certain types in the parameters array of a LogRecord as something to be added to the log event.
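
      A rough sketch of what this could look like, using only java.util.logging. The real BotLogstashHandler is not shown in this issue, so everything named here is a hypothetical stand-in: FieldCollectingHandler collects events in memory instead of sending them to Logstash, NamedNumber is an invented marker type for an extra numeric field, and the "duration_ms" field name is an assumption.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class StructuredTimingLogDemo {

    // Hypothetical marker type: a named numeric value to attach to the log event.
    public static final class NamedNumber {
        public final String name;
        public final long value;

        public NamedNumber(String name, long value) {
            this.name = name;
            this.value = value;
        }
    }

    // Stand-in for BotLogstashHandler (assumption): builds one map per log
    // event and turns recognized parameter types into named fields.
    public static class FieldCollectingHandler extends Handler {
        public final List<Map<String, Object>> events = new ArrayList<>();

        @Override
        public void publish(LogRecord record) {
            if (!isLoggable(record)) {
                return;
            }
            Map<String, Object> event = new LinkedHashMap<>();
            event.put("level", record.getLevel().getName());
            event.put("message", record.getMessage());

            Object[] params = record.getParameters();
            if (params != null) {
                for (Object param : params) {
                    if (param instanceof Duration) {
                        // Assumed field name for timing measurements.
                        event.put("duration_ms", ((Duration) param).toMillis());
                    } else if (param instanceof NamedNumber) {
                        NamedNumber field = (NamedNumber) param;
                        event.put(field.name, field.value);
                    }
                    // Any other parameter type is ignored in this sketch.
                }
            }
            events.add(event);
        }

        @Override
        public void flush() {
        }

        @Override
        public void close() {
        }
    }

    public static void main(String[] args) {
        Logger log = Logger.getLogger("skara.timing.demo");
        log.setUseParentHandlers(false);
        FieldCollectingHandler handler = new FieldCollectingHandler();
        log.addHandler(handler);

        long start = System.nanoTime();
        // ... the REST call or external command being timed would run here ...
        Duration elapsed = Duration.ofNanos(System.nanoTime() - start);

        // The measurement travels in the parameters array, not in the message string.
        log.log(Level.INFO, "REST call finished",
                new Object[]{elapsed, new NamedNumber("http_status", 200)});

        System.out.println(handler.events.get(0));
    }
}
```

      The call site stays an ordinary Logger.log call, and a handler that does not know about these types simply ignores the parameters, since the message contains no format placeholders.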

            erikj Erik Joelsson