JDK-8288298

Resolve multiline message parsing ambiguities in UL


    • Enhancement
    • Resolution: Fixed
    • P4
    • 24
    • 20
    • hotspot
    • b25
    • generic
    • generic

      Tools such as JITWatch parse OpenJDK logs (e.g. logs generated from LogCompilation) to present interesting data to users. For these tools to work reliably, Unified Logging (UL) needs a consistent output scheme.

      Currently, a UL log message can be formatted in such a way that it looks like a UL decorator prefix, which causes parsing issues. This is because the logging functions in UL do not prepend decorators (or any kind of prefix) to the lines that follow a newline within a log message. For example, log_info(gc)("A\nB"); currently outputs

      [0s][gc] A
      B

      and B could be mistakenly interpreted as a decorator. Additionally, developers may introduce pseudo-decorators (text that looks like a decorator but is actually part of the log message), yielding an incorrect parse. Human readability also suffers when continuation lines break the alignment of the log.
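The ambiguity described above can be reproduced with a small model. This is a minimal sketch, not HotSpot code: `emit_current` and `naive_parse` are hypothetical names, and the "any leading run of bracketed groups is a decorator prefix" rule is an assumed stand-in for what a log-reading tool might do today.

```python
import re

# Assumed naive rule: a leading run of "[...]" groups is the decorator prefix.
DECORATORS = re.compile(r"^((?:\[[^\]]*\])+) ?(.*)$")

def emit_current(decorators, message):
    """Mimic the current behaviour: only the first line gets decorators."""
    prefix = "".join(f"[{d}]" for d in decorators)
    first, *rest = message.split("\n")
    return [f"{prefix} {first}"] + rest

def naive_parse(line):
    """Return (decorators, message) using the naive rule above."""
    m = DECORATORS.match(line)
    if m is None:
        return [], line
    return re.findall(r"\[([^\]]*)\]", m.group(1)), m.group(2)

# One log event whose second line happens to look like a decorated line.
lines = emit_current(["0s", "gc"], "A\n[1s][gc] B")
for line in lines:
    print(line, "->", naive_parse(line))
```

The second output line parses as a separate event with decorators `1s` and `gc`, even though it is part of the message of the first event. Nothing in the output lets a parser recover the original grouping.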

      The UL framework should offer a robust way to (a) distinguish decorators from messages, and (b) unambiguously group multiline output. This ticket aims to achieve both goals through a subset of the changes proposed in this mail; the remaining ideas are left as future work. This means:

      - Decorators cannot contain the symbols "[" or "]".

      - The special decorator "[ ]" (with a variable amount of whitespace between the brackets) is reserved.

      - We separate decorators from log messages via the first space after a closing bracket in the line.

      - A log message can contain any kind of symbols, and ends with a newline (except in the case of multiline messages).

      - We prefix each continuation line of a multiline message (such as B in the example above) with the invalid decorator "[ ]". The invalid decorator is as wide as the decorator prefix of the first line, so the message text stays aligned for easy visual reading. For example:

      [0s][gc] single-line message
      [1s][gc] another single line message
      [2s][gc] first line of a multiline message
      [      ] second and last line of a multiline message
      [3s][gc] another single line message

      Note how this is both unambiguously parseable and human readable.
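The rules above can be sketched as a formatter and a matching parser. This is an illustration under stated assumptions, not the UL implementation or any tool's actual parser: `format_event`, `split_line` and `parse` are hypothetical names, and the sketch assumes every decorated line has a space after its decorator prefix.

```python
import re

# One bracketed group with no "[" or "]" inside, per the rules above.
GROUP = re.compile(r"\[([^\[\]]*)\]")

def format_event(decorators, message):
    """Emit one log event; continuation lines get the reserved "[ ]" prefix."""
    if not decorators:  # decorators disabled: raw output, no grouping
        return message.split("\n")
    prefix = "".join(f"[{d}]" for d in decorators)
    filler = "[" + " " * (len(prefix) - 2) + "]"  # same width as the prefix
    first, *rest = message.split("\n")
    return [f"{prefix} {first}"] + [f"{filler} {line}" for line in rest]

def split_line(line):
    """Split at the first space that follows the leading bracketed groups."""
    pos, decorators = 0, []
    while True:
        m = GROUP.match(line, pos)
        if m is None:
            break
        decorators.append(m.group(1))
        pos = m.end()
    if not decorators or line[pos:pos + 1] != " ":
        raise ValueError(f"not a decorated line: {line!r}")
    return decorators, line[pos + 1:]

def parse(lines):
    """Group lines into events: a list of (decorators, [message lines])."""
    events = []
    for line in lines:
        decorators, message = split_line(line)
        if len(decorators) == 1 and decorators[0].isspace():
            if not events:
                raise ValueError("continuation line without a preceding event")
            events[-1][1].append(message)  # reserved "[ ]": same event
        else:
            events.append((decorators, [message]))
    return events

log = (format_event(["0s", "gc"], "single-line message")
       + format_event(["2s", "gc"], "first line\n[3s][gc] looks like a decorator"))
for line in log:
    print(line)
print(parse(log))
```

Because the parser stops collecting decorators at the first space after a closing bracket, the pseudo-decorator `[3s][gc]` on the continuation line stays part of the message, and the whitespace-only decorator ties that line back to the event that started at `[2s][gc]`.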

      When decorators have been disabled, the aforementioned points do not apply (i.e., behaviour remains the same as before). This means no multiline logical connection (such as the one presented above) and no way to separate decorators (the empty set, in this case) from messages. This is intentional, as users specifying no decorators expect "raw" output. Additionally, we assume that the end user is in control of the log command, so disabling decorators means accepting that there is no multiline grouping and no unambiguous parsing.

      If the option -Xlog:foldmultilines (already present in UL) is specified, we do not carry out any multiline grouping.
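Folding already keeps each event on a single line, which is why no grouping prefix is needed in that mode. The sketch below assumes the folding behaviour described in the java launcher documentation (newlines replaced by the two characters `\` and `n`, existing backslashes doubled so the conversion can be reversed); `fold`, `unfold` and `format_folded` are hypothetical names, not UL functions.

```python
def fold(message):
    """Fold a multiline message onto one line (backslashes first, then newlines)."""
    return message.replace("\\", "\\\\").replace("\n", "\\n")

def unfold(folded):
    """Reverse fold() by reading escape pairs from left to right."""
    out, i = [], 0
    while i < len(folded):
        if folded[i] == "\\" and i + 1 < len(folded):
            out.append("\n" if folded[i + 1] == "n" else folded[i + 1])
            i += 2
        else:
            out.append(folded[i])
            i += 1
    return "".join(out)

def format_folded(decorators, message):
    """With folding, an event is always exactly one decorated line."""
    prefix = "".join(f"[{d}]" for d in decorators)
    return [f"{prefix} {fold(message)}"]

print(format_folded(["0s", "gc"], "A\nB"))
```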

            aseoane Anton Seoane Ampudia
            jsjolen Johan Sjölen