JDK / JDK-8271232

JFR: Scrub recording data


    • Type: Enhancement
    • Resolution: Fixed
    • Priority: P3
    • 19
    • None
    • hotspot
    • None
    • jfr
    • b12

      A frequent request over the years has been to remove events from a recording file, usually for security reasons. For example, the jdk.InitialEnvironmentVariable event may contain passwords or other sensitive information. Another use case is removing events to reduce the size of the recording, for example to lower memory consumption when using JMC, to reduce the amount of data sent over the network, or to use less disk space when archiving the file.

      It's quite easy to remove the "shallow" data that makes up an event. Each event in the file starts with its size, so by skipping those bytes and adjusting the pointers to checkpoint and metadata events, a new, smaller file can be produced. What makes the implementation tricky is removing data that lives in the constant pools, such as the methods in stack traces. A method may be referenced from multiple events, or even from other constant pools, and it can only be removed safely if nothing references it at all. Since security is an important use case, we can't cheat and leave sensitive data in the constant pools, even though it would never show up when using the API.
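The "shallow" part of the problem can be illustrated with a toy, size-prefixed layout. This is a minimal sketch that assumes a hypothetical record format ([int size][int typeId][payload]) rather than the real JFR chunk format; the names ShallowScrub, encode, concat, and scrub are illustrative only. It shows why an excluded event can be dropped using only its size prefix, without parsing its payload, and it deliberately leaves out the hard parts: adjusting pointers to checkpoint and metadata events, and removing constant pool data that only the dropped events referenced.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.Set;

public final class ShallowScrub {
    private ShallowScrub() {}

    // Hypothetical toy layout, NOT the real JFR chunk format:
    // [int size][int typeId][payload], where size counts every byte of the
    // record, including the 8-byte header.
    static byte[] encode(int typeId, byte[] payload) {
        return ByteBuffer.allocate(8 + payload.length)
                .putInt(8 + payload.length)
                .putInt(typeId)
                .put(payload)
                .array();
    }

    // Joins encoded records into one byte array.
    static byte[] concat(byte[]... parts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] p : parts) {
            out.write(p, 0, p.length);
        }
        return out.toByteArray();
    }

    // Copies records whose type is not excluded. Excluded records are skipped
    // using only their size prefix; their payloads are never parsed.
    static byte[] scrub(byte[] data, Set<Integer> excludedTypes) {
        ByteBuffer in = ByteBuffer.wrap(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int pos = 0;
        while (pos < data.length) {
            if (data.length - pos < 8) {
                throw new IllegalArgumentException("truncated header at offset " + pos);
            }
            int size = in.getInt(pos);
            if (size < 8 || size > data.length - pos) {
                throw new IllegalArgumentException("corrupt record at offset " + pos);
            }
            int typeId = in.getInt(pos + 4);
            if (!excludedTypes.contains(typeId)) {
                out.write(data, pos, size);
            }
            pos += size; // jump straight to the next record
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] input = concat(
                encode(1, new byte[] {1, 2, 3}),
                encode(2, new byte[] {9, 9}), // type 2 stands in for a sensitive event
                encode(1, new byte[] {4}));
        byte[] output = scrub(input, Set.of(2));
        System.out.println(input.length + " -> " + output.length + " bytes"); // prints "30 -> 20 bytes"
    }
}
```

In a real recording, a skipped event may still be the only user of, say, a stack trace in a constant pool, which is exactly why this byte-level skipping is not enough on its own when security is the goal.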

      There is a need to scrub a recording both from the command line, as a quick post-processing step a user can run before sending the recording to other parties, and programmatically, for more advanced filtering where the contents of each event can be inspected.

      How powerful should such an API be?

      One could imagine an API that can inspect any event value in the recording and allow users to replace it with their own. It would be complicated to implement (and probably to maintain) since data would need to be recompressed back into the constant pools.

      A simpler approach would be an API that allows users to supply a lambda predicate that decides whether a RecordedEvent should be included. It would only allow events to be removed, not individual data within an event. Such an API could be small, probably only one method. The implementation would need a marking mechanism: once a chunk has been processed, events and constant pool data that were never touched are removed, somewhat similar to a tracing GC.
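A minimal sketch of the predicate-plus-marking idea for a single chunk, in the spirit of a tracing GC. Everything here is hypothetical: the Event, PoolEntry, and Chunk records, string ids such as "stack:1", and the scrub method are illustrative and are neither the JFR implementation nor its API. The predicate decides which events survive; their constant pool references act as roots, references between pools (a stack trace pointing at its methods) are followed transitively, and any entry that is never reached is swept. Modeling the environment variable value as a pool entry is also an assumption made for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

public final class MarkScrub {
    private MarkScrub() {}

    // Hypothetical, simplified model of one chunk (not JFR's real data structures).
    // Events and pool entries refer to pool entries by id, e.g. "stack:1" or "method:a".
    record Event(String type, List<String> refs) {}
    record PoolEntry(String value, List<String> refs) {}
    record Chunk(List<Event> events, Map<String, PoolEntry> pool) {}

    static Chunk scrub(Chunk chunk, Predicate<Event> include) {
        List<Event> kept = new ArrayList<>();
        Deque<String> work = new ArrayDeque<>();
        for (Event e : chunk.events()) {
            if (include.test(e)) {
                kept.add(e);
                work.addAll(e.refs()); // roots: references from surviving events
            }
        }
        // Mark: follow references transitively, including pool-to-pool references.
        Set<String> marked = new HashSet<>();
        while (!work.isEmpty()) {
            String id = work.pop();
            if (marked.add(id)) {
                PoolEntry entry = chunk.pool().get(id);
                if (entry != null) {
                    work.addAll(entry.refs());
                }
            }
        }
        // Sweep: keep only entries reachable from some surviving event.
        Map<String, PoolEntry> pool = new LinkedHashMap<>();
        chunk.pool().forEach((id, entry) -> {
            if (marked.contains(id)) {
                pool.put(id, entry);
            }
        });
        return new Chunk(kept, pool);
    }

    static Chunk sample() {
        Map<String, PoolEntry> pool = new LinkedHashMap<>();
        pool.put("env:1", new PoolEntry("PASSWORD=secret", List.of()));
        pool.put("stack:1", new PoolEntry("stack trace", List.of("method:a", "method:b")));
        pool.put("stack:2", new PoolEntry("stack trace", List.of("method:b")));
        pool.put("method:a", new PoolEntry("a()", List.of()));
        pool.put("method:b", new PoolEntry("b()", List.of())); // shared by both stack traces
        List<Event> events = List.of(
                new Event("jdk.InitialEnvironmentVariable", List.of("env:1")),
                new Event("jdk.ExecutionSample", List.of("stack:1")),
                new Event("jdk.ExecutionSample", List.of("stack:2")));
        return new Chunk(events, pool);
    }

    public static void main(String[] args) {
        Chunk scrubbed = scrub(sample(), e -> !e.type().equals("jdk.InitialEnvironmentVariable"));
        System.out.println(scrubbed.pool().keySet()); // prints [stack:1, stack:2, method:a, method:b]
    }
}
```

The shared entry "method:b" survives as long as either stack trace does, while "env:1" disappears together with the only event that referenced it. In this sketch the marked set belongs to one chunk, so it can be discarded once that chunk has been written.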

      A dedicated parser would likely need to be implemented for this. Components of the current parser, such as metadata parsing, can be reused, but new data structures are needed to keep track of references to constant pools. Today, that information is discarded once the data in the constant pools has been resolved.

      Filtering could be of interest for remote streaming, to reduce the amount of data transferred back to the client, but that use case may not be important enough to justify adding an API, at least not now.

            Assignee: Erik Gahlin (egahlin)
            Reporter: Erik Gahlin (egahlin)
            Votes: 0
            Watchers: 3
