[lworld] provide targeted diagnostic for auto boxing that leaks to the heap despite optimization


      Recent experiments using Valhalla at scale (converting Panama cursor types to value classes) suggest that performance can be "slippery", or non-robust: small changes in source code can cause the JVM to move value objects from the stack (scalarized form) to the heap (buffered object form), with significant slowdowns as a result.

      A key root cause of this "slippery" performance is that transitions between heap and stack are not expressed directly in the source code, or even in the class file as bytecodes. Before Java 5, users always knew when heap buffering happened, because there was a use of `new Integer` or `Integer.valueOf` or the like; such uses could be readily traced in source code or by an appropriate performance monitoring tool, and adjusted by hand in source code. Even after Java 5 introduced autoboxing, the bytecodes in the class file gave some indication of where heap buffering might occur, and tools could at least flag such sites if they became performance bottlenecks.
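      To illustrate the pre-Valhalla situation described above: even with autoboxing, the buffering site is a real bytecode that tools can see. A minimal example:

```java
// The autoboxing below is explicit in the class file: javac compiles the
// assignment to an invokestatic of Integer.valueOf(int), which `javap -c`
// will show, and which a profiler can hook as a concrete call site.
public class BoxingVisibility {
    public static void main(String[] args) {
        int i = 127;
        Integer boxed = i;   // compiled as: invokestatic Integer.valueOf(int)
        // Small values come from the spec-mandated Integer cache (-128..127),
        // so identity is stable:
        System.out.println(boxed == Integer.valueOf(127));  // prints true
    }
}
```

      Running `javap -c BoxingVisibility` shows the `Integer.valueOf` call site. Under Valhalla, by contrast, there is no analogous bytecode to trace, which is the gap this RFE addresses.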

      With Valhalla the buffering decision is not visible in the bytecodes and varies dynamically, according to detailed optimization decisions made by the JIT. We need a way to make these decisions more visible and tool-friendly.

      We can make some progress with such problems by having the user produce microbenchmarks and hand them to JIT engineers, or teach the user to create and interpret assembly dumps. (JMH is a bonus here.) But this is expensive, and as we deploy Valhalla more widely we will need turn-key options to give to our Valhalla adopters.

      Hence this RFE. Specifically, we need appropriate events and counters for heap buffering decisions.
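      One plausible shape for such an event, sketched here with the existing JDK Flight Recorder API (the event name, category, and fields below are hypothetical illustrations, not an agreed design):

```java
import jdk.jfr.Category;
import jdk.jfr.Event;
import jdk.jfr.EventType;
import jdk.jfr.Label;
import jdk.jfr.Name;

public class BufferingEventSketch {
    public static void main(String[] args) {
        // Fill in a sample event by hand; a future VM would emit these itself.
        ValueObjectBufferingEvent e = new ValueObjectBufferingEvent();
        e.valueClass = "com.example.Cursor";    // hypothetical value class
        e.tier = "C2";
        e.method = "com.example.Loop::hot";     // hypothetical buffering site
        e.bci = 42;
        e.commit();                             // a no-op unless a recording is active
        System.out.println(EventType.getEventType(ValueObjectBufferingEvent.class).getName());
    }
}

// Hypothetical JFR event the VM could emit whenever a value object is
// buffered to the heap instead of staying scalarized. All names illustrative.
@Name("jvm.ValueObjectBuffering")
@Label("Value Object Buffering")
@Category({"Java Virtual Machine", "Valhalla"})
class ValueObjectBufferingEvent extends Event {
    @Label("Value Class")      String valueClass;
    @Label("Compilation Tier") String tier;    // "interpreter", "C1", or "C2"
    @Label("Method")           String method;  // method containing the buffering site
    @Label("Bytecode Index")   int bci;        // nearest bci, for source linkage
}
```

      A recording tool such as JDK Mission Control could then aggregate such events by method and bytecode index, which is the source-level linkage the next paragraph calls for.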

      But what we need is not just an allocation counter or rate meter. It needs to suggest where in the source code the root problems might be. The undesirable buffering events need to be linked back to bytecodes and source code lines, even though they are not directly caused by single bytecodes.

      The events should also be classified (at least optionally) as to where they came from: the interpreter, C1, or C2. We expect the interpreter to buffer routinely, so that is not a concern, but in many cases we will be surprised and concerned if fully optimized code (C2) continues to buffer a value object.

      In addition, somebody who understands how profilers gather information (e.g., the IntelliJ IDEA profiler's "JVM hooks") needs to ensure that the new information about buffering events is presented in a form which the existing tools can consume, once they are adjusted appropriately.

      Besides allowing us to tune individual use cases, these tools will also allow us to assess the overall priority of optimizations which will more aggressively scalarize "obscured" value objects, such as those which are passed under loose types like `Object` and interfaces, as well as under their nullable reference types (when they are supposed to be "bucket three" bare primitives).

            Assignee:
            Unassigned
            Reporter:
            John Rose