  1. JDK
  2. JDK-8328351

jcmd to enable post-mortem analysis


    • JEP
    • Resolution: Unresolved
    • P4
    • None
    • core-svc
    • None
    • Kevin Walls
    • Feature
    • Open
    • Implementation
    • serviceability dash dev at openjdk dot org
    • M
    • M

      Summary

      Provide a mechanism to run the existing HotSpot serviceability tools offered by jcmd on a core file or minidump. The familiar jcmd interface will operate on a revived memory image of the crashed JVM. This reuses existing diagnostic code and keeps ongoing maintenance low.

      Goals

      Enable existing native HotSpot JVM diagnostics to be used after a crash: enable jcmd on a crash dump (e.g. core file or minidump).

      Avoid diagnostic code needing to duplicate JVM internals, to avoid maintenance costs.

      This change is intended to be a foundation on which further enhancements can build.

      Non-Goals

      Do not attempt to replace all existing alternatives, such as the Serviceability Agent (SA), in one integration. The technique can be a platform for further tools over time.

      Future work may include an interactive mode which permits multiple diagnostic commands to run in one session.

      Running Java code in the revived JVM is not a goal. Some jcmd commands are implemented in Java, and these will not be usable with this technique. A jcmd command which is useful after a crash, and implemented in Java, may need to be the subject of further work.

      A new Java or native API for tool building is not proposed here. jcmd is the interface.

      Motivation

      The support and maintenance of the JVM requires tools to investigate problems, on live processes and after a crash.

      The JDK/bin/jcmd tool provides live JVM inspection, with a variety of commands whose names indicate a namespace of JVM subsystems (e.g. VM.info, Compiler.codecache, GC.heap_info, Thread.print).

      Investigating JVM crashes (post-mortem analysis) requires different tools. Native debuggers expose the raw details, but have no insight into the Java context, such as the Java heap and Java code. This JVM-specific data can be decoded, but scripting this in a native debugger is laborious.

      The Serviceability Agent (SA) provides Java-level insight into the JVM. The SA attaches to a live process or opens a crash dump. It decodes JVM information by having built-in knowledge: Java classes model the JVM components, and the JVM explicitly chooses to expose certain structures.

      Tools such as debugger scripts or the SA require continual maintenance as the JVM evolves, and major work to support new features. This difficult maintenance work is the cause of friction and lag. Scripts can be "brittle" and fail after even subtle JVM changes. In the SA, JVM changes need updates in the native code which exposes data, or in the Java code that models the JVM. These changes do not always happen as features change, leaving a gap in support which may be fixed later. Some JVM changes, such as a new GC, require very significant updates to the SA. In the case of ZGC, such updates may not yet be implemented, and may never be fully implemented. Lacking support for a GC impacts the many operations that navigate the Java heap. Some consequences may be wider than expected: for example, thread dumping may fail when thread names cannot be resolved in the Java heap.

      The result can be that the information available using the SA depends on which VM features are in use, meaning not all failures can be investigated equally.

      Additionally, the separation of tools between live and post-mortem use is a complication for users. It is appealing to enable execution of the same tools for post-mortem analysis (using a core file or minidump), as used live by jcmd. This removes the duplication of development effort caused by having separate runtime and debug-time code, and moves towards symmetrical experiences for live and post-mortem JVM analysis.

      This can be achieved by recreating the process memory image, using data from the core file and code from the JVM binary, so that the diagnostic commands which jcmd invokes can be run in it.

      Note that some SA features are missing from the current jcmd feature set, and these omissions may be covered in separate enhancements.

      Description

      Given a crash dump (core or minidump) from a JVM process, we will enable jcmd to run diagnostics on that crash dump. The same jcmd launcher in the JDK bin directory can be used for crash dumps, with a few additional command-line options.

      The jcmd usage description becomes:

      jcmd [-c] [-L DIRECTORY ] [pid | main-class | crash-dump-name ] command... | PerfCounter.print | -f filename

      e.g. jcmd core.1234 Thread.print

      The new -c option is to avoid confusion with the existing main-class argument, and force the reading of a core file should the core file name match a live Java program name.

      The -L option will be relevant when dealing with transported core files, to specify a location to find a copy of the original JVM.
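
      The way the single target argument might be interpreted can be sketched as follows. This is only an illustration of the rules described above, not the jcmd implementation: the function name, the order of the checks, and the substring match against live main-class names are assumptions of this sketch.

```python
import os
from enum import Enum


class Target(Enum):
    PID = "pid"
    MAIN_CLASS = "main-class"
    CRASH_DUMP = "crash-dump"


def classify_target(arg, force_core=False, live_main_classes=(),
                    file_exists=os.path.isfile):
    """Guess what the jcmd target argument names (hypothetical rules)."""
    # -c: always treat the argument as a crash dump, even if a live
    # Java program happens to have the same name.
    if force_core:
        return Target.CRASH_DUMP
    # A purely numeric argument is taken to be a process id.
    if arg.isdigit():
        return Target.PID
    # Without -c, a matching live main class takes precedence (assumed
    # here to be a substring match).
    if any(arg in name for name in live_main_classes):
        return Target.MAIN_CLASS
    # Otherwise an existing file is assumed to be a core file or minidump.
    if file_exists(arg):
        return Target.CRASH_DUMP
    return Target.MAIN_CLASS
```

      With these rules, a file named MyApp is read as a crash dump only when -c is given, if a live process with main class com.example.MyApp is also running.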

      The jcmd launcher will invoke a native helper program, into which the memory of the crashed process is revived, and in which the diagnostic command is executed. A separate subprocess is needed so that the revived image has its own address space; mapping it into the launcher's process could conflict with the launcher's own JVM.
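
      The kind of conflict that a separate process avoids can be illustrated with a small address-range check. This is a sketch under assumptions: it uses the Linux /proc/<pid>/maps text format for existing mappings, and both function names are hypothetical.

```python
def parse_maps_line(line):
    """Parse one line of Linux /proc/<pid>/maps into a (start, end) pair."""
    address_range = line.split()[0]        # e.g. "7f3a00000000-7f3a00200000"
    start, end = address_range.split("-")
    return int(start, 16), int(end, 16)


def find_conflicts(core_segments, existing_mappings):
    """Return (core, existing) pairs of half-open [start, end) ranges that overlap."""
    conflicts = []
    for c_start, c_end in core_segments:
        for m_start, m_end in existing_mappings:
            # Two half-open ranges intersect when each starts before the other ends.
            if c_start < m_end and m_start < c_end:
                conflicts.append(((c_start, c_end), (m_start, m_end)))
    return conflicts
```

      A process that already hosts a JVM has many mappings in use, so a check like this is more likely to report conflicts there than in a small, purpose-built native helper.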

      The new native helper process populates its memory space from the data in the core or minidump. An implementation detail is the platform-dependent analysis required to locate the memory mappings to revive.
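
      As an example of that platform-dependent analysis, a Linux core file is an ELF file whose PT_LOAD program headers describe the memory ranges that were dumped. The sketch below lists them for a 64-bit little-endian core. It is illustrative only: the function name is hypothetical, it does not handle cores with more than 65,534 program headers (the PN_XNUM case), and minidumps use a different format entirely.

```python
import struct

ELF_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")   # 64-bit little-endian ELF header
PROGRAM_HEADER = struct.Struct("<IIQQQQQQ")       # 64-bit program header entry
ET_CORE = 4                                       # e_type value for core files
PT_LOAD = 1                                       # p_type value for loadable segments


def load_segments(data):
    """Return (vaddr, file_offset, file_size, mem_size) for each PT_LOAD entry."""
    fields = ELF_HEADER.unpack_from(data, 0)
    ident, e_type = fields[0], fields[1]
    e_phoff, e_phentsize, e_phnum = fields[5], fields[9], fields[10]
    # ident[4] is EI_CLASS (2 = 64-bit), ident[5] is EI_DATA (1 = little-endian).
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("not a 64-bit little-endian ELF file")
    if e_type != ET_CORE:
        raise ValueError("not a core file")
    segments = []
    for i in range(e_phnum):
        (p_type, _flags, p_offset, p_vaddr,
         _paddr, p_filesz, p_memsz, _align) = PROGRAM_HEADER.unpack_from(
            data, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD:
            segments.append((p_vaddr, p_offset, p_filesz, p_memsz))
    return segments
```

      Each returned tuple says where the bytes live in the core file and at which virtual address they belonged in the crashed process, which is the information needed to map them back in.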

      It loads the JVM (the native JVM library, e.g. libjvm.so) using normal methods (e.g. dlopen on Linux, LoadLibrary on Windows). This must be at the same virtual address as in the crashed process, which may require copying and making changes to the library.

      The restored memory mappings include data local to the JVM and global data, such as the Java heap. The memory representing native thread stacks is restored, so references into them will resolve. There does not need to be any reconstruction of the threads as the native OS libraries knew them, as these threads are not going to execute.

      The JVM is not "live" as it was at runtime, but its code can be called, and will "see" the correct memory image: absolute pointers are satisfied by memory mapped in from the crash dump, as are memory references relative to the running code.

      The JVM data includes some stored references into OS libraries, which are specific to the previous process memory layout. These must be reset, as the libraries will be at different addresses in the new process. A small helper routine built into the JVM can reset this state, and set any JVM flags or state necessary to permit diagnostic code to run. The routine is exposed as a public, global symbol so that it can be resolved from outside the JVM.

      After mapping memory and invoking the JVM helper function, the helper tool which jcmd launched can make a call into the JVM diagnostic command framework, to run the requested jcmd operation. DCmd::parse_and_execute is the JVM entry point required, which is available as a global symbol.
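
      Resolving an exported symbol by name and calling it is the mechanism that makes this possible. The sketch below shows that step on a POSIX system using ctypes, which wraps dlopen and dlsym. The name of the JVM helper routine is not given here, and the exported (mangled) name of DCmd::parse_and_execute depends on its signature, so the usage shown below resolves the C library's getpid purely as a stand-in; call_exported is a hypothetical name.

```python
import ctypes


def call_exported(library, symbol, restype=ctypes.c_int):
    """Resolve `symbol` in `library` and call it with no arguments.

    `library` is a path to a shared library, or None to search the
    symbols already loaded into the current process (POSIX only).
    """
    handle = ctypes.CDLL(library)         # dlopen() on POSIX systems
    function = getattr(handle, symbol)    # dlsym(); AttributeError if missing
    function.restype = restype
    function.argtypes = []
    return function()
```

      For example, call_exported(None, "getpid") returns the current process id. A real helper would instead open the relocated JVM library, call the reset routine, and then pass the requested command string to the diagnostic command entry point.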

      This revival technique must not require loading every native library from the crashed process. This is to enable running diagnostics when the core file is transported to a different machine, where the same libraries are not available. These transported cores are traditionally tricky to set up in the debugger, often requiring native libraries to match the source machine. Here the concerns are reduced to the JVM itself and the data.

      There is no new security impact. Using the new feature requires access to the crash dump from a JVM process, which can contain private information, but this is no different to existing debugging efforts. The additional helper method built into the JVM is of no value to an attacker. It will never be intentionally called by the JVM during normal operation, and would offer no security risk if an attacker forced it to be called.

      Alternatives

      Simply making more effort on maintenance is always a possibility. The SA will work with all JVM features if enough time is spent on it. The SA and other alternative proposals over the years have always had duplication of effort somewhere.

      Native debug information goes some way to providing low-level HotSpot diagnostics, and will remain an essential part of debugging. But extracting an Object from the Java heap in human-readable form still requires manual effort or scripting. As soon as such a script is written, ongoing maintenance is required.

      Risks and Assumptions

      At post-mortem debug time, a copy of the exact JVM library in use at the time of the crash is required. This is worth noting, but is basically the same as all such debugging.

      The basic feature set of diagnostic tools usable in the reanimated process is set at build time. While this could be a limitation, the core set of tools is well established. Additional tools that act on the revived data can still be created for specific requirements.

      Dependencies

      The ability to load the native JVM library at a virtual address matching the core file. This is currently achieved by relocating a copy of the library so that the required address becomes its preferred load address. The relocation can be performed by existing tools, or implemented directly. Runtime loader cooperation is required, to honour the requested address.

      Additional jcmd tooling will be required to offer diagnostic features comparable to the Serviceability Agent. Two examples are inspecting arbitrary Java objects (see JDK-8318026) and extracting a Java Class ("class dumping").
