Summary
Define an efficient and reliable API to collect stack traces asynchronously and include information on both Java and native stack frames.
Goals
Provide a well-tested API for profilers to obtain information on Java and native frames.
Support both asynchronous usage (e.g., calling from signal handlers) and synchronous usage.
Do not affect performance when the API is not in use.
Do not significantly increase memory requirements compared to the existing AsyncGetCallTrace API.
Motivation
The AsyncGetCallTrace API is used by almost all available profilers, both open-source and commercial, including, e.g., async-profiler. Yet it has three major disadvantages:
- It is an internal API, not exported in any header,
- It only returns information about Java frames, namely their method and bytecode indices, and
- It cannot be used to collect stack traces in a separate thread, outside a signal handler, to implement JFR-like sampling.
These issues make implementing profilers and related tooling more difficult. Some additional information can be extracted from the HotSpot VM via complex code, but other useful information is hidden and impossible to obtain:
- Whether a compiled Java frame is inlined (currently only obtainable for the topmost compiled frames),
- The compilation level of a Java frame (i.e., compiled by C1 or C2), and
- Information on C/C++ frames that are not at the top of the stack.
Such data can be helpful when profiling and tuning a VM for a given application, and for profiling code that uses JNI heavily.
Description
We propose a new AsyncGetStackTrace API, modeled on the AsyncGetCallTrace API:
void AsyncGetStackTrace(ASGST_CallTrace *trace, jint depth, void* ucontext, uint32_t options);
This API can be called by profilers to obtain the stack trace of a thread, but it does not guarantee that all frames are obtained and works on a best-effort basis. Its implementation will be at least as stable as AsyncGetCallTrace or the JFR stack walking code, due to fuzzing and stability tests in the JDK and extensive safety checks in the implementation itself. The VM fills in information about the frames, the number of frames, and the trace kind. The API can be used safely from a separate thread, which is the recommended usage, but can also be used in a signal handler. When the API is called from a signal handler, the jmethodIDs it uses have to be pre-allocated outside the signal handler using JVMTI. To walk the stack of the calling thread itself, you have to explicitly pass the ASGST_WALK_SAME_THREAD option; this assumes that the passed ucontext always comes from that same thread. The caller of the API should allocate the CallTrace array with sufficient memory for the requested stack depth. Walked threads are required to be halted during stack walking.
Parameters:
- trace: buffer for structured data to be filled in by the VM
- depth: maximum depth of the call stack trace
- ucontext: ucontext_t of the thread where the stack walking should start
- options: bit set for options
Currently, only the lowest four bits of options are considered; all other bits are considered to be 0:
enum ASGST_Options {
ASGST_INCLUDE_C_FRAMES = 1, // include C/C++ (this includes Stub frames)
ASGST_INCLUDE_NON_JAVA_THREADS = 2, // walk the stacks of C/C++, GC and deopt threads too
ASGST_WALK_DURING_UNSAFE_STATES = 4, // walk the stack during potentially unsafe thread states (like safepoints)
// walk the stack for the same thread (e.g. in a signal handler),
// disables protections that are only enabled in separate thread mode
ASGST_WALK_SAME_THREAD = 8
};
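As an illustration of how these flags combine, the following sketch builds option sets for the two usage modes described above. The enum is copied from this proposal. The three helper functions are hypothetical names introduced only for this example and are not part of the API. Masking with 0xF merely mirrors the statement that only the lowest four bits are considered.

```c
#include <stdint.h>

/* Option bits as declared in this proposal. */
enum ASGST_Options {
  ASGST_INCLUDE_C_FRAMES          = 1,
  ASGST_INCLUDE_NON_JAVA_THREADS  = 2,
  ASGST_WALK_DURING_UNSAFE_STATES = 4,
  ASGST_WALK_SAME_THREAD          = 8
};

/* Hypothetical helper: options for a sampler thread that walks another,
   halted thread and also wants C/C++ frames and non-Java threads. */
static uint32_t asgst_separate_thread_options(void) {
  return ASGST_INCLUDE_C_FRAMES | ASGST_INCLUDE_NON_JAVA_THREADS;
}

/* Hypothetical helper: options for walking the current thread,
   e.g. from inside a signal handler. */
static uint32_t asgst_signal_handler_options(void) {
  return ASGST_INCLUDE_C_FRAMES | ASGST_WALK_SAME_THREAD;
}

/* Hypothetical helper: keep only the four bits that are currently considered. */
static uint32_t asgst_considered_bits(uint32_t options) {
  return options & 0xFu;
}
```

A profiler would pass one of these values as the options argument of AsyncGetStackTrace; which combination is appropriate depends on the sampling design.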
There are different kinds of traces depending on the purpose of the currently running code in the walked thread:
enum ASGST_TRACE_KIND {
ASGST_JAVA_TRACE = 1,
ASGST_CPP_TRACE = 2,
ASGST_GC_TRACE = 4,
ASGST_DEOPT_TRACE = 8,
ASGST_NEW_THREAD_TRACE = 16,
ASGST_UNKNOWN_TRACE = 32,
};
The trace struct
typedef struct {
JNIEnv *env_id; // Env where trace was recorded
jint num_frames; // number of frames in this trace,
// (< 0 indicates the frame is not walkable).
uint8_t kind; // kind of the trace; if initialized to a non-zero value, it is a bit mask of accepted kinds
jint state; // thread state (jvmti->GetThreadState); if initialized to a non-zero value,
// it is a bit mask of accepted states; non-Java kind traces are always accepted
// and get state -1
ASGST_CallFrame *frames; // frames that make up this trace. Callee followed by callers.
void* frame_info; // more information on frames
} ASGST_CallTrace;
is filled in by the VM. Its num_frames field contains the actual number of frames in the frames array or an error code. The frame_info field in that structure can later be used to store more information, but is currently nullptr.
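The calling convention can be sketched as follows, under several assumptions. The jni.h types are replaced by stand-in typedefs so that the example compiles on its own. The entry point is passed in as a function pointer, since the proposed function only exists in a VM that implements this proposal. `ASGST_EntryPoint`, `sample_stack`, and `fake_asgst` are hypothetical names, and `fake_asgst` only imitates the VM by reporting a single interpreted Java frame.

```c
#include <stdint.h>
#include <string.h>

/* Stand-ins for jni.h types so that the sketch compiles on its own (assumption). */
typedef int32_t jint;
typedef struct JNIEnv_ JNIEnv;
typedef struct jmethodID_ *jmethodID;

/* Declarations as given in this proposal. */
typedef struct {
  uint8_t type;
  int8_t comp_level;
  uint16_t bci;
  jmethodID method_id;
} ASGST_JavaFrame;

typedef struct {
  uint8_t type;
  void *pc;
} ASGST_NonJavaFrame;

typedef union {
  uint8_t type;
  ASGST_JavaFrame java_frame;
  ASGST_NonJavaFrame non_java_frame;
} ASGST_CallFrame;

typedef struct {
  JNIEnv *env_id;
  jint num_frames;
  uint8_t kind;
  jint state;
  ASGST_CallFrame *frames;
  void *frame_info;
} ASGST_CallTrace;

/* Hypothetical function-pointer type matching the proposed signature. */
typedef void (*ASGST_EntryPoint)(ASGST_CallTrace *, jint, void *, uint32_t);

/* Hypothetical wrapper: set up a trace over a caller-allocated frame buffer,
   invoke the entry point, and return num_frames as filled in by the callee. */
static jint sample_stack(ASGST_EntryPoint asgst, void *ucontext, uint32_t options,
                         ASGST_CallFrame *frames, jint depth, ASGST_CallTrace *trace) {
  memset(trace, 0, sizeof *trace); /* kind == 0 and state == 0: no filtering */
  trace->frames = frames;
  asgst(trace, depth, ucontext, options);
  return trace->num_frames;
}

/* Fake entry point standing in for the VM (for illustration only). */
static void fake_asgst(ASGST_CallTrace *trace, jint depth, void *ucontext, uint32_t options) {
  (void)ucontext;
  (void)options;
  if (depth < 1) {
    trace->num_frames = 0; /* ASGST_NO_JAVA_FRAME */
    return;
  }
  trace->frames[0].java_frame.type = 1;       /* ASGST_FRAME_JAVA */
  trace->frames[0].java_frame.comp_level = 0; /* interpreted */
  trace->frames[0].java_frame.bci = 0;
  trace->frames[0].java_frame.method_id = 0;
  trace->kind = 1; /* ASGST_JAVA_TRACE */
  trace->num_frames = 1;
}
```

In a real profiler the frame buffer would typically be allocated once per sampler, sized for the maximum depth that is requested, so that no allocation happens while the walked thread is halted.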
The kind and state fields serve a dual purpose: if non-zero on input, they are bit masks of the allowed kinds and states (the latter as in JVMTI GetThreadState), which allows profilers to constrain the kinds of obtained traces and the states of walked threads. If the walking is aborted because of a mismatching kind or state, the error code ASGST_WRONG_KIND or ASGST_WRONG_STATE is set, respectively. The kind field only contains valid information if no error except ASGST_WRONG_KIND occurred. The state field only contains valid information if no error except ASGST_WRONG_STATE occurred.
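The filter semantics of the kind field can be modeled in a few lines. This is only a sketch of the behavior described above, not the VM's implementation: the enum is copied from this proposal, and `kind_is_accepted` is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Trace kinds as declared in this proposal. */
enum ASGST_TRACE_KIND {
  ASGST_JAVA_TRACE       = 1,
  ASGST_CPP_TRACE        = 2,
  ASGST_GC_TRACE         = 4,
  ASGST_DEOPT_TRACE      = 8,
  ASGST_NEW_THREAD_TRACE = 16,
  ASGST_UNKNOWN_TRACE    = 32,
};

/* Hypothetical model of the input mask: a zero mask accepts every kind,
   a non-zero mask accepts only the kinds whose bits are set. */
static bool kind_is_accepted(uint8_t accepted_mask, uint8_t actual_kind) {
  return accepted_mask == 0 || (accepted_mask & actual_kind) != 0;
}
```

For example, a profiler interested only in Java and C/C++ traces would set the kind field to `ASGST_JAVA_TRACE | ASGST_CPP_TRACE` before the call and would expect ASGST_WRONG_KIND for a GC thread.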
The error codes are a superset of the error codes for AsyncGetCallTrace, with the addition of ASGST_THREAD_NOT_JAVA, which is reported when this procedure is called for non-Java threads without using the ASGST_INCLUDE_NON_JAVA_THREADS option:
enum ASGST_Error {
ASGST_NO_JAVA_FRAME = 0,
ASGST_NO_CLASS_LOAD = -1,
ASGST_GC_ACTIVE = -2,
ASGST_UNKNOWN_NOT_JAVA = -3,
ASGST_NOT_WALKABLE_NOT_JAVA = -4,
ASGST_UNKNOWN_JAVA = -5,
ASGST_NOT_WALKABLE_JAVA = -6,
ASGST_UNKNOWN_STATE = -7,
ASGST_THREAD_EXIT = -8,
ASGST_DEOPT = -9,
ASGST_THREAD_NOT_JAVA = -10,
ASGST_NO_THREAD = -11,
ASGST_UNSAFE_STATE = -12,
ASGST_WRONG_STATE = -13,
ASGST_WRONG_KIND = -14,
};
Error codes lower than -30 are vendor-specific.
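Since num_frames carries either a frame count or one of these codes, a caller has to classify the value before touching the frames array. The sketch below shows one way to do that. The enum is copied from this proposal. `ASGST_ResultClass` and `asgst_classify` are hypothetical names, and treating the range from -15 to -30 as "unassigned" is an assumption based on the listed codes.

```c
#include <stdint.h>

/* Error codes as declared in this proposal. */
enum ASGST_Error {
  ASGST_NO_JAVA_FRAME         =   0,
  ASGST_NO_CLASS_LOAD         =  -1,
  ASGST_GC_ACTIVE             =  -2,
  ASGST_UNKNOWN_NOT_JAVA      =  -3,
  ASGST_NOT_WALKABLE_NOT_JAVA =  -4,
  ASGST_UNKNOWN_JAVA          =  -5,
  ASGST_NOT_WALKABLE_JAVA     =  -6,
  ASGST_UNKNOWN_STATE         =  -7,
  ASGST_THREAD_EXIT           =  -8,
  ASGST_DEOPT                 =  -9,
  ASGST_THREAD_NOT_JAVA       = -10,
  ASGST_NO_THREAD             = -11,
  ASGST_UNSAFE_STATE          = -12,
  ASGST_WRONG_STATE           = -13,
  ASGST_WRONG_KIND            = -14,
};

/* Hypothetical classification of a num_frames value. */
typedef enum {
  ASGST_RESULT_FRAMES,         /* > 0: number of valid frames */
  ASGST_RESULT_STANDARD_ERROR, /* 0 .. -14: one of the codes above */
  ASGST_RESULT_UNASSIGNED,     /* -15 .. -30: not listed in the proposal */
  ASGST_RESULT_VENDOR_ERROR    /* < -30: vendor-specific */
} ASGST_ResultClass;

static ASGST_ResultClass asgst_classify(int32_t num_frames) {
  if (num_frames > 0) return ASGST_RESULT_FRAMES;
  if (num_frames >= ASGST_WRONG_KIND) return ASGST_RESULT_STANDARD_ERROR;
  if (num_frames >= -30) return ASGST_RESULT_UNASSIGNED;
  return ASGST_RESULT_VENDOR_ERROR;
}
```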
Every CallFrame is a union, since the information stored for Java and non-Java frames differs:
typedef union {
uint8_t type; // to distinguish between JavaFrame and NonJavaFrame
ASGST_JavaFrame java_frame;
ASGST_NonJavaFrame non_java_frame;
} ASGST_CallFrame;
There are several distinguishable frame types:
enum ASGST_FrameTypeId {
ASGST_FRAME_JAVA = 1, // JIT compiled and interpreted
ASGST_FRAME_JAVA_INLINED = 2, // inlined JIT compiled
ASGST_FRAME_JAVA_NATIVE = 3, // barrier frames between Java and C/C++
ASGST_FRAME_CPP = 4 // C/C++/... frames
};
The first three types describe Java frames (including the native barrier frames), for which we store the following information in a struct of type JavaFrame:
typedef struct {
uint8_t type; // frame type
int8_t comp_level; // compilation level, 0 is interpreted, -1 is undefined, > 0 is JIT compiled
uint16_t bci; // 0 <= bci < 65535, 65535 (= -1) if the bci is >= 65535 or not available (like in native frames)
jmethodID method_id;
} ASGST_JavaFrame; // used for FRAME_JAVA, FRAME_JAVA_INLINED and FRAME_JAVA_NATIVE
The comp_level indicates the compilation level of the method related to the frame, with higher numbers representing higher levels of compilation. It is modeled after the CompLevel enum in HotSpot but is dependent on the compiler infrastructure used. A value of zero indicates no compilation, i.e., bytecode interpretation.
Information on all other frames is stored in NonJavaFrame structs:
typedef struct {
uint8_t type; // frame type
void *pc; // current program counter inside this frame, might be a nullptr for JVM internal frames like stub frames, …
} ASGST_NonJavaFrame; // used for FRAME_CPP
Although the API provides more information, the amount of space required per frame (e.g., 16 bytes on x86) is the same as for the existing AsyncGetCallTrace API.
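To illustrate how a consumer might read the frames, the following sketch dispatches on the type tag shared by both frame structs. The declarations are copied from this proposal, with stand-in typedefs for the jni.h types and with the ASGST_ prefix assumed on all frame type constants. The helper functions are hypothetical. With the usual alignment rules on a 64-bit platform, the union shown here occupies 16 bytes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for jni.h types so that the sketch compiles on its own (assumption). */
typedef int32_t jint;
typedef struct jmethodID_ *jmethodID;

enum ASGST_FrameTypeId {
  ASGST_FRAME_JAVA         = 1,
  ASGST_FRAME_JAVA_INLINED = 2,
  ASGST_FRAME_JAVA_NATIVE  = 3, /* ASGST_ prefix assumed here for consistency */
  ASGST_FRAME_CPP          = 4
};

typedef struct {
  uint8_t type;
  int8_t comp_level;
  uint16_t bci;
  jmethodID method_id;
} ASGST_JavaFrame;

typedef struct {
  uint8_t type;
  void *pc;
} ASGST_NonJavaFrame;

typedef union {
  uint8_t type;
  ASGST_JavaFrame java_frame;
  ASGST_NonJavaFrame non_java_frame;
} ASGST_CallFrame;

/* Hypothetical helper: true if the frame is stored as an ASGST_JavaFrame. */
static bool uses_java_frame_layout(const ASGST_CallFrame *frame) {
  return frame->type >= ASGST_FRAME_JAVA && frame->type <= ASGST_FRAME_JAVA_NATIVE;
}

/* Hypothetical helper: true if the frame carries a usable bytecode index
   (65535 is the "not available" marker). */
static bool has_bci(const ASGST_CallFrame *frame) {
  return uses_java_frame_layout(frame) && frame->java_frame.bci != 65535;
}

/* Hypothetical helper: count the frames stored as ASGST_JavaFrame.
   A negative num_frames (an error code) yields 0. */
static jint count_java_layout_frames(const ASGST_CallFrame *frames, jint num_frames) {
  jint count = 0;
  for (jint i = 0; i < num_frames; i++) {
    if (uses_java_frame_layout(&frames[i])) {
      count++;
    }
  }
  return count;
}
```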
We propose to place the above declarations in a new header file, profile.h, which will be placed in the include directory of the JDK image. The header’s license should include the Classpath Exception so that it is consumable by third-party profiling tools.
The implementation can be found in the jdk-sandbox repository, and a demo combining it with a modified async-profiler can be found here.
Risks and Assumptions
Returning information on C/C++ frames leaks implementation details, but this is also true for the Java frames of AsyncGetCallTrace since they leak details of the implementation of standard library files and include native wrapper frames.
Testing
The implementation contains several stress and fuzzing tests to identify stability problems on all supported platforms, sampling the Renaissance benchmark suite repeatedly with small profiling intervals (<= 0.1 ms). The fuzzing tests check that AsyncGetStackTrace can be called with modified stack and frame pointers without crashing the VM. We also added several tests that cover the basic usage of the API.
Issue Links
- relates to JDK-8170152 WhiteBox testing of pd_get_top_frame_for_profiling (Open)