- Enhancement
- Resolution: Unresolved
- P2
- None
- Fix Understood
The classic problem with safepoint-based sampling is safepoint bias. Samples are taken at the points where we poll for safepoints, which may be many bytecodes away from where the thread was actually spending a statistically significant amount of time.
I propose a hybrid safepoint- and signal-based solution. The idea is that we still send a signal to the sampled thread, but in the signal handler we only record the SP and PC of that thread. The sample request is then enqueued to be processed on that same thread at a subsequent safepoint poll site.

When the thread reaches that poll site, we check whether the recorded PC belongs to an nmethod. If it does, we can recreate, from the safepoint poll site, the exact stack trace that we would normally have reported from the signal handler. For compiled methods we therefore get the accuracy of signal-based sampling, combined with the fundamental safety of walking the entire stack from a safe, walkable point in the JVM.

When the sampled PC does not come from an nmethod, I propose we take the stack trace entirely from the safepoint. As for safepoint bias in the interpreter, it is rather straightforward to poll for safepoints in the interpreter's dispatch loop, which eliminates safepoint bias as a problem for interpreted code. The original proposed patch for thread-local handshakes did exactly that, and it worked absolutely fine.
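The control flow of the hybrid scheme described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the prototype's code: every name here (`SampleRequest`, `ThreadState`, `CodeRange`, `on_sample_signal`, `on_safepoint_poll`) is hypothetical and is not a HotSpot API, the code cache is modelled as a plain list of address ranges, and the per-thread queue is simplified to a single pending slot.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical model types; none of these are HotSpot classes.

// What the signal handler records: only SP and PC, no stack walking.
struct SampleRequest {
    std::uintptr_t sp;
    std::uintptr_t pc;
};

// Stand-in for the address range [begin, end) of one nmethod.
struct CodeRange {
    std::uintptr_t begin;
    std::uintptr_t end;
};

// Per-thread state. Simplification: a single pending slot instead of a queue,
// so a second signal before the next poll overwrites the first request.
struct ThreadState {
    std::optional<SampleRequest> pending;
};

// Where the top frame of the reported trace comes from.
enum class TraceOrigin { SignalPC, PollSite };

struct Trace {
    TraceOrigin origin;
    std::uintptr_t top_pc;
};

// Models the signal handler: record SP and PC and return immediately.
void on_sample_signal(ThreadState& t, std::uintptr_t sp, std::uintptr_t pc) {
    t.pending = SampleRequest{sp, pc};
}

// Models the "is this PC inside an nmethod?" check with a linear scan.
bool pc_in_nmethod(const std::vector<CodeRange>& code_cache, std::uintptr_t pc) {
    for (const CodeRange& r : code_cache) {
        assert(r.begin < r.end);  // ranges are expected to be well-formed
        if (pc >= r.begin && pc < r.end) return true;
    }
    return false;
}

// Models the safepoint poll site: consume the pending request, if any.
// Compiled code: the trace is anchored at the PC recorded by the signal.
// Anything else: the trace is taken entirely from the poll site.
std::optional<Trace> on_safepoint_poll(ThreadState& t,
                                       const std::vector<CodeRange>& code_cache,
                                       std::uintptr_t poll_pc) {
    if (!t.pending) return std::nullopt;  // nothing was requested
    const SampleRequest req = *t.pending;
    t.pending.reset();
    if (pc_in_nmethod(code_cache, req.pc)) {
        return Trace{TraceOrigin::SignalPC, req.pc};
    }
    return Trace{TraceOrigin::PollSite, poll_pc};
}
```

In this model, a request whose PC falls inside a registered range yields a trace anchored at the signal PC, while any other PC falls back to the poll site. The actual stack walk is deliberately left out, since in both cases it would happen at the poll site, where the stack is safely walkable.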
I have a prototype for the suggested changes available here: https://github.com/fisk/jdk/tree/jfr_safe_trace_v1
- blocks
  - JDK-8316239 JFR: fatal error: refcount has gone to zero (Open)
  - JDK-8321822 ZGC: SIGSEGV in "JFR Recorder Thread" (Open)
  - JDK-8343003 assert(lower->pc_offset() < pc_offset) failed: sanity (Open)
  - JDK-8302350 JfrThreadSampler failed with "assert((is_native() && bci == 0) || (!is_native() && 0 <= bci && bci < code_size())) failed: illegal bci: 0 for non-native method" (Open)
- relates to
  - JDK-8168445 make pd_get_top_frame_for_profiling more robust (Open)
  - JDK-8170152 WhiteBox testing of pd_get_top_frame_for_profiling (Open)
  - JDK-8350338 Cooperative JFR Sampling (Submitted)
  - JDK-8352251 Implement Cooperative JFR Sampling (In Progress)
  - JDK-8326236 assert(ce != nullptr) failed in Continuation::continuation_bottom_sender (Resolved)