Type: Enhancement
Resolution: Duplicate
Priority: P2
Fix Version/s: 25
Affects Version/s: None
Component/s: hotspot
Labels:
- datadog-interest

Subcomponent: jfr

Sampling stacks at safepoints is a very safe way of sampling. All components of the JVM have been designed to walk stacks at safepoints, and by walking only frames that are walkable, we are immune to the trouble caused by guessed methods: frames that quack like methods from a genuine stack trace and walk like methods from a genuine stack trace, but may explode later on due to use-after-free.

The classic problem with safepoint-based sampling is safepoint bias. Samples can only be taken where the thread polls for a safepoint, and that poll may be many bytecodes away from the code where the thread was really spending a statistically significant amount of time.
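As a toy illustration of safepoint bias (a hypothetical model, not JVM code), consider a loop whose body contains one expensive instruction and a safepoint poll only at the back-edge. Attributing each sample to the instruction that was executing when the sample fired shows where the time actually goes; deferring the sample to the next poll site, as safepoint-based sampling does, credits all of it to the back-edge. The instruction names, costs and sampling interval are made up for the example.

```cpp
#include <map>
#include <string>
#include <vector>

// One "instruction" of a toy loop body (hypothetical model, not JVM code).
struct Instr {
  std::string name;
  int cost;       // time units spent executing this instruction
  bool has_poll;  // true if a safepoint poll follows this instruction
};

struct Histograms {
  std::map<std::string, int> by_pc;         // sample credited to the interrupted instruction
  std::map<std::string, int> by_safepoint;  // sample credited to the next poll site
};

// Runs `iterations` passes over `body`, firing a sample every `interval` time units.
Histograms simulate(const std::vector<Instr>& body, int iterations, int interval) {
  Histograms h;
  int clock = 0;
  int next_tick = interval;
  int pending = 0;  // samples that fired but have not yet reached a poll
  for (int it = 0; it < iterations; ++it) {
    for (const Instr& ins : body) {
      clock += ins.cost;
      // Every tick that elapsed during this instruction is a sample that hit it.
      while (clock >= next_tick) {
        h.by_pc[ins.name]++;
        pending++;
        next_tick += interval;
      }
      // Safepoint-based sampling can only record the samples here.
      if (ins.has_poll && pending > 0) {
        h.by_safepoint[ins.name] += pending;
        pending = 0;
      }
    }
  }
  return h;
}
```

With `interval = 1` and a body of `load` (cost 1), `divide` (cost 10), `store` (cost 1) and `backedge` (cost 1, with poll), 100 iterations give `by_pc` counts of 100, 1000, 100 and 100, while `by_safepoint` contains only `backedge`, with 1300 samples.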

I propose a hybrid solution based on both safepoints and signals. We still send a signal to the thread being sampled, but in the signal handler we record only the thread's SP and PC. This request is then enqueued, to be processed on that same thread at its next safepoint poll site. When the thread reaches that poll site, we check whether the recorded PC belongs to an nmethod. If it does, we can recreate, from the safepoint poll site, the exact stack trace that we would normally have reported from the signal handler. For compiled methods, we thus get the accuracy of signal-based sampling combined with the fundamental safety of walking the entire stack trace from a safe, walkable point in the JVM. When the sampled PC does not come from an nmethod, I propose we take the stack trace entirely from the safepoint. As for safepoint bias in the interpreter, it is rather straightforward to simply poll for safepoints in the interpreter's dispatch loop, which eliminates safepoint bias as a problem for interpreted code. The originally proposed patch for thread-local handshakes did exactly that, and it worked absolutely fine.
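A minimal sketch of the proposed control flow, written as a self-contained C++ model rather than HotSpot code: `SampleQueue`, `NMethodRange`, `find_nmethod` and `process_at_safepoint_poll` are hypothetical names, and the "code cache" is just a list of address ranges. The signal-handler side only stores the interrupted PC and SP in a lock-free per-thread ring. The safepoint-poll side drains the ring and, for each request, either rebuilds the trace from the recorded PC (compiled code) or falls back to walking from the safepoint. To stay short, the toy reports only the top frame instead of a full stack trace.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model of the proposal; none of these names are HotSpot APIs.
struct NMethodRange {
  std::uintptr_t begin, end;  // [begin, end) of one compiled method
  std::string name;
};

// All the signal handler captures: the interrupted PC and SP.
struct SampleRequest {
  std::uintptr_t pc;
  std::uintptr_t sp;
};

// Per-thread ring. Producer: the signal handler; consumer: the safepoint poll
// on the same thread. Only lock-free atomics and a slot copy are involved.
class SampleQueue {
 public:
  static constexpr std::size_t kCapacity = 8;

  // Producer side; drops the sample if the ring is full.
  bool enqueue(std::uintptr_t pc, std::uintptr_t sp) {
    std::size_t tail = tail_.load(std::memory_order_relaxed);
    std::size_t head = head_.load(std::memory_order_acquire);
    if (tail - head == kCapacity) return false;
    slots_[tail % kCapacity] = SampleRequest{pc, sp};
    tail_.store(tail + 1, std::memory_order_release);
    return true;
  }

  // Consumer side; returns false when the ring is empty.
  bool dequeue(SampleRequest* out) {
    std::size_t head = head_.load(std::memory_order_relaxed);
    std::size_t tail = tail_.load(std::memory_order_acquire);
    if (head == tail) return false;
    *out = slots_[head % kCapacity];
    head_.store(head + 1, std::memory_order_release);
    return true;
  }

 private:
  std::array<SampleRequest, kCapacity> slots_{};
  std::atomic<std::size_t> head_{0};
  std::atomic<std::size_t> tail_{0};
};

// Stand-in for "is this PC inside an nmethod?" (a real VM would ask its code cache).
const NMethodRange* find_nmethod(const std::vector<NMethodRange>& cache, std::uintptr_t pc) {
  for (const NMethodRange& nm : cache)
    if (pc >= nm.begin && pc < nm.end) return &nm;
  return nullptr;
}

enum class Origin { FromRecordedPc, FromSafepoint };

struct Trace {
  Origin origin;
  std::string top_frame;
};

// Runs at a safepoint poll on the sampled thread, where the stack is walkable.
std::vector<Trace> process_at_safepoint_poll(SampleQueue& q,
                                             const std::vector<NMethodRange>& cache,
                                             const std::string& poll_site_frame) {
  std::vector<Trace> traces;
  SampleRequest req{};
  while (q.dequeue(&req)) {
    if (const NMethodRange* nm = find_nmethod(cache, req.pc)) {
      // Compiled code: a real walker would start from req.pc / req.sp.
      traces.push_back({Origin::FromRecordedPc, nm->name});
    } else {
      // Interpreter, stub, native, ...: take the trace from the safepoint.
      traces.push_back({Origin::FromSafepoint, poll_site_frame});
    }
  }
  return traces;
}
```

The design choice the sketch mirrors is that everything that inspects code or frames (the code-cache lookup and the walk) is deferred to the poll site, where the stack is walkable. The handler itself only copies two words into a slot and bumps an index. A full ring simply drops the sample.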

I have a prototype for the suggested changes available here: https://github.com/fisk/jdk/tree/jfr_safe_trace_v1

Attachments:

Cooperative_JFR_Sampling_draft_06.pdf
2024-10-25 02:36
457 kB
Markus Grönlund

Issue Links:

blocks

JDK-8302350 JfrThreadSampler failed with "assert((is_native() && bci == 0) || (!is_native() && 0 <= bci && bci < code_size())) failed: illegal bci: 0 for non-native method"

Closed

duplicates

JDK-8352251 Implement JEP 518: JFR Cooperative Sampling

Resolved

relates to

JDK-8168445 make pd_get_top_frame_for_profiling more robust

Open

JDK-8170152 WhiteBox testing of pd_get_top_frame_for_profiling

Open

JDK-8350338 JEP 518: JFR Cooperative Sampling

Closed

JDK-8326236 assert(ce != nullptr) failed in Continuation::continuation_bottom_sender

Resolved


Assignee: Markus Grönlund

Reporter: Erik Österlund

Votes: 1

Watchers: 14

Created: 2023-11-30 01:51

Updated: 2025-05-30 05:37

Resolved: 2025-04-24 02:54
