-
Enhancement
-
Resolution: Fixed
-
P4
-
17, 21, 25
-
b12
We have a production use case where leaving the safepoint introduced a significant latency, mostly due to VMThread getting de-scheduled. There are mitigations for this, see JDK-8350324. This issue is about capturing such an event in safepoint logging more comprehensively.
In those cases we would see something like:
[3.664s][info][gc ] GC(7) Pause Young (Allocation Failure) 277M->3M(989M) 5.145ms
[3.694s][info][safepoint] Safepoint "SerialCollectForAllocation", Time since last: 374870126 ns, Reaching safepoint: 271019 ns, At safepoint: 35859436 ns, Total: 36130455 ns
...which is extremely confusing: the GC itself took ~5ms, yet "At safepoint" mentions 35ms! All that time is actually spent disarming the WaitBarrier, and eating the de/re-scheduling for VMThread at the same time.
I think a "Leaving safepoint" counter would be good to have. It comes with the symmetry advantage with "Reaching safepoint" counter. "Leaving safepoint" measures the time spent in safepoint machinery from the "finishing side". And, it more clearly captures the transitional state where some threads might be still at safepoint, and some have already unparked, like "Reaching safepoint".
So the hypothetical log for the excerpt above would look something like:
[3.664s][info][gc ] GC(7) Pause Young (Allocation Failure) 277M->3M(989M) 5.145ms
[3.694s][info][safepoint] Safepoint "SerialCollectForAllocation", Time since last: 374870126 ns, Reaching safepoint: 271019 ns, At safepoint: 5859436 ns, Leaving safepoint: 30000000 ns, Total: 36130455 ns
In those cases we would see something like:
[3.664s][info][gc ] GC(7) Pause Young (Allocation Failure) 277M->3M(989M) 5.145ms
[3.694s][info][safepoint] Safepoint "SerialCollectForAllocation", Time since last: 374870126 ns, Reaching safepoint: 271019 ns, At safepoint: 35859436 ns, Total: 36130455 ns
...which is extremely confusing: the GC itself took ~5ms, yet "At safepoint" mentions 35ms! All that time is actually spent disarming the WaitBarrier, and eating the de/re-scheduling for VMThread at the same time.
I think a "Leaving safepoint" counter would be good to have. It comes with the symmetry advantage with "Reaching safepoint" counter. "Leaving safepoint" measures the time spent in safepoint machinery from the "finishing side". And, it more clearly captures the transitional state where some threads might be still at safepoint, and some have already unparked, like "Reaching safepoint".
So the hypothetical log for the excerpt above would look something like:
[3.664s][info][gc ] GC(7) Pause Young (Allocation Failure) 277M->3M(989M) 5.145ms
[3.694s][info][safepoint] Safepoint "SerialCollectForAllocation", Time since last: 374870126 ns, Reaching safepoint: 271019 ns, At safepoint: 5859436 ns, Leaving safepoint: 30000000 ns, Total: 36130455 ns
- relates to
-
JDK-8350324 Prioritize VMThread for safepoint operations
-
- Open
-
- links to
-
Commit(master) openjdk/jdk/9ec46968
-
Review(master) openjdk/jdk/23756