Type: Enhancement
Resolution: Fixed
Priority: P3
Fix Version/s: 22
Affects Version/s: 17, 21, 22
Component/s: hotspot
Labels:

Subcomponent: runtime
Resolved In Build: b26

Issue	Fix Version	Assignee	Priority	Status	Resolution	Resolved In Build
JDK-8330642	21.0.4	Aleksey Shipilev	P3	Resolved	Fixed	b01
JDK-8333244	17.0.13	Aleksey Shipilev	P3	Resolved	Fixed	b01

While running simple benchmarks for safepoints, I was surprised to see impressively bad performance on my Mac M1 with a simple workload like this:

```
public class LotsRunnable {
    // Defaults to 4x the available processors; override with -Dthreads=N.
    static final int THREAD_COUNT = Integer.getInteger("threads", Runtime.getRuntime().availableProcessors() * 4);
    static Object sink;

    public static void main(String... args) throws Exception {
        // Start daemon threads that spin forever, so they are always runnable.
        for (int c = 0; c < THREAD_COUNT; c++) {
            Thread t = new Thread(() -> {
                while (true) {
                    Thread.onSpinWait();
                }
            });
            t.setDaemon(true);
            t.start();
        }

        System.out.println("Started");

        // Allocate for 10 seconds to trigger frequent young GCs, and thus safepoints.
        long stop = System.nanoTime() + 10_000_000_000L;
        while (System.nanoTime() < stop) {
            sink = new byte[100_000];
        }
    }
}
```

If you run with -Xlog:safepoint -Xlog:gc, you will notice that GC pause times and the actual VM operation times are completely out of whack. For example:

```
$ java -Xlog:safepoint -Xlog:gc -Xmx2g LotsRunnable.java
[3.188s][info][gc ] GC(19) Pause Young (Normal) (G1 Evacuation Pause) 308M->2M(514M) 0.878ms
[3.326s][info][safepoint] Safepoint "G1CollectForAllocation", Time since last: 4963375 ns, Reaching safepoint: 349292 ns, Cleanup: 2000 ns, At safepoint: 138700375 ns, Total: 139051667 ns
```
Note how the GC pause is under 1 ms, but the "At safepoint" time is a whole 138 ms (!!!).
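To make the mismatch concrete, here is a small sketch that pulls the phase timings out of the safepoint log line above and compares them with the GC pause. The class name SafepointLogBreakdown and the parse() helper are made up for illustration; the only inputs are the two log lines already shown, and the subtraction is approximate because the GC pause is reported by a different log tag.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the JDK: breaks down one -Xlog:safepoint line.
class SafepointLogBreakdown {
    // The safepoint log line from the run above, copied verbatim.
    static final String LINE = "[3.326s][info][safepoint] Safepoint \"G1CollectForAllocation\", "
        + "Time since last: 4963375 ns, Reaching safepoint: 349292 ns, Cleanup: 2000 ns, "
        + "At safepoint: 138700375 ns, Total: 139051667 ns";

    // Matches only the three phases and the total, not "Time since last".
    private static final Pattern FIELD =
        Pattern.compile("(Reaching safepoint|Cleanup|At safepoint|Total): (\\d+) ns");

    // Returns the phase name -> nanoseconds mapping found in the line.
    static Map<String, Long> parse(String line) {
        Map<String, Long> fields = new HashMap<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            fields.put(m.group(1), Long.parseLong(m.group(2)));
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, Long> f = parse(LINE);
        long phases = f.get("Reaching safepoint") + f.get("Cleanup") + f.get("At safepoint");
        long gcPauseNs = 878_000L; // 0.878ms, taken from the matching -Xlog:gc line
        System.out.println("sum of phases: " + phases + " ns, logged total: " + f.get("Total") + " ns");
        System.out.println("at safepoint, outside the GC pause: "
            + (f.get("At safepoint") - gcPauseNs) + " ns");
    }
}
```

For this line the three phases add up to the logged total, and nearly all of the "At safepoint" time falls outside the GC pause itself.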

Deeper profiling shows that the problem is on the path where we wake up the threads from the safepoint:
https://github.com/openjdk/jdk/blob/4f9f1955ab2737880158c57d4891d90e2fd2f5d7/src/hotspot/share/runtime/safepoint.cpp#L494-L495

JDK-8214271 ("Fast primitive to wake many threads") added the WaitBarrier that serves this path. Before that, in JDK 11, the performance was okay, which makes this a regression between JDK 11 and JDK 17.

WaitBarrier has two implementations: one for Linux that uses futexes, and a generic one that uses semaphores. For implementation reasons, the generic version has to wait for all threads to leave the barrier before it returns from disarm(). This means that all threads currently blocked for the safepoint need to roll out of wait() before we unblock from the safepoint! This effectively runs into the same problem as TTSP (time to safepoint), only worse: all those threads are blocked and need to be woken up, scheduled, etc.

This is not what the Linux futex-based implementation does: it just notifies the futex and leaves.
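A rough Java analogue of the generic scheme, to show where the time goes. This is only a sketch under assumptions: WaitBarrierSketch, its fields, and demo() are hypothetical names, and the code mirrors just the behavior described here (disarm() does not return until every waiter has left the barrier), not the actual HotSpot C++ implementation.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: a semaphore-based barrier whose disarm() waits for all waiters.
class WaitBarrierSketch {
    private final Semaphore sem = new Semaphore(0);
    private final AtomicInteger inBarrier = new AtomicInteger(); // threads inside await()
    private final AtomicInteger waiters = new AtomicInteger();   // threads that still need a permit
    private volatile int armedTag;                               // 0 means disarmed

    void arm(int tag) {
        armedTag = tag;
    }

    void await(int tag) {
        inBarrier.incrementAndGet();
        if (tag != 0 && tag == armedTag) {
            waiters.incrementAndGet();
            sem.acquireUninterruptibly(); // block until disarm() hands out a permit
        }
        inBarrier.decrementAndGet();
    }

    // Returns only after every thread has left await(). The caller pays for
    // each waiter being woken up and scheduled, which is the cost described above.
    void disarm() {
        armedTag = 0;
        while (inBarrier.get() > 0) {
            int w = waiters.get();
            if (w > 0 && waiters.compareAndSet(w, w - 1)) {
                sem.release();            // wake one registered waiter
            } else {
                Thread.onSpinWait();      // waiters were signalled but have not left yet
            }
        }
    }

    // Runs `threads` waiters through one arm/disarm cycle and returns how many got through.
    static int demo(int threads) {
        WaitBarrierSketch barrier = new WaitBarrierSketch();
        AtomicInteger passed = new AtomicInteger();
        barrier.arm(1);
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                barrier.await(1);
                passed.incrementAndGet();
            });
            ts[i].start();
        }
        // Make sure every thread has registered as a waiter before disarming.
        while (barrier.waiters.get() < threads) {
            Thread.onSpinWait();
        }
        barrier.disarm();
        if (barrier.inBarrier.get() != 0) {
            throw new IllegalStateException("disarm() returned with threads still inside");
        }
        try {
            for (Thread t : ts) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return passed.get();
    }

    public static void main(String[] args) {
        System.out.println("threads released: " + demo(8));
    }
}
```

A futex-style variant would drop the loop in disarm() and return right after the wake-up call, leaving the waiters to drain on their own time.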

The unblocked threads do start executing, so we are not completely stalled while waiting in disarm(), but this definitely:
a) trips the safepoint timings;
b) delays any further actions of VMThread;
c) delays resuming GC from STS, as `Universe::heap()->safepoint_synchronize_end()` comes after this;
d) places a limit on the safepoint frequency we can have;
e) maybe does something else I cannot see right away.

I think the intent for the safepoint end code is to be fast to avoid any of these surprises. To that end, I think we can improve GenericWaitBarrier to avoid most of the performance cliff.

WIP: https://github.com/openjdk/jdk/pull/16404

backported by
JDK-8330642 Improve GenericWaitBarrier performance (Resolved)
JDK-8333244 Improve GenericWaitBarrier performance (Resolved)

relates to
JDK-8214271 Fast primitive to wake many threads (Resolved)

links to
Commit openjdk/jdk17u-dev/515bc9a2
Commit openjdk/jdk21u-dev/6c5500bb
Commit openjdk/jdk/30462f9d
Review openjdk/jdk17u-dev/2041
Review openjdk/jdk21u-dev/70
Review openjdk/jdk/16404

Assignee: Aleksey Shipilev
Reporter: Aleksey Shipilev
Votes: 0
Watchers: 9

Created: 2023-10-27 08:12
Updated: 2024-05-30 00:38
Resolved: 2023-11-22 09:57
