Type: Enhancement
Resolution: Unresolved
Priority: P4
Affects Version/s: 17, 21, 25
We are seeing a particularly nasty kind of problem in our services that run many runnable threads. A significant part of this nastiness is likely unresolvable on the JVM side, but maybe there are solutions I am not seeing yet. I am trawling for ideas here :)
Take the reproducer from the attachment as the example:
$ build/linux-x86_64-server-release/images/jdk/bin/java -Xmx1g -Xms1g -XX:+UseSerialGC -Xlog:async -Xlog:gc -Xlog:safepoint ThreadStalls.java
...
[3.506s][info][gc ] GC(9) Pause Young (Allocation Failure) 277M->3M(989M) 4.842ms
[3.619s][info][safepoint] Safepoint "SerialCollectForAllocation", Time since last: 320380747 ns, Reaching safepoint: 496127 ns, At safepoint: 118122442 ns, Total: 118618569 ns
...
Notice what it says to us: the GC operation took 5ms, but the whole "At safepoint" time is 118ms (!!!). This would be even more visible as the time spent leaving the safepoint once JDK-8350313 is done. A finer-grained tracing shows we very often (but not always!) spend the majority of this time in the `LinuxWaitBarrier::disarm` -> `futex` call. This points to the idea that the VMThread that is currently disarming the safepoint got into some kind of stall.
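For context, the disarm path boils down to a single FUTEX_WAKE that releases every waiter at once. Here is a minimal sketch of that arm/wait/disarm pattern (this is only an illustration of the mechanism, not HotSpot's actual `LinuxWaitBarrier` code):

```cpp
// Simplified illustration of an arm/disarm wait barrier built on a futex word.
// NOT HotSpot's LinuxWaitBarrier -- just the general pattern under discussion.
#include <atomic>
#include <climits>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

static long futex_op(int* addr, int op, int val) {
  return syscall(SYS_futex, addr, op, val, nullptr, nullptr, 0);
}

class SimpleWaitBarrier {
  std::atomic<int> _armed{0};

  int* word() { return reinterpret_cast<int*>(&_armed); }

 public:
  void arm()  { _armed.store(1, std::memory_order_release); }

  void disarm() {
    _armed.store(0, std::memory_order_release);
    // The critical step: wake all waiters with one FUTEX_WAKE.
    // If this thread loses the CPU around here, every waiter keeps sleeping.
    futex_op(word(), FUTEX_WAKE, INT_MAX);
  }

  void wait() {
    while (_armed.load(std::memory_order_acquire) == 1) {
      // Blocks only while the futex word is still 1; rechecks after each wakeup.
      futex_op(word(), FUTEX_WAIT, 1);
    }
  }
};
```

If the thread calling `disarm()` is de-scheduled just around that wake call, every thread blocked in `wait()` stays asleep until the disarming thread runs again, which is exactly the shape of the stall above.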
Looking at max stalls from the Java side, it looks like this stall is reflected in at least some of the Java threads that are currently leaving the safepoint. (The reproducer prints those as "max stall =", but these stalls also include scheduling delays between the Java threads themselves, so those numbers are not completely clean.)
Our major hypothesis is that the VMThread is de-scheduled while disarming the safepoint. This makes some sense given there is 1 VMThread and 1000+ other runnable threads, so the scheduler may decide to favor one of those 1000+ threads. If we enroll the VMThread in the SCHED_FIFO scheduling class, hiccups like these disappear. I vaguely remember we have been doing a similar trick on Solaris. See the Linux POC here: https://github.com/openjdk/jdk/compare/master...shipilev:jdk:wip-vmthread-goes-brrr
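For reference, the core of such a change is essentially one `pthread_setschedparam` call on the VMThread. A rough sketch of the idea (my reading of the general approach, not the actual POC patch) is:

```cpp
// Rough sketch: enroll the calling thread (e.g. the VMThread right after it
// starts) into SCHED_FIFO at the lowest real-time priority. This is my reading
// of the general idea, not the actual POC patch; error handling is simplified.
#include <pthread.h>
#include <sched.h>
#include <cstdio>

static bool try_enroll_sched_fifo() {
  sched_param param{};
  param.sched_priority = 1;  // lowest RT priority is enough to outrank SCHED_OTHER
  int rc = pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);
  if (rc != 0) {
    // Typically EPERM: needs CAP_SYS_NICE or RLIMIT_RTPRIO >= 1 (see limits.conf below).
    std::fprintf(stderr, "SCHED_FIFO unavailable (rc=%d), keeping default policy\n", rc);
    return false;
  }
  return true;
}
```

With "ulimit -r 1" (or the limits.conf entry mentioned below) this succeeds without extra capabilities; otherwise it fails with EPERM and we stay on the default SCHED_OTHER policy.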
Maybe there is a completely JVM-side mitigation to this, but I struggle to find one without major drawbacks.
Things I tried:
1. SCHED_FIFO. See above. It works, but requires super-user privileges/capabilities, OR "ulimit -r 1". I have been able to run the POC without super-user privileges on a system that has "* - rtprio 1" in /etc/security/limits.conf. Maybe that is the answer: support SCHED_FIFO if we can, and let users configure their environments to allow the JVM to enroll threads into FIFO/RR priorities.
2. -XX:VMThreadPriority=11. Does not mitigate the hiccups.
3. Stalling safepoint-unparking threads in a yielding/sleeping busy-loop until the VMThread finishes. That does not work well: there are still major stalls, now on the Java thread side.
Things I thought about:
1. Somehow asking the kernel to make FUTEX_WAKE non-preemptible? The majority of hiccups happen when we are already in the futex op. This would not solve the problem 100%, because we can also be stalled before entering the futex, but it would give us a lot of bang.
Relates to:
- JDK-8350285: Shenandoah: Regression caused by ShenandoahLock under extreme contention (Resolved)
- JDK-8350313: Include timings for leaving safepoint in safepoint logging (Resolved)