Type: Enhancement
Resolution: Fixed
Priority: P4
Fix Version/s: 24
Affects Version/s: 23
Component/s: hotspot
Labels:
- amazon-interest
- starter

Subcomponent: gc
Resolved In Build: b04

If you consider this contrived example:

```
% cat ManyThreadsStacks.java
public class ManyThreadsStacks {
        static final int THREADS = 1024;
        static final int DEPTH = 1024;
        static volatile Object sink;

        public static void main(String... args) {
                // Start THREADS threads, each parked at the bottom of a DEPTH-deep stack.
                for (int t = 0; t < THREADS; t++) {
                        new Thread(() -> work(DEPTH)).start();
                }

                while (true) {
                        sink = new byte[100_000];
                }
        }

        public static void work(int depth) {
                if (depth > 0) {
                        work(depth - 1);
                }
                while (true) {
                        try {
                                Thread.sleep(100);
                        } catch (Exception e) {
                                return;
                        }
                }
        }
}
```

...then the concurrent GC times for both Shenandoah and Z are unusually long for the amount of work we need to do:

```
% build/linux-aarch64-server-release/images/jdk/bin/java -Xmx1g -Xms1g -XX:+UseShenandoahGC -Xlog:gc ManyThreadsStacks.java 2>&1 | grep "marking roots"
[0.846s][info][gc] GC(0) Concurrent marking roots 142.135ms
[1.010s][info][gc] GC(1) Concurrent marking roots 146.222ms
[1.174s][info][gc] GC(2) Concurrent marking roots 150.447ms
[1.330s][info][gc] GC(3) Concurrent marking roots 144.910ms
```

```
% build/linux-aarch64-server-release/images/jdk/bin/java -Xmx1g -Xms1g -XX:+UseZGC -Xlog:gc -Xlog:gc+phases ManyThreadsStacks.java | grep "Concurrent Mark"
[1.000s][info][gc,phases] GC(0) Concurrent Mark 154.720ms
[1.001s][info][gc,phases] GC(0) Concurrent Mark Free 0.001ms
[1.187s][info][gc,phases] GC(1) Concurrent Mark 157.702ms
[1.187s][info][gc,phases] GC(1) Concurrent Mark Free 0.001ms
[1.394s][info][gc,phases] GC(2) Concurrent Mark 148.263ms
[1.394s][info][gc,phases] GC(2) Concurrent Mark Free 0.001ms
[1.557s][info][gc,phases] GC(3) Concurrent Mark 153.175ms
[1.558s][info][gc,phases] GC(3) Concurrent Mark Free 0.001ms
```

The profiles show that we are spending the majority of this time on acquiring the per-nmethod locks here:
https://github.com/openjdk/jdk/blob/c6f611cfe0f3d6807b450be19ec00713229dbf42/src/hotspot/share/gc/shenandoah/shenandoahBarrierSetNMethod.cpp#L41
https://github.com/openjdk/jdk/blob/c6f611cfe0f3d6807b450be19ec00713229dbf42/src/hotspot/share/gc/x/xBarrierSetNMethod.cpp#L35
https://github.com/openjdk/jdk/blob/c6f611cfe0f3d6807b450be19ec00713229dbf42/src/hotspot/share/gc/z/zBarrierSetNMethod.cpp#L40

...which makes sense, since all the threads are in the same nmethod.

The question is, can we do double-checked locking here by performing the `is_armed` check before acquiring the lock? At least for Shenandoah, it improves the timings considerably:

```
% build/linux-aarch64-server-release/images/jdk/bin/java -Xmx1g -Xms1g -XX:+UseShenandoahGC -Xlog:gc ManyThreadsStacks.java 2>&1 | grep "marking roots"
[0.697s][info][gc] GC(0) Concurrent marking roots 3.914ms
[0.716s][info][gc] GC(1) Concurrent marking roots 3.896ms
[0.737s][info][gc] GC(2) Concurrent marking roots 3.896ms
[0.757s][info][gc] GC(3) Concurrent marking roots 3.876ms
[0.776s][info][gc] GC(4) Concurrent marking roots 3.908ms
```
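
For illustration only, here is a minimal Java sketch of the double-checked pattern being proposed. It is not the HotSpot code: `FakeNMethod`, `enterBarrier`, and both counters are hypothetical names, a `volatile boolean` stands in for the armed state, and the memory-ordering details of the real nmethod entry barrier are not modeled.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class DoubleCheckedDisarm {
    // Hypothetical stand-in for an nmethod guarded by an entry barrier.
    static final class FakeNMethod {
        private final ReentrantLock lock = new ReentrantLock();
        private volatile boolean armed = true;
        final AtomicInteger lockAcquisitions = new AtomicInteger();
        final AtomicInteger heals = new AtomicInteger();

        boolean isArmed() {
            return armed;
        }

        void enterBarrier() {
            // First check, no lock held: once the method is disarmed,
            // every later caller returns here without touching the lock.
            if (!armed) {
                return;
            }
            lock.lock();
            try {
                lockAcquisitions.incrementAndGet();
                // Second check, under the lock: another thread may have
                // disarmed the method while we were waiting.
                if (armed) {
                    heals.incrementAndGet(); // placeholder for the real per-method work
                    armed = false;           // volatile write publishes the disarm
                }
            } finally {
                lock.unlock();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        FakeNMethod m = new FakeNMethod();
        Thread[] threads = new Thread[64];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int n = 0; n < 1024; n++) {
                    m.enterBarrier();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println("heals = " + m.heals.get());
        System.out.println("lock acquisitions = " + m.lockAcquisitions.get());
    }
}
```

In this sketch the per-method work runs exactly once, and each thread takes the lock at most once, so the 65,536 calls result in at most 64 lock acquisitions. Without the first check, every call would take the lock.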

relates to

JDK-8333716 Shenandoah: Check for disarmed method before taking the nmethod lock (Resolved)

JDK-8334890 Missing unconditional cross modifying fence in nmethod entry barriers (Resolved)

links to

Commit openjdk/jdk/c30e0403

Review openjdk/jdk/19285

Assignee: Neethu Prasad

Reporter: Aleksey Shipilev

Votes: 0

Watchers: 5

Created: 2024-05-08 02:06

Updated: 2024-06-27 09:02

Resolved: 2024-06-25 00:10
