JDK-8331911

Reconsider locking for recently disarmed nmethods

    • gc
    • b04

      If you consider this contrived example:

      ```
      % cat ManyThreadsStacks.java
      public class ManyThreadsStacks {
              static final int THREADS = 1024;
              static final int DEPTH = 1024;
              static volatile Object sink;

              public static void main(String... args) {
                      // Start THREADS threads, each parked at the bottom of a DEPTH-deep stack.
                      for (int t = 0; t < THREADS; t++) {
                              new Thread(() -> work(DEPTH)).start();
                      }

                      while (true) {
                              sink = new byte[100_000];
                      }
              }

              public static void work(int depth) {
                      if (depth > 0) {
                              work(depth - 1);
                      }
                      while (true) {
                              try {
                                      Thread.sleep(100);
                              } catch (Exception e) {
                                      return;
                              }
                      }
              }
      }
      ```

      ...then the concurrent GC times for both Shenandoah and ZGC are unusually high for the amount of work we need to do:

      ```
      % build/linux-aarch64-server-release/images/jdk/bin/java -Xmx1g -Xms1g -XX:+UseShenandoahGC -Xlog:gc ManyThreadsStacks.java 2>&1 | grep "marking roots"
      [0.846s][info][gc] GC(0) Concurrent marking roots 142.135ms
      [1.010s][info][gc] GC(1) Concurrent marking roots 146.222ms
      [1.174s][info][gc] GC(2) Concurrent marking roots 150.447ms
      [1.330s][info][gc] GC(3) Concurrent marking roots 144.910ms
      ```

      ```
      % build/linux-aarch64-server-release/images/jdk/bin/java -Xmx1g -Xms1g -XX:+UseZGC -Xlog:gc -Xlog:gc+phases ManyThreadsStacks.java | grep "Concurrent Mark"
      [1.000s][info][gc,phases] GC(0) Concurrent Mark 154.720ms
      [1.001s][info][gc,phases] GC(0) Concurrent Mark Free 0.001ms
      [1.187s][info][gc,phases] GC(1) Concurrent Mark 157.702ms
      [1.187s][info][gc,phases] GC(1) Concurrent Mark Free 0.001ms
      [1.394s][info][gc,phases] GC(2) Concurrent Mark 148.263ms
      [1.394s][info][gc,phases] GC(2) Concurrent Mark Free 0.001ms
      [1.557s][info][gc,phases] GC(3) Concurrent Mark 153.175ms
      [1.558s][info][gc,phases] GC(3) Concurrent Mark Free 0.001ms
      ```

      The profiles show that we are spending the majority of this time on acquiring the per-nmethod locks here:
       https://github.com/openjdk/jdk/blob/c6f611cfe0f3d6807b450be19ec00713229dbf42/src/hotspot/share/gc/shenandoah/shenandoahBarrierSetNMethod.cpp#L41
       https://github.com/openjdk/jdk/blob/c6f611cfe0f3d6807b450be19ec00713229dbf42/src/hotspot/share/gc/x/xBarrierSetNMethod.cpp#L35
       https://github.com/openjdk/jdk/blob/c6f611cfe0f3d6807b450be19ec00713229dbf42/src/hotspot/share/gc/z/zBarrierSetNMethod.cpp#L40

      ...which kinda makes sense, since all the threads are in the same nmethod and therefore contend on the same lock.

      The question is: can we do double-checked locking here, by doing the `is_armed` check before the lock acquisition? At least for Shenandoah, it improves the timings considerably:

      ```
      % build/linux-aarch64-server-release/images/jdk/bin/java -Xmx1g -Xms1g -XX:+UseShenandoahGC -Xlog:gc ManyThreadsStacks.java 2>&1 | grep "marking roots"
      [0.697s][info][gc] GC(0) Concurrent marking roots 3.914ms
      [0.716s][info][gc] GC(1) Concurrent marking roots 3.896ms
      [0.737s][info][gc] GC(2) Concurrent marking roots 3.896ms
      [0.757s][info][gc] GC(3) Concurrent marking roots 3.876ms
      [0.776s][info][gc] GC(4) Concurrent marking roots 3.908ms
      ```
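
For illustration only, the double-checked pattern proposed above can be sketched in plain Java. This is not HotSpot code: `GuardedMethod`, `enterBarrier`, `healCount`, and `lockAcquisitions` are hypothetical names, and the `volatile` flag is an assumed stand-in for whatever `is_armed` reads in the real barrier code. The point it shows is that callers arriving after the first disarm return without touching the lock.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for an nmethod with an entry barrier. Not HotSpot code.
final class GuardedMethod {
    private final ReentrantLock lock = new ReentrantLock();
    // Assumption: a volatile boolean models the armed state that is_armed checks.
    private volatile boolean armed = true;
    // Counters exist only to make the effect of the fast path observable.
    final AtomicInteger lockAcquisitions = new AtomicInteger();
    final AtomicInteger healCount = new AtomicInteger();

    void enterBarrier() {
        // First check, without the lock: already disarmed means nothing to do.
        if (!armed) {
            return;
        }
        lock.lock();
        try {
            lockAcquisitions.incrementAndGet();
            // Second check, under the lock: another thread may have disarmed first.
            if (armed) {
                healCount.incrementAndGet(); // placeholder for the per-nmethod work
                armed = false;               // volatile write publishes the disarm
            }
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        GuardedMethod m = new GuardedMethod();
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int n = 0; n < 1_000; n++) {
                    m.enterBarrier();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println("heals=" + m.healCount.get()
                + " lockAcquisitions=" + m.lockAcquisitions.get());
    }
}
```

In this sketch the work under the lock runs exactly once, and the lock is taken at most once per thread that raced on the first entry, rather than once per call. Whether the unlocked check is safe in the real barrier depends on the memory ordering of the disarm there, which this sketch does not model.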

            nprasad Neethu Prasad
            shade Aleksey Shipilev