JDK-8364936

Shenandoah: Switch nmethod entry barriers to conc_instruction_and_data_patch

    • gc

      Looking at the Renaissance benchmarks, I notice that some of them, like scala-doku, are significantly slower with Shenandoah than with other collectors:

      $ build/linux-aarch64-server-release/images/jdk/bin/java -jar ~/renaissance-jmh-0.16.0.jar ScalaDoku -wi 3 -i 3 -f 1 --jvmArgs "-Xmx8g -Xms8g -XX:+AlwaysPreTouch -XX:+UseParallelGC"

      Benchmark Mode Cnt Score Error Units
      JmhScalaDoku.run ss 3 2160.655 ± 364.043 ms/op

      $ build/linux-aarch64-server-release/images/jdk/bin/java -jar ~/renaissance-jmh-0.16.0.jar ScalaDoku -wi 3 -i 3 -f 1 --jvmArgs "-Xmx8g -Xms8g -XX:+AlwaysPreTouch -XX:+UseShenandoahGC"

      Benchmark Mode Cnt Score Error Units
      JmhScalaDoku.run ss 3 3843.770 ± 740.348 ms/op


      perfasm shows that the hotspot is the "dmb ishld" in the nmethod entry barrier.

      ....[Hottest Region 2]..............................................................................
      c2, scala.collection.immutable.SetIterator::next, version 1, compile id 893

                    0x0000ffffa8599008: nop
                  [Entry Point]
                    # {method} {0x0000fffe789eb008} 'next' '()Ljava/lang/Object;' in 'scala/collection/immutable/SetIterator'
                    # [sp+0x30] (sp of caller)
                    0x0000ffffa859900c: ldr w8, [x1, #8]
                    0x0000ffffa8599010: ldr w10, [x9, #8]
                    0x0000ffffa8599014: cmp w8, w10
                ╭ 0x0000ffffa8599018: b.eq 0x0000ffffa8599020 // b.none
                │ 0x0000ffffa859901c: b 0x0000ffffa848ec60 ; {runtime_call Shared Runtime ic_miss_blob}
                │ [Verified Entry Point]
         0.19% ↘ 0x0000ffffa8599020: sub x9, sp, #0x14, lsl #12
         0.09% 0x0000ffffa8599024: str xzr, [x9]
         0.09% 0x0000ffffa8599028: sub sp, sp, #0x30
         0.09% 0x0000ffffa859902c: stp x29, x30, [sp, #32]
         0.09% 0x0000ffffa8599030: ldr w8, 0x0000ffffa8599174
         0.07% 0x0000ffffa8599034: dmb ishld
         7.22% 0x0000ffffa8599038: ldr w9, [x28, #32]
         0.08% 0x0000ffffa859903c: cmp w8, w9
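
      The hot sequence above can be read, roughly, as the check sketched below. This is a toy C++ model and not HotSpot code: ToyNMethod, ToyThread and entry_barrier_is_armed are hypothetical names, and reading x28 as the current thread and offset 32 as its "disarmed" guard value is an assumption based on the listing. The acquire fence stands in for "dmb ishld".

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-ins for illustration only; not HotSpot types.
struct ToyNMethod {
    // Guard word embedded in the compiled method; the GC patches it to arm/disarm.
    std::atomic<int32_t> guard{0};
};

struct ToyThread {
    // Per-thread copy of the value that currently means "disarmed" (assumed).
    int32_t disarmed_guard_value = 0;
};

// Returns true when the method is armed and the slow path would have to run.
inline bool entry_barrier_is_armed(const ToyNMethod& nm, const ToyThread& thread) {
    // ldr w8, <guard>
    int32_t guard = nm.guard.load(std::memory_order_relaxed);
    // dmb ishld: later loads must not be reordered before the guard load.
    std::atomic_thread_fence(std::memory_order_acquire);
    // ldr w9, [x28, #32]; cmp w8, w9
    return guard != thread.disarmed_guard_value;
}
```

      In this model the fence is paid on every method entry, armed or not, which would be consistent with the cost showing up in call-heavy code.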


      It makes sense that this affects benchmarks that are not as deeply inlined, since they enter nmethods, and thus run the entry barrier, more often. Erik did JDK-8290700, which ported a new way to sync up nmethod barriers, conc_instruction_and_data_patch, from the Generational ZGC repository into mainline. Generational ZGC has been using it since JDK 21. Switching Shenandoah to it like so:

      diff --git a/src/hotspot/cpu/aarch64/gc/shenandoah/shenandoahBarrierSetAssembler_aarch64.hpp b/src/hotspot/cpu/aarch64/gc/shenandoah/shenandoahBarrierSetAssembler_aarch64.hpp
      index a12d4e2beec..c89847b9d52 100644
      --- a/src/hotspot/cpu/aarch64/gc/shenandoah/shenandoahBarrierSetAssembler_aarch64.hpp
      +++ b/src/hotspot/cpu/aarch64/gc/shenandoah/shenandoahBarrierSetAssembler_aarch64.hpp
      @@ -67,7 +67,7 @@ class ShenandoahBarrierSetAssembler: public BarrierSetAssembler {
                                               Register scratch, RegSet saved_regs);
       
       public:
      - virtual NMethodPatchingType nmethod_patching_type() { return NMethodPatchingType::conc_data_patch; }
      + virtual NMethodPatchingType nmethod_patching_type() { return NMethodPatchingType::conc_instruction_and_data_patch; }
       
       #ifdef COMPILER1
         void gen_pre_barrier_stub(LIR_Assembler* ce, ShenandoahPreBarrierStub* stub);

      ...makes Shenandoah perform significantly better on this example workload:

      Benchmark Mode Cnt Score Error Units
      JmhScalaDoku.run ss 3 2616.273 ± 51.920 ms/op
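
      To put the three scores side by side, here is a small C++ sketch that only does arithmetic on the numbers quoted in this issue. The constant and function names are made up for illustration. With three iterations per run and wide error bars on the first two runs, the ratios should be read as rough indications only.

```cpp
#include <cstdio>

// Scores in ms/op, copied from the JMH output quoted in this issue.
constexpr double kParallel         = 2160.655;  // -XX:+UseParallelGC
constexpr double kShenandoahBefore = 3843.770;  // conc_data_patch
constexpr double kShenandoahAfter  = 2616.273;  // conc_instruction_and_data_patch

// How many times slower `score` is than `baseline`.
constexpr double slowdown(double score, double baseline) {
    return score / baseline;
}

// Percentage reduction in time per operation going from `before` to `after`.
constexpr double reduction_percent(double before, double after) {
    return 100.0 * (before - after) / before;
}

// Prints the derived ratios for the three runs.
void print_summary() {
    std::printf("Shenandoah vs Parallel, before: %.2fx\n",
                slowdown(kShenandoahBefore, kParallel));
    std::printf("Shenandoah vs Parallel, after:  %.2fx\n",
                slowdown(kShenandoahAfter, kParallel));
    std::printf("Shenandoah time reduction:      %.1f%%\n",
                reduction_percent(kShenandoahBefore, kShenandoahAfter));
}
```

      On these numbers the gap to Parallel shrinks from roughly 1.78x to roughly 1.21x, about a 32% reduction in time per operation for Shenandoah.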

      We need to see what else should be done to support conc_instruction_and_data_patch in Shenandoah barriers.

            Assignee: Unassigned
            Reporter: Aleksey Shipilev (shade)