JDK-8355094

Performance drop in auto-vectorized kernel due to split store


    • Type: Bug
    • Resolution: Unresolved
    • Priority: P4
    • Component: hotspot

      The following benchmark kernel shows a roughly 20% performance drop with the latest JDK 25 build (25-ea+19-2255) compared to JDK 17 (17.0.9+11-LTS-201). The drop is caused by split stores.

      https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/vm/compiler/VectorLoadToStoreForwarding.java#L197
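A split store is a store whose bytes straddle two cache lines. Whether the stores of a vectorized loop split depends on where each store address falls relative to a cache-line boundary. The Java sketch below counts how many back-to-back stores of a given width cross a line. The 64-byte line size, the 16-byte base offset (standing in for an array header), and the 32/64-byte store widths are assumptions for illustration. The sketch does not model the code C2 actually emits for benchmark_20.

```java
// Minimal sketch: which sequential vector stores cross a cache-line boundary?
// Assumptions (not from the report): 64-byte cache lines, a hypothetical
// 16-byte offset before element 0, and 32- or 64-byte vector stores.
class SplitStoreSketch {
    static final int LINE_SIZE = 64; // assumed cache-line size in bytes

    // A store of `width` bytes at byte `address` is split when it does not
    // fit entirely inside one cache line.
    static boolean isSplit(long address, int width) {
        return (address % LINE_SIZE) + width > LINE_SIZE;
    }

    // Fraction of `count` back-to-back stores of `width` bytes, starting at
    // `baseOffset`, that straddle a cache-line boundary.
    static double splitFraction(long baseOffset, int width, int count) {
        int splits = 0;
        for (int k = 0; k < count; k++) {
            if (isSplit(baseOffset + (long) k * width, width)) {
                splits++;
            }
        }
        return (double) splits / count;
    }

    public static void main(String[] args) {
        // Prints 0.5: every second 32-byte store starting at offset 16 splits.
        System.out.println("32-byte stores, offset 16: " + splitFraction(16, 32, 1024));
        // Prints 1.0: a 64-byte store that is not line-aligned always splits.
        System.out.println("64-byte stores, offset 16: " + splitFraction(16, 64, 1024));
        // Prints 0.0: line-aligned 64-byte stores never split.
        System.out.println("64-byte stores, offset 0:  " + splitFraction(0, 64, 1024));
    }
}
```

Under these assumptions, full-line (64-byte) stores that start mid-line split on every store, which is why store alignment matters more as the vector width grows.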

      Command line: perf stat -e cycles,instructions,mem_inst_retired.all_stores,mem_inst_retired.split_stores java -jar target/benchmarks.jar -f 1 -i 2 -wi 1 -w 30 org.openjdk.bench.vm.compiler.VectorLoadToStoreForwarding.VectorLoadToStoreForwardingSuperWord.benchmark_20


      JDK-25 PMU events
      ================
         92,581,316,800 cycles
         28,452,741,807 instructions # 0.31 insn per cycle
          9,584,245,086 mem_inst_retired.all_stores
          4,495,155,071 mem_inst_retired.split_stores
            32.510948769 seconds time elapsed
            33.010587000 seconds user
             0.194167000 seconds sys
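The counts above reduce to two ratios: instructions per cycle, which perf already prints, and the share of retired stores that were split. The Java sketch below does that arithmetic with the JDK 25 counts transcribed from the output; it cannot compute the JDK 17 ratios because the report does not include JDK 17 counts.

```java
// Derived ratios from the JDK 25 PMU counts reported above.
// Counts are transcribed from the perf stat output, regrouped in thousands.
class PmuRatios {
    static final long CYCLES = 92_581_316_800L;
    static final long INSTRUCTIONS = 28_452_741_807L;
    static final long ALL_STORES = 9_584_245_086L;
    static final long SPLIT_STORES = 4_495_155_071L;

    // Instructions retired per core cycle.
    static double ipc() {
        return (double) INSTRUCTIONS / CYCLES;
    }

    // Fraction of retired stores that crossed a cache-line boundary.
    static double splitStoreFraction() {
        return (double) SPLIT_STORES / ALL_STORES;
    }

    public static void main(String[] args) {
        System.out.printf("IPC: %.2f%n", ipc()); // prints "IPC: 0.31", matching perf
        System.out.printf("Split stores: %.1f%% of all stores%n",
                100 * splitStoreFraction()); // prints "Split stores: 46.9% of all stores"
    }
}
```

So in the JDK 25 run, nearly half of all retired stores were split stores, alongside a low IPC of about 0.31.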

      System: Model name: Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz (CascadeLake Server)

            Assignee: Emanuel Peter (epeter)
            Reporter: Jatin Bhateja (jbhateja)
