JDK-8312233

Performance regression in SharedRuntime::frem/drem() on x86 with AVX2 after JDK-8308966

      There is a performance regression on x86 systems without AVX512 after the integration of JDK-8308966, which intrinsifies float/double modulo. It can be observed and isolated by running Blender.java with flags that disable most C2 optimizations and compile only test(). On AVX512 there is a small regression of 2-3%, which might also be worth looking into. The regression can also be observed with the interpreter only by using -Xint:


      Test on AVX2
      ========
      Setup:
      - AVX512 not available
      - AVX2 available
      - FMA instructions available
      - fastdebug build

      --- JDK 22+7/mainline ---

      Default:
      $ java -XX:-TieredCompilation -XX:LoopMaxUnroll=0 -XX:-DoEscapeAnalysis -XX:+UseParallelGC -XX:CompileCommand=compileonly,Blender::test Blender.java

      Output:
      2847 ms
      2860 ms
      2876 ms
      2861 ms
      2868 ms
      2867 ms
      2877 ms
      2875 ms
      2880 ms
      2880 ms
      Average: 2869 ms


      Disabling FMA instructions with -XX:-UseFMA:
      $ java -XX:-UseFMA -XX:-TieredCompilation -XX:LoopMaxUnroll=0 -XX:-DoEscapeAnalysis -XX:+UseParallelGC -XX:CompileCommand=compileonly,Blender::test Blender.java

      Output:
      329 ms
      329 ms
      330 ms
      330 ms
      331 ms
      331 ms
      332 ms
      330 ms
      332 ms
      332 ms
      Average: 330 ms


      --- JDK 21+31 ---

      $ java -XX:-TieredCompilation -XX:LoopMaxUnroll=0 -XX:-DoEscapeAnalysis -XX:+UseParallelGC -XX:CompileCommand=compileonly,Blender::test Blender.java

      Output:
      341 ms
      340 ms
      341 ms
      341 ms
      340 ms
      341 ms
      340 ms
      341 ms
      341 ms
      341 ms
      Average: 340 ms


      -----> SUMMARY: ~8.4x regression (2869 ms vs. 340 ms) in JDK 22 for AVX2 without AVX512
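
The slowdown factor follows directly from the two sets of timings listed above. The sketch below recomputes it; the class and method names are made up for illustration, and the truncating integer average is an assumption chosen because it reproduces the "Average" lines in this report.

```java
import java.util.Locale;

public class RegressionFactor {
    // Truncating integer mean in milliseconds (assumption: this matches
    // how the "Average" lines in the report were produced).
    static long average(long[] timesMs) {
        long sum = 0;
        for (long t : timesMs) {
            sum += t;
        }
        return sum / timesMs.length;
    }

    // Ratio of the regressed average to the baseline average.
    static double factor(long[] regressed, long[] baseline) {
        return (double) average(regressed) / average(baseline);
    }

    public static void main(String[] args) {
        // Timings copied from the AVX2 runs above (default flags).
        long[] jdk22 = {2847, 2860, 2876, 2861, 2868, 2867, 2877, 2875, 2880, 2880};
        long[] jdk21 = {341, 340, 341, 341, 340, 341, 340, 341, 341, 341};
        System.out.println(String.format(Locale.ROOT,
                "JDK 22: %d ms, JDK 21: %d ms, factor: %.1fx",
                average(jdk22), average(jdk21), factor(jdk22, jdk21)));
    }
}
```

With the numbers above this gives averages of 2869 ms and 340 ms, a factor of roughly 8.4.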


      === Interpreter only ===

      --- JDK 22+7/mainline ---

      Default:
      $ java -Xint Blender.java

      Output:
      3311 ms
      3310 ms
      3314 ms
      3324 ms
      3320 ms
      3333 ms
      3343 ms
      3350 ms
      3343 ms
      3336 ms
      Average: 3328 ms

      Disabling FMA instructions with -XX:-UseFMA:
      $ java -XX:-UseFMA -Xint Blender.java

      Output:
      956 ms
      877 ms
      865 ms
      886 ms
      897 ms
      917 ms
      886 ms
      876 ms
      863 ms
      903 ms
      Average: 892 ms

      --- JDK 21+31 ---

      $ java -Xint Blender.java

      Output:
      917 ms
      930 ms
      951 ms
      973 ms
      941 ms
      926 ms
      948 ms
      963 ms
      971 ms
      975 ms
      Average: 949 ms


      -----> SUMMARY: ~3.5x regression (3328 ms vs. 949 ms) in JDK 22 for AVX2 without AVX512 with interpreter only



      Test on AVX512
      =========
      Setup:
      - AVX512 available where VM_Version::supports_avx512vlbwdq() is true
      - fastdebug build

      --- JDK 22+7/mainline ---

      Default:
      $ java -XX:-TieredCompilation -XX:LoopMaxUnroll=0 -XX:-DoEscapeAnalysis -XX:+UseParallelGC -XX:CompileCommand=compileonly,Blender::test Blender.java

      Output:
      907 ms
      907 ms
      908 ms
      907 ms
      917 ms
      908 ms
      908 ms
      908 ms
      910 ms
      907 ms
      Average: 908 ms

      --- JDK 21+31 ---

      888 ms
      884 ms
      884 ms
      884 ms
      885 ms
      884 ms
      888 ms
      890 ms
      884 ms
      884 ms
      Average: 885 ms


      -----> SUMMARY: ~2-3% regression in JDK 22 for AVX512
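
Blender.java itself is not reproduced in this report. As a rough, hypothetical stand-in, the sketch below stresses the same operation: the `%` operator on double and float operands, which compiles to the drem and frem bytecodes. The class name ModuloBench, the constants, and the iteration count are all assumptions, so its timings will not match the numbers above.

```java
public class ModuloBench {
    // Hypothetical workload: repeated double and float remainder.
    // The `%` operator on these types compiles to drem/frem bytecodes.
    static double test(int iterations) {
        double acc = 0.0;
        float facc = 0.0f;
        for (int i = 1; i <= iterations; i++) {
            acc += (i * 1.5) % 7.25;     // double remainder (drem)
            facc += (i * 0.5f) % 3.75f;  // float remainder (frem)
        }
        // Return a checksum so the loop cannot be discarded as dead code.
        return acc + facc;
    }

    public static void main(String[] args) {
        // Ten timed runs, printed one per line like the outputs in this report.
        for (int run = 0; run < 10; run++) {
            long start = System.nanoTime();
            double checksum = test(10_000_000);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println(elapsedMs + " ms (checksum " + checksum + ")");
        }
    }
}
```

To compare builds, it could be run with the flags used in this report, adapting the compile command to the stand-in's names, for example -XX:CompileCommand=compileonly,ModuloBench::test, with and without -XX:-UseFMA.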

            sgibbons Scott Gibbons (Inactive)
            chagedorn Christian Hagedorn