JDK-8312188

Performance regression in SharedRuntime::frem/drem() on non-Windows x86 after JDK-8302191

    • 21
    • inapplicable
    • x86
    • linux

      There are two performance regressions around float/double modulo:
      1) The first one (this bug) is observed after JDK-8302191, which changed SharedRuntime::frem/drem() on non-Windows x64 systems to use direct x86 assembly instead of the fmod() C-library implementation.
      2) The second regression only affects JDK 22 and is observed after the intrinsification of float/double modulo with JDK-8308966. The details for that regression can be found separately in JDK-8312233.
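
For context, the Java `%` operator on float/double operands is specified to behave like C's fmod(): the result is a truncating remainder that takes the sign of the dividend. The following is a minimal sketch of that behaviour, not code from the attachments; the class name `RemSemantics` and the helper `truncRem` are made up for illustration.

```java
public class RemSemantics {
    // x % y on doubles compiles to the drem bytecode (truncating remainder).
    static double truncRem(double x, double y) {
        return x % y;
    }

    // x % y on floats compiles to the frem bytecode.
    static float truncRem(float x, float y) {
        return x % y;
    }

    public static void main(String[] args) {
        System.out.println(truncRem(5.5, 2.0));            // 1.5
        System.out.println(truncRem(-5.5, 2.0));           // -1.5 (sign of the dividend)
        System.out.println(truncRem(5.5f, 2.0f));          // 1.5
        // For contrast: IEEE 754 remainder rounds the quotient to nearest.
        System.out.println(Math.IEEEremainder(5.5, 2.0));  // -0.5
    }
}
```

Any replacement for fmod() in the runtime has to preserve these results, which is why only the speed of the new code path is in question here.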


      The regression introduced by JDK-8302191 can be observed when running Blender2.java with a product/release VM (the machine used for the numbers below has AVX512 support, but the regression is also observed on AVX2-only machines):

      The commit just before JDK-8302191, which is JDK-8304683 (https://github.com/openjdk/jdk/commit/760c0128a4ef787c8c8addb26894c072ba8b2eb1):

      $ java Blender2.java

      Output:
      164 ms
      161 ms
      166 ms
      164 ms
      162 ms
      160 ms
      168 ms
      163 ms
      161 ms
      167 ms
      Average: 163 ms


      Commit of JDK-8302191 (https://github.com/openjdk/jdk/commit/37774556da8a5aacf55884133ae936ed5a28eab2):

      $ java Blender2.java

      Output:
      255 ms
      260 ms
      256 ms
      307 ms
      257 ms
      258 ms
      265 ms
      255 ms
      260 ms
      255 ms
      Average: 262 ms


      This suggests that the direct x86 assembly in SharedRuntime::frem/drem() is slower than the code executed by fmod(): the average run time rises from 163 ms to 262 ms, roughly 60% slower. We should have a closer look at the assembly of fmod() and improve our x86 assembly to fix the observed regression.
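
A stripped-down timing loop along the following lines could help isolate the modulo cost from the rest of the benchmark. This is only a sketch and is not the attached Blender2.java: the class name `RemBench`, the loop sizes, and the divisor are arbitrary assumptions, and whether this particular loop reproduces the regression has not been verified.

```java
public class RemBench {
    // Sums (i * 0.5) % divisor for i in [0, n). Returning the sum gives the
    // loop an observable result so the JIT cannot discard it as dead code.
    static double sumOfRemainders(int n, double divisor) {
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += (i * 0.5) % divisor; // double modulo (drem)
        }
        return sum;
    }

    public static void main(String[] args) {
        final int rounds = 10;        // assumption: same number of rounds as the output above
        final int n = 20_000_000;     // assumption: arbitrary work size
        long totalMs = 0;
        double checksum = 0.0;
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            checksum += sumOfRemainders(n, 3.0);
            long ms = (System.nanoTime() - start) / 1_000_000;
            totalMs += ms;
            System.out.println(ms + " ms");
        }
        System.out.println("Average: " + (totalMs / rounds) + " ms");
        System.out.println("Checksum: " + checksum);
    }
}
```

Running it with `java RemBench.java` on builds before and after the JDK-8302191 commit would give a like-for-like comparison. A JMH benchmark would be the more rigorous choice, since a hand-rolled loop like this is sensitive to warm-up and on-stack replacement.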


      ---- Original Report ----
      There is potentially a significant (~25%) performance regression on a micro-ish benchmark in JDK 21 (see attachment).

      The regression appears at least on Linux x86, but did not appear on macOS x86. No other platforms were tried.

      With Blender.java, the 2nd Java sample on the blog [1], there is a significant drop in performance between JDK 20.0.1 and JDK 21 using the latest binaries [2].

      On the Ubuntu workstation,
          JDK 20.0.1 runs Blender in 822ms
          JDK 21 runs Blender in 1125ms

      (see attachments for full source code and sample output)

      [1] https://www.graalvm.org/22.1/examples/java-performance-examples/
      [2] https://jdk.java.net/

        1. Blender.java
          2 kB
        2. Blender2.java
          2 kB
        3. Blender4.java
          2 kB
        4. term2.txt
          0.8 kB

            dholmes David Holmes
            mtrudeau Michel Trudeau