SuperWordLoopUnrollAnalysis may lead to over loop unrolling

    • Type: Enhancement
    • Resolution: Unresolved
    • Priority: P4
    • tbd
    • Affects Version/s: 11, 12, 13, 14
    • Component/s: hotspot
    • x86
    • generic

      ## Reproduce
      Run the reproducer with:
        1) java TestSuperWordOverunrolling
        2) java -XX:-SuperWordLoopUnrollAnalysis TestSuperWordOverunrolling
      ---------------------------------
      public class TestSuperWordOverunrolling {

          public static void main(String[] args) {
              double sum = 0.0;
              long start = System.currentTimeMillis();
              for (int i = 0; i < 50000; i++) {
                  sum += execute(256);
              }
              long end = System.currentTimeMillis();
              System.out.println("sum = " + sum + "; time = " + (end - start) + "ms");
          }

          public static double execute(int num_iterations) {
              int M = 63;
              byte[][] G = new byte[M][M];

              int Mm1 = M-1;
              for (int p = 0; p < num_iterations; p++) {
                  for (int i = 1; i < Mm1; i++) {
                      for (int j = 1; j < Mm1; j++)
                          G[i][j] = G[i-1][j];
                  }
              }

              return G[3][2];
          }
      }
      ---------------------------------

      ## Symptom
      1) java TestSuperWordOverunrolling
      ---------------------------------
      sum = 0.0; time = 9360ms
      sum = 0.0; time = 9345ms
      sum = 0.0; time = 9376ms
      sum = 0.0; time = 9389ms
      ---------------------------------

      2) java -XX:-SuperWordLoopUnrollAnalysis TestSuperWordOverunrolling
      ---------------------------------
      sum = 0.0; time = 5564ms
      sum = 0.0; time = 5575ms
      sum = 0.0; time = 5520ms
      sum = 0.0; time = 5552ms
      ---------------------------------

      ## Analysis
      The slowdown is caused by excessive loop unrolling when SuperWordLoopUnrollAnalysis is enabled (the default): each run takes about 9.4 s, versus about 5.5 s with the flag disabled.
      For this reproducer, the loop is unrolled by a factor of 16, which hurts performance.
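
One plausible reading of why a factor of 16 hurts here is that the innermost loop is short: with M = 63, `j` runs from 1 to 61, so the trip count is 61. The report does not confirm this. The sketch below is plain arithmetic that shows how many iterations an unrolled body would cover for a few factors and how many are left over. The class `UnrollSplit` and its method `split` are hypothetical names used only for illustration. The sketch ignores how C2 actually splits a loop into pre, main, and post loops.

```java
// Illustrative arithmetic only; this does not model C2's real loop transformations.
class UnrollSplit {

    // Returns {iterations covered by whole unrolled bodies, leftover iterations}.
    static int[] split(int tripCount, int unrollFactor) {
        int covered = (tripCount / unrollFactor) * unrollFactor;
        return new int[] { covered, tripCount - covered };
    }

    public static void main(String[] args) {
        int tripCount = 61; // j = 1..61 in the reproducer's innermost loop (M = 63)
        for (int factor : new int[] { 4, 8, 16 }) {
            int[] s = split(tripCount, factor);
            System.out.println("unroll " + factor
                    + ": covered=" + s[0] + ", leftover=" + s[1]);
        }
    }
}
```

Under these assumptions, a factor of 16 covers 48 of the 61 iterations and leaves 13 for non-unrolled code, whereas a factor of 4 leaves only 1.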

            Assignee: Jie Fu
            Reporter: Jie Fu
            Votes: 0
            Watchers: 5