JDK / JDK-8297753

AArch64: Add optimized rules for vector compare with zero on NEON


    • b13
    • aarch64
    • generic

      According to the Cortex-A72 Software Optimization Guide, CMGT (register) and CMGT (zero) have the same latency. The zero form therefore saves the extra instruction that moves the zero value into a SIMD&FP register before the compare, as in:

      movi v16.4s, #0x0
      cmgt v16.4s, v17.4s, v16.4s

      The two instructions above can be reduced to a single one:

      cmgt v16.4s, v17.4s, #0x0
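      Why the rewrite is safe can be seen from a scalar model of the two instruction forms. The sketch below assumes signed 32-bit lanes (the .4s arrangement) and models each vector as an int[4]. CMGT sets a lane to all ones (-1) when the first operand is greater, and to 0 otherwise. The class and method names are hypothetical and only illustrate the per-lane semantics. They are not HotSpot code.

```java
import java.util.Arrays;

public class CmgtZeroModel {
    // Models CMGT (register): each lane becomes all ones (-1) if a[i] > b[i] (signed), else 0.
    static int[] cmgtRegister(int[] a, int[] b) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = a[i] > b[i] ? -1 : 0;
        }
        return r;
    }

    // Models CMGT (zero): same per-lane rule, but the second operand is the immediate #0.
    static int[] cmgtZero(int[] a) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = a[i] > 0 ? -1 : 0;
        }
        return r;
    }

    public static void main(String[] args) {
        int[] v17 = {5, 0, -3, Integer.MIN_VALUE}; // one .4s vector: four signed 32-bit lanes
        int[] v16 = new int[4];                     // movi v16.4s, #0x0
        int[] viaRegister = cmgtRegister(v17, v16); // cmgt v16.4s, v17.4s, v16.4s
        int[] viaZero = cmgtZero(v17);              // cmgt v16.4s, v17.4s, #0x0
        System.out.println(Arrays.toString(viaRegister)); // [-1, 0, 0, 0]
        System.out.println(Arrays.equals(viaRegister, viaZero)); // true
    }
}
```

      Because comparing against a register that holds zero gives the same mask as comparing against #0, the movi only adds an instruction.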
       

      Test case:
      ```
      // Run with: java --add-modules jdk.incubator.vector TestCmpZero.java
      import jdk.incubator.vector.*;

      public class TestCmpZero {
          public static final int a = 0;

          // Lane-wise comparison va < 0 (a is the constant 0).
          public static VectorMask<Integer> testZero(IntVector va) {
              return va.compare(VectorOperators.LT, a);
          }

          public static void main(String[] args) {
              final IntVector va = IntVector.broadcast(IntVector.SPECIES_128, 1);
              // Call testZero often enough for C2 to compile it.
              for (int i = 0; i < 200000; i++) {
                  testZero(va);
              }
          }
      }
      ```
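      AArch64 NEON provides compare-with-zero forms for the signed integer conditions CMEQ, CMGE, CMGT, CMLE and CMLT. There is no single NE-with-zero instruction. The test case uses VectorOperators.LT against 0, so the matching zero form is CMLT rather than the CMGT shown above. The lookup below is a hypothetical illustration of that correspondence. The operator strings, the zeroForm helper and its output format are assumptions for this sketch, not the actual C2 matcher rules added for this issue.

```java
import java.util.Map;
import java.util.Optional;

public class ZeroCompareMnemonics {
    // Hypothetical table (not HotSpot's .ad rules): signed lane-wise "x OP 0"
    // mapped to the NEON instruction that compares directly against #0.
    private static final Map<String, String> ZERO_FORM = Map.of(
            "EQ", "cmeq",
            "GE", "cmge",
            "GT", "cmgt",
            "LE", "cmle",
            "LT", "cmlt");

    // Builds e.g. "cmlt v16.4s, v17.4s, #0x0" for "LT"; returns empty when NEON
    // has no single compare-with-zero instruction for the condition (e.g. "NE").
    static Optional<String> zeroForm(String op, String dst, String src, String arrangement) {
        String mnemonic = ZERO_FORM.get(op);
        if (mnemonic == null) {
            return Optional.empty();
        }
        return Optional.of(String.format("%s %s.%s, %s.%s, #0x0",
                mnemonic, dst, arrangement, src, arrangement));
    }

    public static void main(String[] args) {
        // The test case compares with VectorOperators.LT against 0.
        System.out.println(zeroForm("LT", "v16", "v17", "4s").orElse("no zero form")); // cmlt v16.4s, v17.4s, #0x0
        System.out.println(zeroForm("NE", "v16", "v17", "4s").orElse("no zero form")); // no zero form
    }
}
```

      A condition without a zero form, such as NE, would still need either a zero register operand or a CMEQ (zero) followed by a bitwise NOT.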

            Eric Liu (eliu)
