Enhancement
Resolution: Fixed
P4
21
b13
aarch64
generic
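According to the Cortex-A72 optimization guide, cmgt (register) and cmgt (zero) have the same latency. The latter can save a mov instruction (from a core register to a floating-point register), because the zero operand no longer has to be materialized in a vector register first. For example: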
movi v16.4s, #0x0
cmgt v16.4s, v17.4s, v16.4s
The code above could be optimized to:
cmgt v16.4s, v17.4s, #0x0
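The equivalence of the two forms can be sketched in plain Java. This is only a scalar model of the per-lane semantics (each lane becomes all ones when the signed comparison holds, otherwise zero); the class and method names are hypothetical and are not part of HotSpot or the Vector API.

```java
import java.util.Arrays;

public class CmgtSemantics {
    // Scalar model of cmgt (register): lane is all ones if a[i] > b[i] (signed), else zero.
    static int[] cmgtRegister(int[] a, int[] b) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = a[i] > b[i] ? -1 : 0;
        }
        return r;
    }

    // Scalar model of cmgt (zero): lane is all ones if a[i] > 0 (signed), else zero.
    static int[] cmgtZero(int[] a) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = a[i] > 0 ? -1 : 0;
        }
        return r;
    }

    // Compares both forms on the same input lanes.
    static boolean sameResult(int[] lanes) {
        int[] zero = new int[lanes.length]; // stands in for "movi v16.4s, #0x0"
        return Arrays.equals(cmgtRegister(lanes, zero), cmgtZero(lanes));
    }

    public static void main(String[] args) {
        int[] lanes = {1, 0, -1, Integer.MAX_VALUE};
        System.out.println(sameResult(lanes));
    }
}
```

Since the second operand of the register form is always the zero vector in this pattern, both models produce identical lanes, which is why the separate movi can be dropped.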
Test case:
```
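// Compile and run with: --add-modules jdk.incubator.vector
import jdk.incubator.vector.*;

public class TestCmpZero {
    // Compile-time constant zero used as the scalar comparison operand.
    public static final int a = 0;

    public static VectorMask<Integer> testZero(IntVector va) {
        return va.compare(VectorOperators.LT, a);
    }

    public static void main(String[] args) {
        final IntVector va = IntVector.broadcast(IntVector.SPECIES_128, 1);
        // Call repeatedly so that testZero becomes hot enough to be JIT-compiled.
        for (int i = 0; i < 200000; i++) {
            testZero(va);
        }
    }
}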
```