Type: Enhancement
Resolution: Fixed
Priority: P4
Fix Version/s: 21
Affects Version/s: 21
Component/s: hotspot
Labels:
- performance
- vectorapi

Subcomponent: compiler
Resolved In Build: b13
CPU: aarch64
OS: generic

According to the Cortex-A72 software optimization guide, cmgt (register) and cmgt (zero) have the same latency. The latter form can therefore save the move instruction that sets up the zero operand in a vector register.

movi v16.4s, #0x0
cmgt v16.4s, v17.4s, v16.4s

The code above could be optimized to

cmgt v16.4s, v17.4s, #0x0
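
The equivalence of the two forms can be checked with a scalar model. The following is a minimal sketch, not the HotSpot implementation: the class `CmgtModel` and the helpers `cmgtRegister` and `cmgtZero` are hypothetical names that only mirror the lane-wise semantics of the instruction (a lane becomes all ones when the signed comparison holds, and zero otherwise).

```java
import java.util.Arrays;

// Hypothetical scalar model of the two AArch64 CMGT forms on 32-bit lanes.
final class CmgtModel {

    // CMGT (register): lane i is -1 (all ones) if a[i] > b[i] as signed ints, else 0.
    static int[] cmgtRegister(int[] a, int[] b) {
        int[] result = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            result[i] = a[i] > b[i] ? -1 : 0;
        }
        return result;
    }

    // CMGT (zero): lane i is -1 (all ones) if a[i] > 0, else 0. No zero vector is needed.
    static int[] cmgtZero(int[] a) {
        int[] result = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            result[i] = a[i] > 0 ? -1 : 0;
        }
        return result;
    }

    public static void main(String[] args) {
        int[] v17 = {5, 0, -3, Integer.MIN_VALUE};
        int[] zero = new int[v17.length]; // models: movi v16.4s, #0x0

        // Both forms produce the same mask for every lane.
        System.out.println(Arrays.toString(cmgtRegister(v17, zero))); // prints [-1, 0, 0, 0]
        System.out.println(Arrays.toString(cmgtZero(v17)));           // prints [-1, 0, 0, 0]
    }
}
```

Because the results match lane for lane, the zero vector and the instruction that materializes it are redundant whenever the second operand is known to be zero.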

Test case:
```
import jdk.incubator.vector.*;

public class TestCmpZero {
    public static final int a = 0;

    public static VectorMask<Integer> testZero(IntVector va) {
        return va.compare(VectorOperators.LT, a);
    }

    public static void main(String[] args) {
        final IntVector va = IntVector.broadcast(IntVector.SPECIES_128, 1);
        for (int i = 0; i < 200000; i++) {
            testZero(va);
        }
    }
}
```

links to

Commit openjdk/jdk/d23a8bfb

Review openjdk/jdk/11822

Assignee: Eric Liu

Reporter: Eric Liu

Votes: 0

Watchers: 4

Created: 2022-11-29 00:56

Updated: 2023-03-08 23:29

Resolved: 2023-03-03 04:13
