
Type: Enhancement
Resolution: Fixed
Priority: P4
Fix Version/s: 21
Affects Version/s: 21
Component/s: hotspot
Labels:
- c2
- oraclejdk-na
- performance
- vectorapi

Subcomponent: compiler
Resolved In Build: b20
CPU: riscv
OS: linux

This patch adds support for the masked versions of the vector add/sub/mul/div instructions. It was implemented with reference to RVV v1.0 [1]. Please take a look; reviews are welcome. Thanks a lot!

## Load/Store/Cmp Mask
`VectorLoadMask`, `VectorMaskCmp`, and `VectorStoreMask` implement the mask datapath. The compilation log for `jdk/incubator/vector/Byte128VectorTests.java` shows where the data is passed:
```
218 loadV V1, [R7] # vector (rvv)
220 vloadmask V0, V1
...
23c vmaskcmp_rvv_masked V0, V4, V5, V0, V1, #0
24c vstoremask V1, V0
258 storeV [R7], V1 # vector (rvv)
```

The corresponding generated JIT assembly:
```
# loadV
0x000000400c8ef958: vsetvli t0,zero,e8,m1,tu,mu
0x000000400c8ef95c: vle8.v v1,(t2)

# vloadmask
0x000000400c8ef960: vsetvli t0,zero,e8,m1,tu,
0x000000400c8ef964: vmsne.vx v0,v1,zero

# vmaskcmp_rvv_masked
0x000000400c8ef97c: vsetvli t0,zero,e8,m1,tu,mu
0x000000400c8ef980: vmclr.m v1
0x000000400c8ef984: vmseq.vv v1,v4,v5,v0.t
0x000000400c8ef988: vmv1r.v v0,v1

# vstoremask
0x000000400c8ef98c: vsetvli t0,zero,e8,m1,tu,mu
0x000000400c8ef990: vmv.v.x v1,zero
0x000000400c8ef994: vmerge.vim v1,v1,1,v0
```
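To make the three steps easier to follow, here is a minimal scalar sketch of what the instructions in the listing compute, lane by lane. It is only an illustrative model under the RVV v1.0 semantics of `vmsne.vx`, `vmseq.vv ... v0.t`, and `vmerge.vim`; the class and method names are hypothetical and do not appear in HotSpot.

```java
import java.util.Arrays;

// Hypothetical scalar model of the mask datapath; not HotSpot code.
final class MaskDatapathModel {

    // Models vloadmask (vmsne.vx v0, v1, zero):
    // a mask lane is set when the loaded boolean byte is non-zero.
    static boolean[] loadMask(byte[] bools) {
        boolean[] mask = new boolean[bools.length];
        for (int i = 0; i < bools.length; i++) {
            mask[i] = bools[i] != 0;
        }
        return mask;
    }

    // Models vmaskcmp_rvv_masked (vmclr.m, then vmseq.vv ..., v0.t):
    // the destination starts cleared and only active lanes are compared,
    // so inactive lanes stay false.
    static boolean[] maskedCompareEq(byte[] a, byte[] b, boolean[] mask) {
        boolean[] result = new boolean[a.length]; // all false, like vmclr.m
        for (int i = 0; i < a.length; i++) {
            if (mask[i]) {
                result[i] = a[i] == b[i];
            }
        }
        return result;
    }

    // Models vstoremask (vmv.v.x v1, zero, then vmerge.vim v1, v1, 1, v0):
    // start from a zero vector and merge in 1 where the mask is set.
    static byte[] storeMask(boolean[] mask) {
        byte[] bools = new byte[mask.length];
        for (int i = 0; i < mask.length; i++) {
            bools[i] = (byte) (mask[i] ? 1 : 0);
        }
        return bools;
    }

    public static void main(String[] args) {
        boolean[] mask = loadMask(new byte[] {1, 0, 1, 1});
        boolean[] eq = maskedCompareEq(new byte[] {5, 5, 6, 7},
                                       new byte[] {5, 5, 9, 7}, mask);
        System.out.println(Arrays.toString(storeMask(eq)));
    }
}
```

Lane 1 compares equal but is inactive, so it stays cleared in the result; that is the effect of clearing the destination before the masked compare.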

## Masked vector arithmetic instructions (e.g. vadd)
AddMaskTestMerge case:
```java
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorMask;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

public class AddMaskTestMerge {

    static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_128;
    static final int SIZE = 1024;
    static int[] a = new int[SIZE];
    static int[] b = new int[SIZE];
    static int[] r = new int[SIZE];
    static boolean[] c = new boolean[]{true,false,true,false,true,false,true,false};
    static {
        for (int i = 0; i < SIZE; i++) {
            a[i] = i;
            b[i] = i;
        }
    }

    static void workload(int idx) {
        VectorMask<Integer> vmask = VectorMask.fromArray(SPECIES, c, 0);
        IntVector av = IntVector.fromArray(SPECIES, a, idx);
        IntVector bv = IntVector.fromArray(SPECIES, b, idx);
        av.lanewise(VectorOperators.ADD, bv, vmask).intoArray(r, idx);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 30_0000; i++) {
            for (int j = 0; j < SIZE; j += SPECIES.length()) {
                workload(j);
            }
        }
    }
}
```

This test case is reduced from the existing jtreg vector test Int128VectorTests.java [2]. It exercises the masked version of the vector add instruction; the other masked instructions behave similarly.

Before this patch, the compilation log did not contain any RVV-related instructions. With the patch, the compilation log is as follows:

```
0ae B10: # out( B25 B11 ) <- in( B9 ) Freq: 0.999991
0ae loadV V1, [R31] # vector (rvv)
0b6 vloadmask V0, V2
0be vadd.vv V3, V1, V0 #@vaddI_masked
0c6 lwu R28, [R7, #124] # loadN, compressed ptr, #@loadN ! Field: AddMaskTestMerge.r
0ca decode_heap_oop R28, R28 #@decodeHeapOop
0cc lwu R7, [R28, #12] # range, #@loadRange
0d0 NullCheck R28
```

The corresponding JIT code is as follows:

```
0x000000400c823cee: vsetvli t0,zero,e32,m1,tu,mu
0x000000400c823cf2: vle32.v v1,(t6) ;*invokestatic store {reexecute=0 rethrow=0 return_oop=0}
                                                          ; - jdk.incubator.vector.IntVector::intoArray@43 (line 3228)
                                                          ; - AddMaskTestMerge::workload@46 (line 25)
0x000000400c823cf6: vsetvli t0,zero,e8,m1,tu,mu
0x000000400c823cfa: vmsne.vx v0,v2,zero ;*invokestatic load {reexecute=0 rethrow=0 return_oop=0}
                                                          ; - jdk.incubator.vector.VectorMask::fromArray@47 (line 208)
                                                          ; - AddMaskTestMerge::workload@7 (line 22)
0x000000400c823cfe: vsetvli t0,zero,e32,m1,tu,mu
0x000000400c823d02: vadd.vv v3,v3,v1,v0.t ;*invokestatic binaryOp {reexecute=0 rethrow=0 return_oop=0}
                                                          ; - jdk.incubator.vector.IntVector::lanewiseTemplate@192 (line 834)
                                                          ; - jdk.incubator.vector.Int128Vector::lanewise@9 (line 291)
                                                          ; - jdk.incubator.vector.Int128Vector::lanewise@4 (line 41)
                                                          ; - AddMaskTestMerge::workload@39 (line 25)
```
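For checking the output of a test like `AddMaskTestMerge`, a scalar reference can be handy. The sketch below assumes the documented Vector API behaviour of masked `lanewise`, where lanes with an unset mask bit keep the value of the first operand; this matches the `vadd.vv v3,v3,v1,v0.t` form above, in which the destination is also the first source. The class and method names are hypothetical, and the array length is assumed to be a multiple of the lane count.

```java
import java.util.Arrays;

// Hypothetical scalar reference for a masked vector add; not part of the patch.
final class MaskedAddReference {

    // Applies the same per-vector mask to every group of `lanes` elements:
    // set lanes receive a[i] + b[i], unset lanes keep a[i].
    static int[] maskedAdd(int[] a, int[] b, boolean[] mask, int lanes) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = mask[i % lanes] ? a[i] + b[i] : a[i];
        }
        return r;
    }

    public static void main(String[] args) {
        int[] a = {0, 1, 2, 3, 4, 5, 6, 7};
        int[] b = {0, 1, 2, 3, 4, 5, 6, 7};
        // Only the first 4 entries are used for a 128-bit int species (4 lanes).
        boolean[] mask = {true, false, true, false};
        System.out.println(Arrays.toString(maskedAdd(a, b, mask, 4)));
    }
}
```

Comparing the array `r` produced by the vectorized workload against such a reference is one way to confirm that inactive lanes were left undisturbed.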

## Mask register allocation & mask bit operation
The spec [1] designates v0 as the mask register, but vector mask logical operations such as `AndVMask`, `OrVMask`, and `XorVMask` sometimes need two mask operands. If only v0 and v31 are defined as mask registers, the corresponding C2 nodes are not generated correctly because of register pressure [3], so v30 and v31 are both defined as mask registers too.

`AndVMask` emits C2 JIT code like:
```
vloadmask V0, V1
vloadmask V30, V2
vmask_and V0, V30, V0
```
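As a rough illustration of why two mask registers are live at once, the following sketch models each mask register as one bit per lane, which is how RVV v1.0 lays out mask values, and the `vmask_and` step as a bitwise AND of the two. It assumes at most 64 lanes so that a `long` can hold the mask; the class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical bit-packed model of two mask registers; not HotSpot code.
final class MaskBitsModel {

    // Packs one boolean per lane into one bit per lane (lane i -> bit i).
    static long pack(boolean[] lanes) {
        long bits = 0L;
        for (int i = 0; i < lanes.length; i++) {
            if (lanes[i]) {
                bits |= 1L << i;
            }
        }
        return bits;
    }

    // Expands the low `n` bits back into one boolean per lane.
    static boolean[] unpack(long bits, int n) {
        boolean[] lanes = new boolean[n];
        for (int i = 0; i < n; i++) {
            lanes[i] = ((bits >>> i) & 1L) != 0;
        }
        return lanes;
    }

    // Both operands must be available at the same time, which is why a
    // second mask register besides v0 is needed.
    static long and(long m0, long m1) {
        return m0 & m1;
    }

    public static void main(String[] args) {
        long v0 = pack(new boolean[] {true, false, true, true});
        long v30 = pack(new boolean[] {true, true, false, true});
        System.out.println(Arrays.toString(unpack(and(v0, v30), 4)));
    }
}
```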
We also modified the implementation of `spill_copy_vector_stack_to_stack` so that it no longer occupies the v0 register. In addition, some nodes that use v0 internally, such as `vasr/vlsl/vlsr/vstring_x/varray_x/vclearArray_x`, were changed so that C2 is aware that they use v0.

Note that the current implementation of `VectorMaskCast` only covers the case where the input and output data have equal width; other cases depend on the subsequent cast node.

[1] https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc
[2] https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/Int128VectorTests.java
[3] https://github.com/openjdk/jdk/blob/0deb648985b018653ccdaf193dc13b3cf21c088a/src/hotspot/share/opto/chaitin.cpp#L526

links to

Commit openjdk/jdk/1c1a73f7

Review openjdk/jdk/12682
