JDK-8302908

RISC-V: Support masked vector arithmetic instructions for Vector API

    • b20
    • riscv
    • linux

      We have added support for masked vector instructions. Please take a look and review. Thanks a lot!
      This patch adds support for the masked versions of vector add/sub/mul/div. The implementation follows RVV v1.0 [1].

      ## Load/Store/Cmp Mask
      `VectorLoadMask`, `VectorMaskCmp` and `VectorStoreMask` implement the mask datapath. The compilation log for `jdk/incubator/vector/Byte128VectorTests.java` shows how the data passes through them:
      ```
      218 loadV V1, [R7] # vector (rvv)
      220 vloadmask V0, V1
      ...
      23c vmaskcmp_rvv_masked V0, V4, V5, V0, V1, #0
      24c vstoremask V1, V0
      258 storeV [R7], V1 # vector (rvv)
      ```

      The corresponding generated JIT assembly:
      ```
      # loadV
      0x000000400c8ef958: vsetvli t0,zero,e8,m1,tu,mu
      0x000000400c8ef95c: vle8.v v1,(t2)

      # vloadmask
      0x000000400c8ef960: vsetvli t0,zero,e8,m1,tu,
      0x000000400c8ef964: vmsne.vx v0,v1,zero

      # vmaskcmp_rvv_masked
      0x000000400c8ef97c: vsetvli t0,zero,e8,m1,tu,mu
      0x000000400c8ef980: vmclr.m v1
      0x000000400c8ef984: vmseq.vv v1,v4,v5,v0.t
      0x000000400c8ef988: vmv1r.v v0,v1

      # vstoremask
      0x000000400c8ef98c: vsetvli t0,zero,e8,m1,tu,mu
      0x000000400c8ef990: vmv.v.x v1,zero
      0x000000400c8ef994: vmerge.vim v1,v1,1,v0
      ```
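
      The three steps can be made concrete with a minimal scalar model in plain Java of what the emitted sequences compute. The sketch makes these assumptions: the in-memory mask is a `byte[]` of 0/1 lanes, and a `boolean[]` stands in for the one-bit-per-lane mask register. The class and method names (`MaskDatapathModel`, `loadMask`, `maskedCmpEq`, `storeMask`) are hypothetical. The comparison is fixed to equality because condition `#0` in the log is lowered to `vmseq.vv` here.

      ```java
      import java.util.Arrays;

      // Plain-Java model of the mask datapath (hypothetical names, not HotSpot code).
      // Memory-side masks are byte[] with 0/1 lanes; boolean[] stands in for the
      // one-bit-per-lane mask register (v0 / v1 in the JIT output).
      final class MaskDatapathModel {

          // vloadmask: vmsne.vx v0, v1, zero -> a lane is active iff its byte is non-zero
          static boolean[] loadMask(byte[] bytes) {
              boolean[] mask = new boolean[bytes.length];
              for (int i = 0; i < bytes.length; i++) {
                  mask[i] = bytes[i] != 0;
              }
              return mask;
          }

          // vmaskcmp_rvv_masked (eq): vmclr.m clears every result bit, then
          // vmseq.vv ..., v0.t compares only active lanes; inactive bits stay 0 (mu)
          static boolean[] maskedCmpEq(byte[] x, byte[] y, boolean[] mask) {
              boolean[] result = new boolean[x.length]; // vmclr.m: all bits false
              for (int i = 0; i < x.length; i++) {
                  if (mask[i]) {
                      result[i] = x[i] == y[i];
                  }
              }
              return result;
          }

          // vstoremask: vmv.v.x splats 0, then vmerge.vim writes 1 where the mask is set
          static byte[] storeMask(boolean[] mask) {
              byte[] bytes = new byte[mask.length]; // vmv.v.x v1, zero
              for (int i = 0; i < mask.length; i++) {
                  if (mask[i]) {
                      bytes[i] = 1;
                  }
              }
              return bytes;
          }

          public static void main(String[] args) {
              byte[] maskBytes = {1, 0, 1, 1};
              byte[] x = {5, 6, 7, 8};
              byte[] y = {5, 6, 0, 8};
              byte[] out = storeMask(maskedCmpEq(x, y, loadMask(maskBytes)));
              System.out.println(Arrays.toString(out)); // prints [1, 0, 0, 1]
          }
      }
      ```

      Lane 1 compares equal (6 == 6) but is inactive, so its stored byte is 0. The `vmclr.m` before the masked compare is what guarantees that.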

      ## Masked vector arithmetic instructions (e.g. vadd)
      The `AddMaskTestMerge` test case (run with `--add-modules jdk.incubator.vector`):
      ```java
      import jdk.incubator.vector.IntVector;
      import jdk.incubator.vector.VectorMask;
      import jdk.incubator.vector.VectorOperators;
      import jdk.incubator.vector.VectorSpecies;

      public class AddMaskTestMerge {

          static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_128; // 4 int lanes
          static final int SIZE = 1024;
          static int[] a = new int[SIZE];
          static int[] b = new int[SIZE];
          static int[] r = new int[SIZE];
          static boolean[] c = new boolean[]{true,false,true,false,true,false,true,false}; // fromArray(SPECIES, c, 0) reads the first 4
          static {
              for (int i = 0; i < SIZE; i++) {
                  a[i] = i;
                  b[i] = i;
              }
          }

          static void workload(int idx) {
              VectorMask<Integer> vmask = VectorMask.fromArray(SPECIES, c, 0);
              IntVector av = IntVector.fromArray(SPECIES, a, idx);
              IntVector bv = IntVector.fromArray(SPECIES, b, idx);
              av.lanewise(VectorOperators.ADD, bv, vmask).intoArray(r, idx); // unset lanes keep av's value
          }

          public static void main(String[] args) {
              for (int i = 0; i < 300_000; i++) { // warm up so C2 compiles workload()
                  for (int j = 0; j < SIZE; j += SPECIES.length()) {
                      workload(j);
                  }
              }
          }
      }
      ```

      This test case is reduced from the existing jtreg vector test Int128VectorTests.java [3]. It exercises the masked version of the add instruction. The other instructions work the same way.
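
      As a cross-check on what `r` should contain, the sketch below is a hypothetical scalar reference for `workload` in plain Java (no incubator module needed). It assumes `IntVector.SPECIES_128` has 4 lanes (128 bits / 32-bit `int`). It follows the masked `lanewise` semantics, where lanes unset in the mask keep the value from `av`. `AddMaskScalarReference`, `LANES` and `run` are illustrative names, not part of the test.

      ```java
      import java.util.Arrays;

      // Hypothetical scalar reference for AddMaskTestMerge.workload (plain Java, no Vector API).
      final class AddMaskScalarReference {
          static final int LANES = 4; // assumed: IntVector.SPECIES_128 = 128 bits / 32-bit int
          static final boolean[] C = {true, false, true, false, true, false, true, false};

          // Mirrors av.lanewise(ADD, bv, vmask).intoArray(r, idx):
          // active lanes get a + b, unset lanes keep av's value.
          static void workload(int[] a, int[] b, int[] r, int idx) {
              for (int lane = 0; lane < LANES; lane++) {
                  int i = idx + lane;
                  // VectorMask.fromArray(SPECIES, c, 0) always reads c[0..LANES-1]
                  r[i] = C[lane] ? a[i] + b[i] : a[i];
              }
          }

          // size is assumed to be a multiple of LANES, as SIZE = 1024 is in the test
          static int[] run(int size) {
              int[] a = new int[size];
              int[] b = new int[size];
              int[] r = new int[size];
              for (int i = 0; i < size; i++) {
                  a[i] = i;
                  b[i] = i;
              }
              for (int j = 0; j < size; j += LANES) {
                  workload(a, b, r, j);
              }
              return r;
          }

          public static void main(String[] args) {
              int[] r = run(1024);
              System.out.println(Arrays.toString(Arrays.copyOf(r, 8))); // prints [0, 1, 4, 3, 8, 5, 12, 7]
          }
      }
      ```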

      Before this patch, no RVV instructions appear in the compilation log for this case. With the patch, the log is as follows:

      ```
      0ae B10: # out( B25 B11 ) <- in( B9 ) Freq: 0.999991
      0ae loadV V1, [R31] # vector (rvv)
      0b6 vloadmask V0, V2
      0be vadd.vv V3, V1, V0 #@vaddI_masked
      0c6 lwu R28, [R7, #124] # loadN, compressed ptr, #@loadN ! Field: AddMaskTestMerge.r
      0ca decode_heap_oop R28, R28 #@decodeHeapOop
      0cc lwu R7, [R28, #12] # range, #@loadRange
      0d0 NullCheck R28
      ```

      The generated JIT code is as follows:

      ```
      0x000000400c823cee: vsetvli t0,zero,e32,m1,tu,mu
      0x000000400c823cf2: vle32.v v1,(t6) ;*invokestatic store {reexecute=0 rethrow=0 return_oop=0}
                                                                ; - jdk.incubator.vector.IntVector::intoArray@43 (line 3228)
                                                                ; - AddMaskTestMerge::workload@46 (line 25)
      0x000000400c823cf6: vsetvli t0,zero,e8,m1,tu,mu
      0x000000400c823cfa: vmsne.vx v0,v2,zero ;*invokestatic load {reexecute=0 rethrow=0 return_oop=0}
                                                                ; - jdk.incubator.vector.VectorMask::fromArray@47 (line 208)
                                                                ; - AddMaskTestMerge::workload@7 (line 22)
      0x000000400c823cfe: vsetvli t0,zero,e32,m1,tu,mu
      0x000000400c823d02: vadd.vv v3,v3,v1,v0.t ;*invokestatic binaryOp {reexecute=0 rethrow=0 return_oop=0}
                                                                ; - jdk.incubator.vector.IntVector::lanewiseTemplate@192 (line 834)
                                                                ; - jdk.incubator.vector.Int128Vector::lanewise@9 (line 291)
                                                                ; - jdk.incubator.vector.Int128Vector::lanewise@4 (line 41)
                                                                ; - AddMaskTestMerge::workload@39 (line 25)
      ```
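
      The `v0.t` suffix and the `mu` in `vsetvli` together decide what happens to inactive lanes. The sketch below is a minimal plain-Java model that assumes the mask-undisturbed policy shown above. `RvvMaskedOpModel` and `maskedVV` are hypothetical names. Passing the operator as an `IntBinaryOperator` lets the same model stand in for the add/sub/mul/div variants, each computing `vs2 op vs1` on active lanes.

      ```java
      import java.util.Arrays;
      import java.util.function.IntBinaryOperator;

      // Plain-Java model of "vop.vv vd, vs2, vs1, v0.t" under the mask-undisturbed (mu)
      // policy selected by "vsetvli ..., tu, mu" (hypothetical names, not HotSpot code).
      final class RvvMaskedOpModel {

          // Active lanes: vd[i] = vs2[i] op vs1[i]; inactive lanes keep vd's old value.
          static int[] maskedVV(int[] vd, int[] vs2, int[] vs1, boolean[] v0, IntBinaryOperator op) {
              int[] out = vd.clone(); // mu: start from the previous destination contents
              for (int i = 0; i < out.length; i++) {
                  if (v0[i]) {
                      out[i] = op.applyAsInt(vs2[i], vs1[i]);
                  }
              }
              return out;
          }

          public static void main(String[] args) {
              int[] av = {10, 20, 30, 40};                 // v3 (destination and vs2)
              int[] bv = {1, 2, 3, 4};                     // v1
              boolean[] mask = {true, false, true, false}; // v0
              // vadd.vv v3, v3, v1, v0.t
              System.out.println(Arrays.toString(maskedVV(av, av, bv, mask, Integer::sum)));
              // prints [11, 20, 33, 40]
              // A masked divide: the zero divisors sit in inactive lanes and are never evaluated
              int[] divisors = {5, 0, 6, 0};
              System.out.println(Arrays.toString(maskedVV(av, av, divisors, mask, (x, y) -> x / y)));
              // prints [2, 20, 5, 40]
          }
      }
      ```

      Because `vadd.vv v3,v3,v1,v0.t` uses v3 as both the destination and the first source, inactive lanes keep `av`'s value under `mu`. This matches the masked `lanewise` result without a separate blend.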

      ## Mask register allocation & mask bit operations
      The spec [1] uses v0 as the mask register for masked instructions, but logical mask operations such as `AndVMask`, `OrVMask` and `XorVMask` sometimes need two masks live at the same time. If only v0 and v31 are defined as mask registers, C2 does not generate the corresponding nodes correctly because of register pressure [2]. For this reason, v30 and v31 are both defined as mask registers in addition to v0.

      `AndVMask` produces C2 code like the following:
      ```
      vloadmask V0, V1
      vloadmask V30, V2
      vmask_and V0, V30, V0
      ```
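
      In RVV v1.0 a mask register holds one bit per element, with element i in bit i. The sketch below models `AndVMask`/`OrVMask`/`XorVMask` on that layout by packing up to 64 lanes into a `long`. The packing and the names `MaskLogicModel`, `pack`, `and`, `or` and `xor` are assumptions for illustration, not the patch's code. The spec's mask-register logical instructions (`vmand.mm`, `vmor.mm`, `vmxor.mm`) compute the same bitwise results.

      ```java
      // Plain-Java model of mask logical ops on RVV's one-bit-per-element layout
      // (hypothetical names): mask element i lives in bit i; a long covers up to 64 lanes.
      final class MaskLogicModel {

          static long pack(boolean[] lanes) {
              long bits = 0L;
              for (int i = 0; i < lanes.length; i++) {
                  if (lanes[i]) {
                      bits |= 1L << i;
                  }
              }
              return bits;
          }

          static long and(long m1, long m2) { return m1 & m2; } // AndVMask
          static long or(long m1, long m2)  { return m1 | m2; } // OrVMask
          static long xor(long m1, long m2) { return m1 ^ m2; } // XorVMask

          public static void main(String[] args) {
              long v0 = pack(new boolean[]{true, true, false, false});  // first vloadmask
              long v30 = pack(new boolean[]{true, false, true, false}); // second vloadmask
              // Both operands are live at once, so two mask registers are needed
              System.out.println(Long.toBinaryString(and(v0, v30))); // prints 1
              System.out.println(Long.toBinaryString(or(v0, v30)));  // prints 111
              System.out.println(Long.toBinaryString(xor(v0, v30))); // prints 110
          }
      }
      ```
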
      We also modified the implementation of `spill_copy_vector_stack_to_stack` so that it no longer occupies the v0 register. In addition, nodes that use v0 internally, such as `vasr/vlsl/vlsr/vstring_x/varray_x/vclearArray_x`, were changed so that C2 knows they use v0.

      Note that the current implementation of `VectorMaskCast` only covers the case where the parameter data have equal width. Other cases depend on the subsequent cast node.

      [1] RISC-V "V" vector extension specification, v1.0: https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc
      [2] HotSpot C2 register allocator, `chaitin.cpp`: https://github.com/openjdk/jdk/blob/0deb648985b018653ccdaf193dc13b3cf21c088a/src/hotspot/share/opto/chaitin.cpp#L526
      [3] jtreg test `Int128VectorTests.java`: https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/Int128VectorTests.java

            dzhang Dingli Zhang