JDK-8357460

RISC-V: Optimize array fill stub for small size

    • Type: Enhancement
    • Resolution: Fixed
    • Priority: P4
    • Affects Version/s: 25
    • Fix Version/s: 25
    • Component/s: hotspot
    • Resolved In Build: b25
    • CPU: riscv
    • OS: linux

      Currently, the array fill stub uses a loop to handle fills of fewer than 8 bytes:
      ```
          // Handle copies less than 8 bytes.
          Label L_loop1, L_loop2, L_exit2;
          __ bind(L_fill_elements);
          __ beqz(count, L_exit2);
          switch (t) {
            case T_BYTE:
              __ bind(L_loop1);
              __ sb(value, Address(to, 0));
              __ addi(to, to, 1);
              __ subiw(count, count, 1);
              __ bnez(count, L_loop1);
              break;
            case T_SHORT:
              __ bind(L_loop2);
              __ sh(value, Address(to, 0));
              __ addi(to, to, 2);
              __ subiw(count, count, 2 >> shift);
              __ bnez(count, L_loop2);
              break;
            case T_INT:
              __ sw(value, Address(to, 0));
              break;
            default: ShouldNotReachHere();
      ```

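      We can eliminate the loop for the T_BYTE and T_SHORT cases by unrolling the sb and sh stores.
      With the loop removed, the ArrayFill benchmark shows gains for small byte array fills (at size 7, fillByteArray drops from 27.036 to 19.347 ns/op, roughly 28% less time), while the int and short results are essentially unchanged: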

      Before:
      Benchmark                 (size)  Mode  Cnt   Score    Error  Units
      ArrayFill.fillByteArray        7  avgt   12  27.036 ±  0.061  ns/op
      ArrayFill.fillIntArray         7  avgt   12  28.628 ±  0.013  ns/op
      ArrayFill.fillShortArray       7  avgt   12  30.775 ±  0.008  ns/op
      ArrayFill.zeroByteArray        7  avgt   12  27.076 ±  0.013  ns/op
      ArrayFill.zeroIntArray         7  avgt   12  28.624 ±  0.003  ns/op
      ArrayFill.zeroShortArray       7  avgt   12  30.776 ±  0.009  ns/op

      After:
      Benchmark                 (size)  Mode  Cnt   Score    Error  Units
      ArrayFill.fillByteArray        7  avgt   12  19.347 ±  0.079  ns/op
      ArrayFill.fillIntArray         7  avgt   12  28.639 ±  0.012  ns/op
      ArrayFill.fillShortArray       7  avgt   12  30.777 ±  0.015  ns/op
      ArrayFill.zeroByteArray        7  avgt   12  19.646 ±  0.599  ns/op
      ArrayFill.zeroIntArray         7  avgt   12  28.631 ±  0.008  ns/op
      ArrayFill.zeroShortArray       7  avgt   12  30.780 ±  0.009  ns/op
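
To illustrate the unrolling idea for the byte case, here is a minimal sketch in plain C++ rather than MacroAssembler code. It is not the actual patch: the function names `fill_tail_loop` and `fill_tail_unrolled` are hypothetical, and testing the bits of `count` is only one possible way to replace the loop with straight-line stores, assuming `count` is always less than 8.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Baseline: one byte store per iteration, mirroring the T_BYTE loop
// in the stub excerpt (store, advance pointer, decrement, branch).
void fill_tail_loop(std::uint8_t* to, std::uint8_t value, unsigned count) {
    while (count != 0) {
        *to = value;
        to += 1;
        count -= 1;
    }
}

// Loop-free variant (hypothetical): for count < 8, each set bit of
// count selects a fixed-length run of stores, so no backward branch
// is needed.
void fill_tail_unrolled(std::uint8_t* to, std::uint8_t value, unsigned count) {
    assert(count < 8);
    if (count & 4) {            // 4 bytes
        to[0] = value;
        to[1] = value;
        to[2] = value;
        to[3] = value;
        to += 4;
    }
    if (count & 2) {            // 2 bytes
        to[0] = value;
        to[1] = value;
        to += 2;
    }
    if (count & 1) {            // 1 byte
        to[0] = value;
    }
}
```

Both functions write exactly `count` bytes. The unrolled form executes at most three forward branches and never branches backward, whereas the loop takes one backward branch per byte.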

            Assignee: Feilong Jiang (fjiang)
            Reporter: Feilong Jiang (fjiang)