JDK / JDK-8279621

x86_64 arraycopy stubs should use 256-bit copies with AVX=1


    • Type: Enhancement
    • Resolution: Won't Fix
    • Priority: P4
    • Fix Version/s: 19
    • Affects Version/s: 8, 11, 17, 18, 19
    • Component/s: hotspot
    • Labels: None

      While working on JDK-8150730 and looking at its performance results, I noticed a peculiarity in the current arraycopy implementation: changing UseAVX from 0 to 1 does not seem to improve the baseline scores:
        https://cr.openjdk.java.net/~shade/8150730/i11500.png
        https://cr.openjdk.java.net/~shade/8150730/tr3970x.png

      The problem is that the arraycopy stub generators emit the 256-bit vmovdqu only for UseAVX >= 2, and fall back to the 128-bit movdqu otherwise:

            if (UseAVX >= 2) {
              __ vmovdqu(xmm0, Address(end_from, qword_count, Address::times_8, -56));
              ...
            } else {
              __ movdqu(xmm0, Address(end_from, qword_count, Address::times_8, -56));
              ...
            }

      ...while the 256-bit vmovdqu is actually available with plain AVX (UseAVX=1) as well: its VEX.256 encoding requires only AVX, as per the Intel SDM:

        // Move Unaligned 256bit Vector
        void vmovdqu(Address dst, XMMRegister src);
        void vmovdqu(XMMRegister dst, Address src);
        void vmovdqu(XMMRegister dst, XMMRegister src);

      It seems to have been this way since the initial implementation in JDK-8005544.

      Relaxing the condition to UseAVX >= 1 in that code provides substantial performance improvements:
        https://github.com/openjdk/jdk/pull/6987

            Assignee: Aleksey Shipilev (shade)
            Reporter: Aleksey Shipilev (shade)
            Votes: 0
            Watchers: 3
