JDK-8365779

Use larger load/stores in inlined arraycopy


      In https://bugs.openjdk.org/browse/JDK-6912521, an optimization was added to inline small arraycopy invocations as individual loads/stores. These loads and stores are 1 byte each. We could use word-sized loads/stores where appropriate and raise ArrayCopyLoadStoreMaxElem accordingly.
      E.g. a 16-byte (aligned) arraycopy could be two inlined loads/stores, instead of calling the runtime stub.
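
      To illustrate the idea at the Java level, here is a minimal sketch that copies 16 bytes with two 8-byte loads/stores instead of 16 single-byte ones. The real change would be in C2's handling of arraycopy, not in Java code. The class and method names are hypothetical, and MethodHandles.byteArrayViewVarHandle is used only as a way to express word-sized accesses on a byte[].

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical illustration only; not the C2 implementation.
public class WideCopySketch {
    // View a byte[] as 8-byte longs. Native byte order means the bytes
    // are moved unchanged, so the result matches a byte-wise copy.
    private static final VarHandle LONG_VIEW =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.nativeOrder());

    // Shape of the current inlining for a byte[]: one load/store per byte.
    public static void copy16ByteWise(byte[] src, int srcPos, byte[] dst, int dstPos) {
        for (int i = 0; i < 16; i++) {
            dst[dstPos + i] = src[srcPos + i];
        }
    }

    // Proposed shape: two 8-byte loads followed by two 8-byte stores.
    // Both loads happen before any store, so overlapping ranges are safe.
    public static void copy16WordWise(byte[] src, int srcPos, byte[] dst, int dstPos) {
        long lo = (long) LONG_VIEW.get(src, srcPos);
        long hi = (long) LONG_VIEW.get(src, srcPos + 8);
        LONG_VIEW.set(dst, dstPos, lo);
        LONG_VIEW.set(dst, dstPos + 8, hi);
    }
}
```

      Both methods produce the same bytes in dst; the second one simply needs a quarter of the memory operations for this size.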

      A naive implementation gives me about a 5% improvement in SPECjvm crypto.signverify, which does a lot of 16-byte copies. UseAVX=3 is already faster due to https://bugs.openjdk.org/browse/JDK-8252848.

      crypto.signverify on Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz:
      UseAVX=3: 209.6 op/s (uses inlined vector masks from JDK-8252848)
      UseAVX=2: 198.4 op/s (calls arraycopy runtime stub)
      UseAVX=2 ArrayCopyLoadStoreMaxElem=256: 203.3 op/s
      UseAVX=2 (with 8-byte inlined load/stores): 209.3 op/s
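
      As a quick check of the arithmetic, the relative gains over the UseAVX=2 baseline can be computed from the figures above. This is only a sketch with a hypothetical class name; it works out to roughly 5.6% for UseAVX=3, 2.5% for ArrayCopyLoadStoreMaxElem=256, and 5.5% for the 8-byte inlined load/stores.

```java
// Hypothetical helper; the throughput figures are the ones reported above.
public class ThroughputGain {
    // Relative gain of a configuration over the baseline, in percent.
    public static double gainPercent(double baselineOps, double ops) {
        return (ops / baselineOps - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        double baseline = 198.4; // UseAVX=2, calls the arraycopy runtime stub
        System.out.printf("UseAVX=3: %.1f%%%n", gainPercent(baseline, 209.6));
        System.out.printf("UseAVX=2 ArrayCopyLoadStoreMaxElem=256: %.1f%%%n",
                gainPercent(baseline, 203.3));
        System.out.printf("UseAVX=2 (8-byte inlined load/stores): %.1f%%%n",
                gainPercent(baseline, 209.3));
    }
}
```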

      Assignee: Unassigned
      Reporter: Oli Gillespie (ogillespie)