JDK-6526380 (project: JDK)

Add API to access SIMD instructions


    • Type: Enhancement
    • Resolution: Won't Fix
    • Priority: P4
    • Fix Version/s: None
    • Affects Version/s: 6
    • Component/s: core-libs
    • CPU: x86
    • OS: windows_xp

      A DESCRIPTION OF THE REQUEST :
      An API to access SIMD instructions.

      We need an API to access SIMD instructions, so that Java code can take advantage of hardware-accelerated vector math. On the latest CPUs vector math is up to a factor of 4 faster, which makes Java look inferior. This can be fixed relatively easily if we gain access to these CPU instructions indirectly.

      java.lang.math.SIMD.add4(
                             float[] op1, int off1,
                             float[] op2, int off2,
                             float[] dst, int offDst
      )

      java.lang.math.SIMD.add4(
                             FloatBuffer op1, int off1,
                             FloatBuffer op2, int off2,
                             FloatBuffer dst, int offDst
      )


      The default (bytecode) implementation of the array version would be:
      dst[offDst+0] = op1[off1+0] + op2[off2+0];
      dst[offDst+1] = op1[off1+1] + op2[off2+1];
      dst[offDst+2] = op1[off1+2] + op2[off2+2];
      dst[offDst+3] = op1[off1+3] + op2[off2+3];
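      A minimal sketch of that fallback for both proposed overloads follows. The class name SimdFallback is hypothetical (it is not a JDK API), and the FloatBuffer variant assumes the offsets are absolute indices, so it uses FloatBuffer's absolute get(int) and put(int, float) and leaves the buffers' positions untouched.

```java
import java.nio.FloatBuffer;

// Hypothetical scalar fallback for the proposed SIMD.add4 overloads.
// SimdFallback is an illustrative name, not part of the JDK.
final class SimdFallback {
    private SimdFallback() {}

    // dst[offDst+k] = op1[off1+k] + op2[off2+k] for k = 0..3,
    // the same element order as the unrolled default implementation.
    public static void add4(float[] op1, int off1,
                            float[] op2, int off2,
                            float[] dst, int offDst) {
        for (int k = 0; k < 4; k++) {
            dst[offDst + k] = op1[off1 + k] + op2[off2 + k];
        }
    }

    // FloatBuffer variant. Assumption: offsets are absolute indices, so
    // absolute get/put are used and buffer positions are not modified.
    public static void add4(FloatBuffer op1, int off1,
                            FloatBuffer op2, int off2,
                            FloatBuffer dst, int offDst) {
        for (int k = 0; k < 4; k++) {
            dst.put(offDst + k, op1.get(off1 + k) + op2.get(off2 + k));
        }
    }
}
```

      An intrinsic version would keep these signatures and semantics and only replace the loop bodies with a single vector instruction where the platform supports it.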

      These methods would be turned into intrinsics at runtime (as is done for sun.misc.Unsafe), using the vector instructions of the current platform.

      JUSTIFICATION :
      With SIMD instructions one can (theoretically) perform 4 operations at a time, for example four 32-bit float additions on one 128-bit SSE register. Even though most current CPUs execute a SIMD instruction in 2 or more cycles internally, this still yields large performance improvements. On the latest (and upcoming) x86 CPUs, these operations complete in a single cycle internally.

      The performance of the HotSpot JIT keeps improving, but the gap between the VM and native executables that use SIMD is widening. In vector-based code, or other SIMD-friendly mathematical algorithms, performance can improve by a factor of 2 to 4, depending on the CPU's SIMD implementation.
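      The factor of 2 to 4 follows from simple throughput arithmetic. As a rough sketch, assume a scalar float addition completes at one per cycle and a 4-wide SIMD addition takes c cycles (c = 2 or c = 1, per the figures above):

$$
\text{speedup} = \frac{4\ \text{ops} / c\ \text{cycles}}{1\ \text{op} / 1\ \text{cycle}} = \frac{4}{c},
\qquad c = 2 \Rightarrow 2\times, \qquad c = 1 \Rightarrow 4\times
$$

      This ideal ignores memory bandwidth and loop overhead, so measured gains are typically lower.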

      __Making the JIT perform this optimisation behind the scenes is not sufficient__

      Programmers can invent smarter ways of organizing data to make it SIMD-friendly or SIMD-optimal, whereas the JIT might overlook such cases or consider them too complex.

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
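      int vectors = 1024;                            // number of 4-float vectors (example value)
      float[] translate = new float[4];
      float[] scale = new float[4];
      float[] src = new float[vectors * 4];
      float[] dst = new float[vectors * 4];

      // dst = src * scale + translate, four floats per call
      int end = vectors * 4;
      for(int i=0; i<end; i+=4)
      {
         SIMD.mul4(src, i, scale, 0, dst, i);        // dst = src * scale
         SIMD.add4(dst, i, translate, 0, dst, i);    // dst = dst + translate
      }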
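      For reference, here is a self-contained sketch of the expected usage that compiles today. It assumes a hypothetical stand-in class named SIMD whose mul4 and add4 bodies are scalar loops with the semantics of the default implementation described above; a real intrinsic would replace those bodies. The add4 call reads dst, so the scaled values are translated rather than overwritten.

```java
import java.util.Arrays;

// Hypothetical stand-in for the proposed SIMD class (not a JDK API).
// Bodies are scalar; an intrinsic would map each call to one vector op.
final class SIMD {
    private SIMD() {}

    // dst[offDst+k] = op1[off1+k] * op2[off2+k] for k = 0..3
    static void mul4(float[] op1, int off1, float[] op2, int off2, float[] dst, int offDst) {
        for (int k = 0; k < 4; k++) {
            dst[offDst + k] = op1[off1 + k] * op2[off2 + k];
        }
    }

    // dst[offDst+k] = op1[off1+k] + op2[off2+k] for k = 0..3
    static void add4(float[] op1, int off1, float[] op2, int off2, float[] dst, int offDst) {
        for (int k = 0; k < 4; k++) {
            dst[offDst + k] = op1[off1 + k] + op2[off2 + k];
        }
    }
}

final class TransformDemo {
    // Applies dst = src * scale + translate to every 4-float vector in src.
    static float[] transform(float[] src, float[] scale, float[] translate) {
        if (src.length % 4 != 0) {
            throw new IllegalArgumentException("src length must be a multiple of 4");
        }
        float[] dst = new float[src.length];
        for (int i = 0; i < src.length; i += 4) {
            SIMD.mul4(src, i, scale, 0, dst, i);      // dst = src * scale
            SIMD.add4(dst, i, translate, 0, dst, i);  // dst = dst + translate
        }
        return dst;
    }

    public static void main(String[] args) {
        float[] src = {1f, 2f, 3f, 4f, 5f, 6f, 7f, 8f};
        float[] scale = {2f, 2f, 2f, 2f};
        float[] translate = {1f, 0f, 0f, 0f};
        // prints [3.0, 4.0, 6.0, 8.0, 11.0, 12.0, 14.0, 16.0]
        System.out.println(Arrays.toString(transform(src, scale, translate)));
    }
}
```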

            Assignee: Joe Darcy (darcy)
            Reporter: Nelson Dcosta (ndcosta, inactive)
            Votes: 0
            Watchers: 1
