JDK-6604786

SSE optimization for basic elementwise array operations

    • Type: Enhancement
    • Resolution: Fixed
    • Priority: P5
    • 9
    • 7
    • hotspot
    • x86
    • windows_xp

      A DESCRIPTION OF THE REQUEST :
      Some image-processing and other algorithms are built on loops that apply a simple operation to all elements of two or more arrays. For example, the elementwise unsigned maximum of byte arrays is the basis of morphological image filters:

          for (int srcPosMax = srcPos + count; srcPos < srcPosMax; srcPos++, destPos++) {
              if ((src[srcPos] & 0xFF) > (dest[destPos] & 0xFF))
                  dest[destPos] = src[srcPos];
          }

      The elementwise sum of int or float arrays underlies linear and other filters:

              for (int srcPosMax = srcPos + count; srcPos < srcPosMax; srcPos++, destPos++) {
                  dest[destPos] += src[srcPos];
              }

      Intel processors since the Pentium MMX offer SIMD instructions that can greatly speed up such loops: first MMX, then SSE (Pentium III) and SSE2 (Pentium 4). With 64-bit MMX registers, one instruction processes 8 bytes / 4 shorts / 2 ints; the 128-bit registers of SSE2 widen this to 16 bytes / 8 shorts / 4 ints or floats. The available operations include minimum, maximum, saturated or ordinary sum and difference, and some others, for selected element types. This can make such loops several times faster than a simple scalar loop.
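      To make "saturated sum" concrete, here is a minimal scalar sketch of the operation for unsigned bytes. It only illustrates the semantics that a packed instruction such as PADDUSB applies to a whole register at once; the class and method names (SaturatingOps, addSaturatedUnsigned) are hypothetical and not part of any JDK API.

```java
final class SaturatingOps {
    /**
     * Scalar reference for an unsigned saturated byte sum:
     * dest[i] = min(255, dest[i] + src[i]), with bytes read as 0..255.
     * Hypothetical helper, shown for illustration only.
     */
    static void addSaturatedUnsigned(byte[] src, int srcPos,
                                     byte[] dest, int destPos, int count) {
        for (int i = 0; i < count; i++) {
            // Mask with 0xFF so Java's signed bytes are treated as unsigned.
            int sum = (src[srcPos + i] & 0xFF) + (dest[destPos + i] & 0xFF);
            // Clamp at 255 instead of wrapping around.
            dest[destPos + i] = (byte) Math.min(sum, 0xFF);
        }
    }
}
```

      An ordinary (wrapping) sum would simply drop the Math.min clamp and cast the sum to byte.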

      Unfortunately, Java does not use this optimization. I think it would be a good idea for the HotSpot compiler to "understand" loops like those listed above and translate them into native SSE instructions on Intel processors. The loops used for elementwise array processing are usually very simple and easy to recognize, so I think the necessary changes to the HotSpot optimizer should not be too complex.

      Alternatively, you could implement a set of typical elementwise array operations (matching the set of SSE instructions) as native methods in the standard Math or a similar class. The great advantage of this solution would be support not only for Java arrays but also for direct XxxBuffer types (ByteBuffer, ShortBuffer, etc.). In the current JVM, simple implementations based on the get/put methods sometimes run very slowly, even in "-server" mode:

              for (int srcPosMax = srcPos + count; srcPos < srcPosMax; srcPos++, destPos++) {
                  // Absolute put: write the sum back to the index it was read from.
                  dest.put(destPos, dest.get(destPos) + src.get(srcPos));
              }
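      A self-contained sketch of such a buffer-based sum follows, assuming IntBuffer operands and absolute (indexed) get/put so that the buffers' positions are left untouched. The names BufferSum and add are hypothetical; the loop is the plain scalar form, not an optimized one.

```java
import java.nio.IntBuffer;

final class BufferSum {
    /**
     * Elementwise dest[destPos + i] += src[srcPos + i] for i in [0, count).
     * Uses only the absolute IntBuffer.get(int) and IntBuffer.put(int, int),
     * so neither buffer's position changes. Hypothetical helper.
     */
    static void add(IntBuffer src, int srcPos,
                    IntBuffer dest, int destPos, int count) {
        for (int i = 0; i < count; i++) {
            dest.put(destPos + i, dest.get(destPos + i) + src.get(srcPos + i));
        }
    }
}
```

      The same method works for heap buffers and for direct buffers obtained, for example, via ByteBuffer.allocateDirect(...).asIntBuffer().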


      JUSTIFICATION :
      Without this, applications cannot benefit from SSE instructions in simple elementwise loops unless they laboriously implement native methods for every OS that runs on Intel CPUs.


      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      A smarter HotSpot optimizer that recognizes elementwise loops with several standard operations (at least minimum, maximum, sum, difference, saturated sum and difference) and performs them via SSE, grouping iterations into blocks of 8 bytes and processing the first and last few bytes in the usual scalar way (to ensure good alignment). Alternatively, a ready-made set of all the necessary native methods, processing Java arrays and NIO buffers of all primitive types, in Math or a similar class.
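      The loop shape described above (scalar head, blocked main part, scalar tail) can be sketched in plain Java as follows. This is only an illustration of the structure under stated assumptions: the block size of 8 is taken from the request, array-index alignment stands in for real address alignment, and the inner fixed-length loop stands in for what would be a single SIMD instruction. BlockedMax and maxUnsigned are hypothetical names, and destPos is assumed to be non-negative.

```java
final class BlockedMax {
    private static final int BLOCK = 8; // assumed vector width in bytes

    /** Elementwise unsigned maximum: dest[i] = max(dest[i], src[i]). */
    static void maxUnsigned(byte[] src, int srcPos,
                            byte[] dest, int destPos, int count) {
        int i = 0;

        // Head: scalar steps until destPos + i is a multiple of BLOCK.
        int head = Math.min(count, (BLOCK - destPos % BLOCK) % BLOCK);
        for (; i < head; i++) {
            step(src, srcPos + i, dest, destPos + i);
        }

        // Main part: whole blocks; each inner loop models one vector operation.
        for (; i + BLOCK <= count; i += BLOCK) {
            for (int k = 0; k < BLOCK; k++) {
                step(src, srcPos + i + k, dest, destPos + i + k);
            }
        }

        // Tail: the remaining (fewer than BLOCK) elements, again scalar.
        for (; i < count; i++) {
            step(src, srcPos + i, dest, destPos + i);
        }
    }

    // One scalar unsigned-max step, same as the loop body in the description.
    private static void step(byte[] src, int s, byte[] dest, int d) {
        if ((src[s] & 0xFF) > (dest[d] & 0xFF)) {
            dest[d] = src[s];
        }
    }
}
```

      The result is identical to the simple loop; only the iteration order is organized so that the middle part could be handed to vector instructions.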

      CUSTOMER SUBMITTED WORKAROUND :
      Writing native methods for the most demanding image- and video-processing applications.

            Assignee: Unassigned
            Reporter: Nelson Dcosta (ndcosta, Inactive)
            Votes: 0
            Watchers: 3