Type: Enhancement
Resolution: Won't Fix
Priority: P4
Fix Version/s: None
Affects Version/s: 6
Component/s: core-libs
Labels: webbug
Subcomponent: java.lang
CPU: x86
OS: windows_xp

A DESCRIPTION OF THE REQUEST :
An API to access SIMD instructions.

We need an API to access SIMD instructions so that Java code can take advantage of hardware-accelerated vector math. On the latest CPUs vector math can be up to a factor of 4 faster, which makes Java look inferior. This can be fixed relatively easily if we gain indirect access to these CPU instructions.

java.lang.math.SIMD.add4(
                       float[] op1, int off1,
                       float[] op2, int off2,
                       float[] dst, int offDst
)

java.lang.math.SIMD.add4(
                       FloatBuffer op1, int off1,
                       FloatBuffer op2, int off2,
                       FloatBuffer dst, int offDst
)

The default (bytecode) implementation of this method would be:
dst[offDst+0] = op1[off1+0] + op2[off2+0];
dst[offDst+1] = op1[off1+1] + op2[off2+1];
dst[offDst+2] = op1[off1+2] + op2[off2+2];
dst[offDst+3] = op1[off1+3] + op2[off2+3];

These methods would be turned into intrinsics at runtime (like the methods of sun.misc.Unsafe), using the vector instructions of the current platform.
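As an illustration only, the pure-Java fallback for both proposed overloads might look like the sketch below. The class name SimdFallback is hypothetical (the request places the methods in a SIMD class that does not exist in the JDK), and the buffer variant assumes absolute get/put so that buffer positions are left untouched.

```java
import java.nio.FloatBuffer;

/** Hypothetical pure-Java fallback for the proposed add4 methods (not a JDK class). */
final class SimdFallback {
    private SimdFallback() {}

    /** Adds four consecutive floats of op1 and op2 and stores the sums in dst. */
    static void add4(float[] op1, int off1,
                     float[] op2, int off2,
                     float[] dst, int offDst) {
        for (int k = 0; k < 4; k++) {
            dst[offDst + k] = op1[off1 + k] + op2[off2 + k];
        }
    }

    /** Same operation on buffers; absolute get/put leaves positions unchanged. */
    static void add4(FloatBuffer op1, int off1,
                     FloatBuffer op2, int off2,
                     FloatBuffer dst, int offDst) {
        for (int k = 0; k < 4; k++) {
            dst.put(offDst + k, op1.get(off1 + k) + op2.get(off2 + k));
        }
    }
}
```

A JIT that recognised such methods as intrinsics could replace each loop body with a single packed add; without that support the code above still produces the same results, only more slowly.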

JUSTIFICATION :
With SIMD instructions one can (theoretically) perform 4 operations at a time. Although most modern CPUs execute a SIMD instruction in 2+ cycles internally, this still yields great performance improvements. On the latest (and upcoming) x86 CPUs, these operations are performed in 1 cycle internally.

The performance of the HotSpot JIT is ever increasing, but the gap between the VM and native executables that use SIMD is widening. In vector-based code and other SIMD-friendly mathematical algorithms, performance can be 2 to 4 times higher, depending on the CPU's SIMD implementation.

__Making the JIT perform this optimisation behind the scenes is not sufficient__

  Programmers can invent smart(er) ways of arranging data to make it SIMD-friendly or SIMD-optimal, while the JIT might overlook such cases or consider them too complex.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
float[] translate = new float[4];
float[] scale = new float[4];
float[] src = new float[vectors * 4];
float[] dst = new float[vectors * 4];

int end = vectors * 4;
for(int i=0; i<end; i+=4)
{
   SIMD.mul4(src, i, scale, 0, dst, i);
   SIMD.add4(dst, i, translate, 0, dst, i);
}
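A runnable sketch of the scale-then-translate loop, assuming the second step is meant to add the translation to the already scaled values (so that dst = src * scale + translate). The class ScaleTranslateDemo and its scalar mul4/add4 helpers are hypothetical stand-ins for the requested SIMD methods.

```java
import java.util.Arrays;

/** Hypothetical demo: dst = src * scale + translate, four floats per step. */
final class ScaleTranslateDemo {
    private ScaleTranslateDemo() {}

    // Scalar stand-in for the proposed SIMD.mul4 (assumed name and signature).
    static void mul4(float[] a, int aOff, float[] b, int bOff, float[] d, int dOff) {
        for (int k = 0; k < 4; k++) {
            d[dOff + k] = a[aOff + k] * b[bOff + k];
        }
    }

    // Scalar stand-in for the proposed SIMD.add4 (assumed name and signature).
    static void add4(float[] a, int aOff, float[] b, int bOff, float[] d, int dOff) {
        for (int k = 0; k < 4; k++) {
            d[dOff + k] = a[aOff + k] + b[bOff + k];
        }
    }

    /** Scales and then translates every complete 4-float vector in src. */
    static float[] transform(float[] src, float[] scale, float[] translate) {
        float[] dst = new float[src.length];
        for (int i = 0; i + 4 <= src.length; i += 4) {
            mul4(src, i, scale, 0, dst, i);
            // Read the scaled values back from dst so the product is not lost.
            add4(dst, i, translate, 0, dst, i);
        }
        return dst;
    }

    public static void main(String[] args) {
        float[] src = {1f, 2f, 3f, 4f, 5f, 6f, 7f, 8f};
        float[] scale = {2f, 2f, 2f, 2f};
        float[] translate = {1f, 0f, 0f, 0f};
        System.out.println(Arrays.toString(transform(src, scale, translate)));
    }
}
```

Using dst as both an input and the output of add4 is safe here because each element is read before the same element is written.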

relates to

JDK-6604786 SSE optimization for basic elementwise array operations

Closed

Assignee: Joe Darcy
Reporter: Nelson Dcosta (Inactive)
Votes: 0
Watchers: 1
Created: 2007-02-19 05:01
Updated: 2011-02-16 11:15
Resolved: 2007-02-28 21:00
Imported: 15/Sep/12 11:27 PM
Indexed: 17/Jul/12 7:45 PM
