Type: Bug
Resolution: Duplicate
Priority: P4
Fix Version/s: None
Affects Version/s: 16, 21
Component/s: hotspot
Labels:
- dcs-nr
- dcspks
- oracle-triage-21
- performance
- reproducer-yes
- vectorapi
- webbug

Subcomponent: compiler
CPU: x86_64
OS: os_x

ADDITIONAL SYSTEM INFORMATION :
MacBook Pro / 6-Core Intel Core i9

MacOS Big Sur 11.2.2

openjdk version "16" 2021-03-16
OpenJDK Runtime Environment (build 16+36-2231)
OpenJDK 64-Bit Server VM (build 16+36-2231, mixed mode, sharing)

A DESCRIPTION OF THE PROBLEM :
Example code:

// BEGIN
  private static final VectorSpecies<Float> VFP = FloatVector.SPECIES_MAX;
  private static final VectorSpecies<Integer> VIP = IntVector.SPECIES_MAX;
  final static int STEP = VFP.length();

  static void updateGeGeneric(int angle, int d, int[] rowOffset, float[] regionY, int[] regionX0, int[] regionX1, int[] regionX, int count) {
    FloatVector mNyNx = FloatVector.broadcast(VFP, SinCos.MINUS_COT[angle]);
    FloatVector dNx = FloatVector.broadcast(VFP, (float)(d * SinCos.INV_SIN[angle] + 0.5f));
    IntVector k4 = IntVector.broadcast(VIP, 4);
    for (int i = 0; i < count; i += STEP) {
      FloatVector y = FloatVector.fromArray(VFP, regionY, i);
      IntVector offset = IntVector.fromArray(VIP, rowOffset, i);
      FloatVector xf = y.fma(mNyNx, dNx);
      // NEXT LINE IS SLOW
      IntVector xi = xf.convert(VectorOperators.F2I, 0).reinterpretAsInts();
      IntVector x0 = IntVector.fromArray(VIP, regionX0, i);
      IntVector x1 = IntVector.fromArray(VIP, regionX1, i);
      IntVector x = xi.max(x0).min(x1);
      IntVector xOff = x.add(offset).mul(k4);
      xOff.intoArray(regionX, i);
    }
  }
// END

The profiler shows that the conversion marked above (jdk.incubator.vector.AbstractVector.convert(VectorOperators$Conversion, int)) consumes 99.2% of the method's time.
Overall, the method is 4.85x slower than the non-vectorized variant (or worse, depending on the vector species used).
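
For comparison, the scalar loop that the vectorized method is measured against can be sketched roughly as follows. This is an illustration, not the code from the reporter's repository: the class name `ScalarBaselineSketch` and the method name `updateGeScalar` are hypothetical, and the two broadcast constants are passed in as plain `float` parameters (`mNyNx`, `dNx`) instead of being read from `SinCos`, which is not shown in this report.

```java
import java.util.Arrays;

final class ScalarBaselineSketch {
  // Hypothetical scalar counterpart of updateGeGeneric. mNyNx and dNx stand in
  // for the values the vectorized code broadcasts from SinCos (an assumption).
  static void updateGeScalar(float mNyNx, float dNx, int[] rowOffset, float[] regionY,
      int[] regionX0, int[] regionX1, int[] regionX, int count) {
    for (int i = 0; i < count; i++) {
      // Same arithmetic as y.fma(mNyNx, dNx): y * mNyNx + dNx with a single rounding.
      float xf = Math.fma(regionY[i], mNyNx, dNx);
      // Plain narrowing cast: the scalar counterpart of the F2I conversion.
      int xi = (int) xf;
      // Clamp into [regionX0[i], regionX1[i]], in the same order as max(x0).min(x1).
      int x = Math.min(Math.max(xi, regionX0[i]), regionX1[i]);
      regionX[i] = (x + rowOffset[i]) * 4;
    }
  }

  public static void main(String[] args) {
    float[] regionY = {0f, 1f, 2f, 100f};
    int[] rowOffset = {0, 10, 20, 30};
    int[] regionX0 = {1, 1, 1, 1};
    int[] regionX1 = {10, 10, 10, 10};
    int[] regionX = new int[4];
    updateGeScalar(2f, 0.5f, rowOffset, regionY, regionX0, regionX1, regionX, 4);
    System.out.println(Arrays.toString(regionX)); // prints [4, 48, 96, 160]
  }
}
```

In this form the float-to-int step is an ordinary cast, so a large slowdown in the vectorized method points at how the F2I conversion is compiled and not at the surrounding arithmetic.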

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Compile and run using "--add-modules=jdk.incubator.vector".

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
The vectorized version is at least as fast as the scalar one.
ACTUAL -
The vectorized version is more than 4x slower.

---------- BEGIN SOURCE ----------
git clone git@github.com:eustas/2im.git
cd 2im
git checkout update-java
cd java
ant
echo "Baseline"
java -jar ./build/jar/twim.jar -e -r -t1024 `pwd`/beach.png
echo "Vectorized"
java --add-modules=jdk.incubator.vector -jar ./build/jar/twim.jar -e -r -t1024 `pwd`/beach.png
---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
Speculative: manually convert floats to ints using variable shifts and masking (error-prone for very small or very large values).
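
The idea behind this workaround can be illustrated in scalar form. The sketch below is an assumption about what "var-shift & masking" means here, not the reporter's code: it extracts the IEEE 754 exponent and mantissa with masks and shifts the mantissa by an amount that depends on the exponent. The class and method names are hypothetical. It truncates toward zero like a Java `(int)` cast, but only for finite inputs with magnitude below 2^31; NaN, infinities, and larger values are deliberately not handled, which is the error-prone part the workaround mentions.

```java
final class FloatToIntSketch {
  // Truncates f toward zero using only bit operations on its IEEE 754 encoding.
  // Valid only for finite f with |f| < 2^31 (an assumption of this sketch).
  static int truncateViaBits(float f) {
    int bits = Float.floatToRawIntBits(f);
    int exp = ((bits >>> 23) & 0xFF) - 127;       // unbiased exponent
    int mantissa = (bits & 0x7FFFFF) | 0x800000;  // 23 stored bits plus the implicit leading 1
    if (exp < 0) {
      return 0;                                   // |f| < 1 (also covers zero and subnormals)
    }
    // The mantissa represents 1.xxx * 2^23, so shift by the distance between exp and 23.
    int magnitude = exp >= 23 ? mantissa << (exp - 23) : mantissa >>> (23 - exp);
    return bits < 0 ? -magnitude : magnitude;     // the sign bit is the top bit of the encoding
  }

  public static void main(String[] args) {
    float[] samples = {0.0f, 0.5f, 1.0f, 3.7f, -3.7f, 1024.0f, -123456.78f};
    for (float f : samples) {
      // Print both results side by side so any mismatch with the cast is visible.
      System.out.println(f + " -> " + truncateViaBits(f) + " (cast: " + (int) f + ")");
    }
  }
}
```

A vectorized version would presumably replace the branches with per-lane masks and a per-lane shift count; that translation is not shown here.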

FREQUENCY : always

Duplicates:
- JDK-8277793 Support vector F2I and D2L cast operations for X86 (Resolved)
- JDK-8288043 Optimize FP to word/sub-word integral type conversion on X86 AVX2 platforms (Resolved)

Assignee: Paul Sandoz
Reporter: Webbug Group
Votes: 0
Watchers: 5
Created: 2021-03-04 14:52
Updated: 2023-03-26 22:01
Resolved: 2023-03-26 22:01
