-
Bug
-
Resolution: Duplicate
-
P4
-
None
-
16, 21
-
x86_64
-
os_x
ADDITIONAL SYSTEM INFORMATION :
MacBook Pro / 6-Core Intel Core i9
MacOS Big Sur 11.2.2
openjdk version "16" 2021-03-16
OpenJDK Runtime Environment (build 16+36-2231)
OpenJDK 64-Bit Server VM (build 16+36-2231, mixed mode, sharing)
A DESCRIPTION OF THE PROBLEM :
Example code:
// BEGIN
private static final VectorSpecies<Float> VFP = FloatVector.SPECIES_MAX;
private static final VectorSpecies<Integer> VIP = IntVector.SPECIES_MAX;
static final int STEP = VFP.length();

static void updateGeGeneric(int angle, int d, int[] rowOffset, float[] regionY,
        int[] regionX0, int[] regionX1, int[] regionX, int count) {
    FloatVector mNyNx = FloatVector.broadcast(VFP, SinCos.MINUS_COT[angle]);
    FloatVector dNx = FloatVector.broadcast(VFP, (float)(d * SinCos.INV_SIN[angle] + 0.5f));
    IntVector k4 = IntVector.broadcast(VIP, 4);
    for (int i = 0; i < count; i += STEP) {
        FloatVector y = FloatVector.fromArray(VFP, regionY, i);
        IntVector offset = IntVector.fromArray(VIP, rowOffset, i);
        FloatVector xf = y.fma(mNyNx, dNx);
        // NEXT LINE IS SLOW
        IntVector xi = xf.convert(VectorOperators.F2I, 0).reinterpretAsInts();
        IntVector x0 = IntVector.fromArray(VIP, regionX0, i);
        IntVector x1 = IntVector.fromArray(VIP, regionX1, i);
        IntVector x = xi.max(x0).min(x1);
        IntVector xOff = x.add(offset).mul(k4);
        xOff.intoArray(regionX, i);
    }
}
// END
The profiler shows that the conversion marked above (jdk.incubator.vector.AbstractVector.convert(VectorOperators$Conversion, int)) consumes 99.2% of the method's time.
Overall, the method is 4.85x slower than the non-vectorized variant (or worse, depending on the vector species used).
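For reference, the non-vectorized variant mentioned above presumably performs the same computation one element at a time. The following is a minimal sketch of such a baseline, not the code from the linked repository. It assumes the two broadcast constants are passed in as plain floats (the hypothetical parameters `mNyNx` and `dNx` stand in for the `SinCos` table lookups, which are not shown in this report); the class and method names are also hypothetical.

```java
// Hypothetical scalar baseline mirroring updateGeGeneric lane by lane.
final class ScalarUpdate {
    static void updateGeScalar(float mNyNx, float dNx, int[] rowOffset, float[] regionY,
            int[] regionX0, int[] regionX1, int[] regionX, int count) {
        for (int i = 0; i < count; i++) {
            // Same fused multiply-add as y.fma(mNyNx, dNx).
            float xf = Math.fma(regionY[i], mNyNx, dNx);
            // A plain cast truncates toward zero, like VectorOperators.F2I.
            int xi = (int) xf;
            // Clamp to [regionX0[i], regionX1[i]], matching xi.max(x0).min(x1).
            int x = Math.min(Math.max(xi, regionX0[i]), regionX1[i]);
            // Add the row offset and scale by 4, matching x.add(offset).mul(k4).
            regionX[i] = (x + rowOffset[i]) * 4;
        }
    }

    public static void main(String[] args) {
        float[] regionY = {0f, 1.5f, 100f};
        int[] rowOffset = {0, 5, 7};
        int[] regionX0 = {1, 0, 0};
        int[] regionX1 = {10, 10, 10};
        int[] regionX = new int[3];
        updateGeScalar(2f, 0.5f, rowOffset, regionY, regionX0, regionX1, regionX, 3);
        System.out.println(java.util.Arrays.toString(regionX));
    }
}
```

Unlike the vector loop, this version needs no special handling when `count` is not a multiple of the vector length.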
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Compile and run using "--add-modules=jdk.incubator.vector".
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
The vectorized version is at least as fast as the regular one.
ACTUAL -
The vectorized version is more than 4x slower.
---------- BEGIN SOURCE ----------
git clone git@github.com:eustas/2im.git
cd 2im
git checkout update-java
cd java
ant
echo "Baseline"
java -jar ./build/jar/twim.jar -e -r -t1024 `pwd`/beach.png
echo "Vectorized"
java --add-modules=jdk.incubator.vector -jar ./build/jar/twim.jar -e -r -t1024 `pwd`/beach.png
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
Speculative: convert floats to ints manually using variable shifts and masking (error-prone for very small or very large values).
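The workaround is only described in words, so here is a rough scalar sketch of what truncating a float to an int via the IEEE 754 bit layout, a per-element shift, and masking could look like. It is not the reporter's code, and the class and method names are hypothetical; a vector version would apply the same steps lane-wise. The explicit range checks show where the error-prone cases come from: magnitudes below 1, values outside the int range, infinities, and NaN all need separate handling to match a plain `(int)` cast.

```java
// Hypothetical illustration of float-to-int truncation by shifting and masking.
final class ManualF2I {
    static int truncate(float f) {
        int bits = Float.floatToRawIntBits(f);
        int sign = bits >> 31;                    // 0 for positive, -1 for negative
        int exp = ((bits >>> 23) & 0xFF) - 127;   // unbiased exponent
        if (exp < 0) {
            return 0;                             // |f| < 1, including zeros and subnormals
        }
        if (exp > 30) {                           // out of int range, infinity, or NaN
            if (f != f) {
                return 0;                         // NaN converts to 0
            }
            return sign == 0 ? Integer.MAX_VALUE : Integer.MIN_VALUE;
        }
        // 24-bit significand with the implicit leading one restored.
        long mant = (bits & 0x7FFFFF) | 0x800000;
        // Variable shift: scale by 2^exp, then drop the 23 fraction bits.
        int mag = (int) ((mant << exp) >>> 23);
        // Apply the sign without branching: negate when sign == -1.
        return (mag ^ sign) - sign;
    }

    public static void main(String[] args) {
        float[] samples = {0.99f, -2.5f, 123456.78f, 1e20f, Float.NaN};
        for (float s : samples) {
            System.out.println(s + " -> " + truncate(s) + " (cast: " + (int) s + ")");
        }
    }
}
```

Dropping the two range checks gives the short shift-and-mask form, which is only correct when every input is known to have a magnitude in [1, 2^31).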
FREQUENCY : always
duplicates
- JDK-8277793 Support vector F2I and D2L cast operations for X86 (Resolved)
- JDK-8288043 Optimize FP to word/sub-word integral type conversion on X86 AVX2 platforms (Resolved)