- Enhancement
- Resolution: Fixed
- P4
- 9, 9.0.1
- b36
- x86_64
- linux
FULL PRODUCT VERSION :
java version "9.0.1"
Java(TM) SE Runtime Environment (build 9.0.1+11)
Java HotSpot(TM) 64-Bit Server VM (build 9.0.1+11, mixed mode)
FULL OS VERSION :
Linux pnod0337 3.10.0-514.6.1.el7.x86_64 #1 SMP Wed Jan 18 13:06:36 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
EXTRA RELEVANT SYSTEM CONFIGURATION :
Intel(R) Xeon Phi(TM) CPU 7210 @ 1.30GHz
A DESCRIPTION OF THE PROBLEM :
The C2-compiled method correctly selects the single-precision (float) form of the square-root instruction, but it does not vectorize the loop in this case (see the source code below).
When running the code as:
java -XX:UseAVX=3 -XX:CompileCommand=print,Sqrt.* Sqrt>asm_sqrt.txt
sqrtDouble is correctly vectorized and uses ZMM registers:
0x00007fc311924889: vsqrtpd 0x50(%rbx,%r10,8),%zmm0{%k1}{z}
0x00007fc311924894: vmovdqu64 %zmm0,0x50(%rbx,%r10,8){%k1}
However sqrtFloat does not get vectorized and uses scalar version of the instruction:
0x00007fc311925a20: vsqrtss 0x10(%rdx,%r8,4),%xmm1,%xmm1{%k1}{z}
0x00007fc311925a28: vmovss %xmm1,0x10(%rdx,%r8,4){%k1}
THE PROBLEM WAS REPRODUCIBLE WITH -Xint FLAG: Did not try
THE PROBLEM WAS REPRODUCIBLE WITH -server FLAG: Did not try
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Compile the attached source code, then run it with:
java -XX:UseAVX=3 -XX:CompileCommand=print,Sqrt.* Sqrt>asm_sqrt.txt
Observe the assembly for (C2) compiled methods sqrtFloat and sqrtDouble.
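As an aid for the observation step above, the following is a minimal sketch that counts how often each square-root mnemonic appears in the captured output. The class name AsmScan and the helper countMnemonic are hypothetical and not part of the original report. The default file name asm_sqrt.txt is taken from the command line above. The sketch counts over the whole file rather than per method, and it assumes the disassembly was actually printed (which typically requires the hsdis disassembler plugin to be available to the JVM).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the original report.
class AsmScan {
    // Counts the lines that contain the given mnemonic as a whole
    // whitespace-separated token.
    static long countMnemonic(List<String> lines, String mnemonic) {
        return lines.stream()
                .filter(line -> Arrays.asList(line.trim().split("\\s+")).contains(mnemonic))
                .count();
    }

    public static void main(String[] args) throws IOException {
        // Default file name matches the redirect used in the report.
        String path = args.length > 0 ? args[0] : "asm_sqrt.txt";
        // ISO-8859-1 accepts any byte sequence, so stray bytes in the
        // disassembly do not abort the read.
        List<String> lines = Files.readAllLines(Paths.get(path), StandardCharsets.ISO_8859_1);
        for (String mnemonic : new String[] {"vsqrtpd", "vsqrtps", "vsqrtss"}) {
            System.out.println(mnemonic + ": " + countMnemonic(lines, mnemonic));
        }
    }
}
```

If the float loop were vectorized, one would expect a non-zero count for vsqrtps. Scalar instructions may still appear in pre-loops and post-loops, so a non-zero vsqrtss count alone does not indicate a problem.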
EXPECTED VERSUS ACTUAL BEHAVIOR :
Expected: the sqrtFloat loop is vectorized and uses the vsqrtps instruction.
Actual: vsqrtss is used instead and the loop is not vectorized.
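Whichever instruction is chosen, the values stored by sqrtFloat should be unchanged. The following is a minimal sketch of a bit-for-bit check, assuming that (float)StrictMath.sqrt computed element by element is an acceptable reference. The class SqrtFloatCheck and the method countMismatches are hypothetical and not part of the original report. Note that the reference loop runs in the same JVM, so it is not fully independent of the JIT.

```java
// Hypothetical check, not part of the original report.
class SqrtFloatCheck {
    // Same loop shape as sqrtFloat in the reproducer.
    static void sqrtFloat(final float[] samples) {
        for (int i = 0; i < samples.length; i++) {
            samples[i] = (float) Math.sqrt(samples[i]);
        }
    }

    // Runs the loop repeatedly (so it can be JIT-compiled) and returns the
    // number of elements whose bits differ from the reference value.
    static int countMismatches(int length, int iterations) {
        int mismatches = 0;
        for (int iter = 0; iter < iterations; iter++) {
            float[] samples = new float[length];
            for (int i = 0; i < length; i++) {
                samples[i] = i;
            }
            sqrtFloat(samples);
            for (int i = 0; i < length; i++) {
                // Assumed reference: StrictMath.sqrt rounded to float.
                float expected = (float) StrictMath.sqrt((double) i);
                if (Float.floatToIntBits(samples[i]) != Float.floatToIntBits(expected)) {
                    mismatches++;
                }
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // Array length and iteration count mirror the reproducer.
        System.out.println("mismatches: " + countMismatches(4000, 10000));
    }
}
```

A run with -XX:UseAVX=3 that reports zero mismatches suggests that the compiled loop, vectorized or not, still produces the same float results for these inputs.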
REPRODUCIBILITY :
This bug can be reproduced always.
---------- BEGIN SOURCE ----------
class Sqrt {
    private void sqrtDouble(final double[] samples) {
        for (int i = 0; i < samples.length; i++) {
            samples[i] = Math.sqrt(samples[i]);
        }
    }

    private void sqrtFloat(final float[] samples) {
        for (int i = 0; i < samples.length; i++) {
            samples[i] = (float) Math.sqrt(samples[i]);
        }
    }

    public static void main(String[] argv) throws Exception {
        float[] samples = new float[4000];
        double[] samplesd = new double[4000];
        for (int i = 0; i < samples.length; i++) {
            samples[i] = i;
            samplesd[i] = i;
        }
        Sqrt sqrt = new Sqrt();
        // Call both methods often enough for C2 to compile them.
        for (int i = 0; i < 10000; i++) {
            sqrt.sqrtFloat(samples);
            sqrt.sqrtDouble(samplesd);
        }
    }
}
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
Use slow version.
- relates to
  - JDK-8243155 AArch64: Add support for SqrtVF (Resolved)
  - JDK-8308277 RISC-V: Improve vectorization of Match.sqrt() on floats (Resolved)
  - JDK-8202179 Compilation fails with assert(n->is_expensive()) failed: expensive nodes with non-null control here only (Resolved)