Enhancement
Resolution: Fixed
P4
9, 10
b36
generic
generic
Reported by yang.zhang@linaro.org:
In OpenJDK 9/10 HotSpot C2, consider the following test case:
=================
public static void mulCInt(int[] a, int[] b, int[] c, int loop) {
    for (int i = 0; i < loop; i++) {
        int t0 = a[i] * 5;
        int t1 = b[i] * 10;
        c[i] = t0 + t1;
    }
}
=================
This loop should be vectorized. C2 first optimizes the constant multiplications into shifts and adds, and only then attempts vectorization. However, vectorization fails on both aarch64 and x86. The log that SLP_Extract() returns is "Unprofitable" for each of the following nodes:
Unprofitable
465 LShiftI === _ 466 155 [[ 442 ]]
Unprofitable
461 AddI === _ 462 463 [[ 441 ]]
Unprofitable
442 AddI === _ 443 465 [[ 441 ]]
Unprofitable
443 LShiftI === _ 466 42 [[ 442 ]]
Unprofitable
462 LShiftI === _ 463 110 [[ 461 ]]
Unprofitable
441 AddI === _ 442 461 [[ 440 ]]
Unprofitable
466 LoadI === 245 487 467 [[ 443 465 ]]
Unprofitable
463 LoadI === 229 487 464 [[ 461 462 ]]
Unprofitable
440 StoreI === 479 487 460 441 [[ 431 436 439 ]]
Testing shows that changing the constants from (5, 10) to (9, 17) results in the same failure.
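For reference, the shift-and-add forms mentioned above can be written out by hand. The sketch below is plain Java, not C2 output. The class and method names are hypothetical, and the decompositions are an assumption inferred from the node dump (one shift added to the loaded value for 5, two shifts added together for 10); the forms for 9 and 17 follow the same pattern.

```java
public class ShiftAddSketch {
    // x * 5 == (x << 2) + x
    static int mul5(int x) { return (x << 2) + x; }

    // x * 10 == (x << 3) + (x << 1)
    static int mul10(int x) { return (x << 3) + (x << 1); }

    // x * 9 == (x << 3) + x
    static int mul9(int x) { return (x << 3) + x; }

    // x * 17 == (x << 4) + x
    static int mul17(int x) { return (x << 4) + x; }

    public static void main(String[] args) {
        // Java int arithmetic wraps modulo 2^32, so the identities
        // also hold when the multiplication overflows.
        int[] samples = {0, 1, -1, 7, 12345, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : samples) {
            if (mul5(x) != x * 5 || mul10(x) != x * 10
                    || mul9(x) != x * 9 || mul17(x) != x * 17) {
                throw new AssertionError("mismatch for x = " + x);
            }
        }
        System.out.println("shift-and-add forms match multiplication");
    }
}
```

All four constants reduce to one or two shifts plus an add, which is consistent with (9, 17) hitting the same "Unprofitable" result as (5, 10).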
JVM options used to generate the assembly and log:
-XX:+UnlockDiagnosticVMOptions -XX:-TieredCompilation -XX:+PrintCompilation -XX:+PrintAssembly -XX:+TraceLoopOpts -XX:+TraceSuperWord -XX:+TraceNewVectors
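A small driver can make the report easier to reproduce. The following is only a sketch: the class name MulCIntRepro, the helper run, the array length, and the iteration count are all assumptions, chosen so that mulCInt is called often enough for C2 to compile it. The options listed above would be passed on the java command line; some of them (the trace options in particular) may only be available in a debug build.

```java
public class MulCIntRepro {
    // Test case from the report, unchanged.
    public static void mulCInt(int[] a, int[] b, int[] c, int loop) {
        for (int i = 0; i < loop; i++) {
            int t0 = a[i] * 5;
            int t1 = b[i] * 10;
            c[i] = t0 + t1;
        }
    }

    // Hypothetical helper: fills the inputs and calls mulCInt repeatedly
    // so that the method becomes hot.
    static int[] run(int length, int iterations) {
        int[] a = new int[length];
        int[] b = new int[length];
        int[] c = new int[length];
        for (int i = 0; i < length; i++) {
            a[i] = i;
            b[i] = 2 * i;
        }
        for (int n = 0; n < iterations; n++) {
            mulCInt(a, b, c, length);
        }
        return c;
    }

    public static void main(String[] args) {
        // Length and iteration count are assumed values, not from the report.
        int[] c = run(1024, 20_000);
        // With a[i] = i and b[i] = 2 * i, each element is 5*i + 20*i = 25*i.
        for (int i = 0; i < c.length; i++) {
            if (c[i] != 25 * i) {
                throw new AssertionError("unexpected result at index " + i);
            }
        }
        System.out.println("done");
    }
}
```

The result check is there only to confirm that the loop computes the same values whether or not it is vectorized; the vectorization outcome itself has to be read from the trace output.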