Bug
Resolution: Fixed
P3
20
b27
generic
generic
1. The Java APIs `Long.bitCount`, `Long.numberOfTrailingZeros` and `Long.numberOfLeadingZeros` return `int`, but the corresponding Vector API operations return `long`. To support auto-vectorization and the Vector API at the same time, the backend currently provides two vector implementations for each of them: one with an int vector type and one with a long vector type, as discussed in https://github.com/openjdk/panama-vector/pull/185#discussion_r836017952.
We can refine the auto-vectorization of these APIs in superword to unify the vector implementation in the backend and remove the extra code.
2. `Long.bitCount` also can't be vectorized when `-XX:MaxVectorSize=16`, which causes the IR match failure of compiler/vectorization/TestPopCountVectorLong.java on 128-bit SVE platforms. This task needs to fix that as well.
3. `Long.numberOfLeadingZeros()` and `Long.numberOfTrailingZeros()` can currently be vectorized on SVE platforms when `-XX:MaxVectorSize=32` or `-XX:MaxVectorSize=64`, but the generated code is incorrect, for example:
```
LOOP:
sxtw x13, w12
add x14, x15, x13, uxtx #3
add x17, x14, #0x10
ld1d {z16.d}, p7/z, [x17]
// Incorrectly use integer rbit/clz insn for long type vector
*rbit z16.s, p7/m, z16.s
*clz z16.s, p7/m, z16.s
add x13, x16, x13, uxtx #2
str q16, [x13, #16]
...
add w12, w12, #0x20
cmp w12, w3
b.lt LOOP
```
4. On x86 AVX2 platforms, an assertion fails when C2 tries to vectorize loops like:
```
// long[] ia;
// int[] ic;
for (int i = 0; i < LENGTH; ++i) {
ic[i] = Long.numberOfLeadingZeros(ia[i]);
}
```
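The loop shape above can be wrapped into a small standalone program to check the scalar semantics that any vectorized code has to preserve. This is only a sketch: the class name `LongBitOpsRepro`, the array length and the warm-up iteration count are assumptions, not taken from the actual regression test. The helper `lowLaneLeadingZeros` is a simplified, hypothetical model of what goes wrong when a 32-bit operation is applied to 64-bit data; it does not reproduce the exact SVE instruction sequence shown in item 3.

```java
public class LongBitOpsRepro {
    // Hypothetical array length, chosen only for illustration.
    static final int LENGTH = 1024;

    // Same loop shape as in the report: long input, int output.
    static void leadingZeros(long[] ia, int[] ic) {
        for (int i = 0; i < ia.length; ++i) {
            ic[i] = Long.numberOfLeadingZeros(ia[i]);
        }
    }

    // Simplified model (an assumption) of a 32-bit count applied to the
    // low half of a 64-bit element. It differs from the 64-bit result.
    static int lowLaneLeadingZeros(long v) {
        return Integer.numberOfLeadingZeros((int) v);
    }

    public static void main(String[] args) {
        long[] ia = new long[LENGTH];
        int[] ic = new int[LENGTH];
        for (int i = 0; i < LENGTH; ++i) {
            ia[i] = 1L << (i % 64); // exactly one bit set, at position i % 64
        }
        // Repeat the call so that the JIT has a chance to compile the loop.
        // The iteration count is an arbitrary assumption.
        for (int iter = 0; iter < 20_000; ++iter) {
            leadingZeros(ia, ic);
        }
        for (int i = 0; i < LENGTH; ++i) {
            int expected = 63 - (i % 64);
            if (ic[i] != expected) {
                throw new AssertionError("index " + i + ": expected " + expected + ", got " + ic[i]);
            }
        }
        // The 32-bit model disagrees with the 64-bit API for the same input.
        System.out.println(Long.numberOfLeadingZeros(1L) + " vs " + lowLaneLeadingZeros(1L));
    }
}
```

Running it with the flags mentioned above (for example `-XX:MaxVectorSize=32` on an SVE machine) is one way to compare results across configurations. Whether the loop is actually vectorized depends on the platform and the JDK build.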