Currently, the AArch64 implementation of vector rearrange is incomplete for vector types with a lane count smaller than 4 (see [1]). As a result, some benchmarks using Long/Double vector types show a large performance gap on NVIDIA Grace, an SVE2 architecture with a 128-bit vector size, compared with other SVE and x86 machines.
Vector rearrange relies on a vector shuffle input, whose payload was previously a byte array. Since the minimum supported vector lane count for byte is 4 on AArch64, the same lane count limitation was also applied to rearrange. However, the payload of a vector shuffle has recently been changed to the data type of each vector (e.g. `int` for `IntVector`) [2], so we can remove this lane count limitation for rearrange.
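For readers unfamiliar with the operation, here is a minimal scalar sketch of what rearrange computes for a 2-lane case, such as a 128-bit vector of `long`. It is an illustrative model only, not the HotSpot or AArch64 implementation: the class `RearrangeModel` and its method are hypothetical names, and it assumes every shuffle index is in range. In the Vector API the corresponding call is `LongVector.rearrange(VectorShuffle)`.

```java
import java.util.Arrays;

// Hypothetical scalar model of vector rearrange; not the JDK implementation.
class RearrangeModel {
    // Lane i of the result takes the source lane selected by shuffle[i].
    // Assumes all shuffle indices are within [0, src.length).
    static long[] rearrange(long[] src, int[] shuffle) {
        long[] dst = new long[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[shuffle[i]];
        }
        return dst;
    }

    public static void main(String[] args) {
        long[] v = {10L, 20L};   // two long lanes, as in a 128-bit vector
        int[] swap = {1, 0};     // shuffle that swaps the two lanes
        System.out.println(Arrays.toString(rearrange(v, swap)));  // prints [20, 10]
    }
}
```

With only two lanes, this is exactly the kind of case that fell under the previous lane count limitation.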
[1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64_vector.ad#L209
[2] https://bugs.openjdk.org/browse/JDK-8310691
Links:
- Commit (master): openjdk/jdk/99c8a6e4
- Review (master): openjdk/jdk/23790