Type: Enhancement
Resolution: Fixed
Priority: P4
Fix Version/s: 25
Affects Version/s: 24
Component/s: hotspot
Labels:
- aarch64
- c2
- performance
- vectorapi

Subcomponent: compiler
Resolved In Build: b16
CPU: aarch64
Currently, the AArch64 implementation of vector rearrange is incomplete for vector types with a lane count smaller than 4 (see [1]). With a 128-bit vector size, Long/Double vectors have only 2 lanes. As a result, some benchmarks using these types show a large performance gap on NVIDIA Grace (an SVE2 machine with a 128-bit vector size) compared with other SVE and x86 machines.
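The lane count of a vector is its size in bits divided by the element size in bits. The following is a minimal sketch of that arithmetic for a 128-bit vector. It shows that only the 64-bit element types fall below the 4-lane threshold. The class and method names are hypothetical and only illustrate the calculation.

```java
public class LaneCounts {
    // Number of lanes = vector size in bits / element size in bits.
    static int laneCount(int vectorBits, int elementBits) {
        return vectorBits / elementBits;
    }

    public static void main(String[] args) {
        int vectorBits = 128; // e.g. the SVE2 vector size on NVIDIA Grace
        String[] names = {"byte", "short", "int", "long", "float", "double"};
        int[] sizes = {Byte.SIZE, Short.SIZE, Integer.SIZE, Long.SIZE, Float.SIZE, Double.SIZE};
        for (int i = 0; i < names.length; i++) {
            int lanes = laneCount(vectorBits, sizes[i]);
            // Types with fewer than 4 lanes had no complete AArch64 rearrange implementation before this change.
            System.out.printf("%-6s %2d lanes%s%n", names[i], lanes, lanes < 4 ? "  (< 4)" : "");
        }
    }
}
```

Only `long` and `double` report 2 lanes here. The other types have at least 4 lanes and were already covered.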

Vector rearrange relies on a vector shuffle input, whose payload was previously a byte array. Since the minimum supported lane count for byte vectors on AArch64 is 4, the same lane count limitation was applied to rearrange. However, the payload of a vector shuffle has recently been changed to the element type of the corresponding vector (e.g. `int` for `IntVector`) [2], so this lane count limitation can now be removed for rearrange.
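Consider a 2-lane `LongVector` to see why the payload type matters. With a byte payload, the shuffle would occupy a 2-lane byte vector (16 bits), which is below the 4-lane minimum for byte vectors on AArch64. With a `long` payload, the shuffle is a 2-lane, 128-bit vector, just like the data. The sketch below is a plain-Java scalar model of the rearrange semantics (`out[i] = src[shuffle[i]]`) combined with that size arithmetic. It is not HotSpot's implementation, and the class and method names are hypothetical. In the actual Vector API, the operation is `LongVector.rearrange(VectorShuffle<Long>)`.

```java
import java.util.Arrays;

public class RearrangeModel {
    // Scalar model of rearrange: out[i] = src[shuffle[i]].
    // The shuffle indices use the same element type as the data (long here),
    // mirroring the payload change described in JDK-8310691.
    static long[] rearrange(long[] src, long[] shuffle) {
        if (src.length != shuffle.length) {
            throw new IllegalArgumentException("lane count mismatch");
        }
        long[] out = new long[src.length];
        for (int i = 0; i < src.length; i++) {
            out[i] = src[(int) shuffle[i]];
        }
        return out;
    }

    // Bits occupied by a shuffle payload of `lanes` indices, each `indexBits` wide.
    static int payloadBits(int lanes, int indexBits) {
        return lanes * indexBits;
    }

    public static void main(String[] args) {
        int lanes = 2; // LongVector with a 128-bit vector size
        System.out.println("byte payload: " + payloadBits(lanes, Byte.SIZE) + " bits");
        System.out.println("long payload: " + payloadBits(lanes, Long.SIZE) + " bits");
        long[] src = {10L, 20L};
        long[] swap = {1L, 0L};
        System.out.println(Arrays.toString(rearrange(src, swap))); // [20, 10]
    }
}
```

The byte payload takes only 16 bits, which is too small for a supported AArch64 byte vector. The `long` payload fills the same 128-bit register shape as the data, so a 2-lane rearrange no longer needs a special case.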

[1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64_vector.ad#L209
[2] https://bugs.openjdk.org/browse/JDK-8310691

links to

Commit(master) openjdk/jdk/99c8a6e4

Review(master) openjdk/jdk/23790

Assignee: Xiaohong Gong

Reporter: Xiaohong Gong

Votes: 0

Watchers: 4

Created: 2025-02-20 17:46

Updated: 2025-04-03 03:06

Resolved: 2025-03-24 23:08