Currently, VectorShuffle is stored as a byte array and is expanded on use. This has several drawbacks:
1. Inefficient conversions between a shuffle and its corresponding vector. This hurts performance when the shuffle indices are not constant and are loaded or computed dynamically.
2. Redundant expansions in rearrange operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before the rearrange operation executes.
3. Some redundant intrinsics, as well as special handling in the C2 compiler, are needed to support this representation.
4. Range checks are performed using toVector, which is inefficient for FP types, since both FP conversions and FP comparisons are more expensive than their integral counterparts.
As a result, I propose to implement VectorShuffle as an array of the bit type (i.e., the integral type that has the same size as the element type).
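
To make the difference between the two layouts concrete, here is a minimal sketch in plain Java. It is not the actual jdk.incubator.vector implementation: the class ShuffleSketch and the methods bitType, rearrangeFromBytes, and rearrangeFromInts are hypothetical names chosen for illustration, and the sketch simply rejects out-of-range indices rather than modelling how the real API treats them.

```java
// Illustrative only: a scalar model of the two index layouts, not JDK code.
class ShuffleSketch {

    // Hypothetical helper: the "bit type" of an element type is the
    // integral type with the same size (float -> int, double -> long).
    static Class<?> bitType(Class<?> elementType) {
        if (elementType == float.class) return int.class;
        if (elementType == double.class) return long.class;
        return elementType; // byte, short, int, long are their own bit type
    }

    // Byte-array layout (simplified): indices are stored as bytes and
    // must be widened to the lane-sized integral type on every use.
    static float[] rearrangeFromBytes(float[] lanes, byte[] indices) {
        int[] expanded = new int[indices.length];
        for (int i = 0; i < indices.length; i++) {
            expanded[i] = indices[i]; // the expansion step the proposal removes
        }
        return rearrangeFromInts(lanes, expanded);
    }

    // Bit-type layout (simplified): indices for float lanes are already
    // ints, so no expansion is needed and the range check is purely integral.
    static float[] rearrangeFromInts(float[] lanes, int[] indices) {
        float[] out = new float[lanes.length];
        for (int i = 0; i < lanes.length; i++) {
            int idx = indices[i];
            if (idx < 0 || idx >= lanes.length) {
                throw new IndexOutOfBoundsException("shuffle index " + idx);
            }
            out[i] = lanes[idx];
        }
        return out;
    }
}
```

Both methods produce the same result for valid indices; the only difference is that the byte-based variant pays for the widening loop on each call, which is the kind of cost described in drawbacks 1 and 2.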
- clones: JDK-8304450 [vectorapi] Refactor VectorShuffle implementation (Resolved)
- duplicates: JDK-8309373 Performance drop in Vector-API based kernel with JDK-21 (Closed)
- links to: Review(master) openjdk/jdk/21042