-
Enhancement
-
Resolution: Fixed
-
P4
-
21
-
b26
-
riscv
-
linux
We note that in the C2 AddReductionVF & AddReductionVD node, the src1 and dst registers are constrained to be the same register, which is not required, so we relax the register constraint for AddReductionVF/AddReductionVD in the C2 node. For reference, other CPUs, such as x86 and arm neon, do not need the same registers either[1]. arm64 sve constrains them to be the same registers because of the use of the FADDA instruction[2], which is floating point adding all active channels of SIMD&FP scalar sources and vector sources and placing the result in SIMD&FP scalar source registers. So for arm64 sve, it is required that that the two registers be the same.
[1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64_vector.ad#L2897-L2907
[2] https://developer.arm.com/documentation/ddi0596/2020-12/SVE-Instructions/FADDA--Floating-point-add-strictly-ordered-reduction--accumulating-in-scalar-
[1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64_vector.ad#L2897-L2907
[2] https://developer.arm.com/documentation/ddi0596/2020-12/SVE-Instructions/FADDA--Floating-point-add-strictly-ordered-reduction--accumulating-in-scalar-