Certain associative operations on floating-point vectors are not truly associative on the lane values; specifically, ADD and MUL when used with cross-lane reduction operations such as FloatVector.reduceLanes(VectorOperators.Associative). The result of such an operation is a function of both the input values (vector and mask) and the order in which the scalar operations are applied to combine the lane values. That order is intentionally left undefined, which allows the JVM to generate optimal machine code for the underlying platform at runtime: if the platform supports a vector instruction that adds or multiplies all values in the vector, or some other efficient machine-code sequence, the JVM has the option of generating it. Otherwise, the default implementation is used, which adds the vector elements sequentially from beginning to end. For this reason, the result of such an operation may vary for the same input values. See https://docs.oracle.com/en/java/javase/19/docs/api/jdk.incubator.vector/jdk/incubator/vector/VectorOperators.html#fp_assoc.
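A small self-contained sketch of why the combination order matters (plain Java, no Vector API needed): the method names sequentialSum and pairwiseSum are illustrative, standing in for the default sequential fallback and one alternative order a platform reduction might use.

```java
public class FpAssocDemo {
    // Left-to-right sequential sum, mirroring the default fallback order.
    static float sequentialSum(float[] lanes) {
        float acc = 0f;
        for (float v : lanes) {
            acc += v;
        }
        return acc;
    }

    // One possible alternative order: combine adjacent pairs first.
    static float pairwiseSum(float[] lanes) {
        return (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
    }

    public static void main(String[] args) {
        float[] lanes = {1e8f, 1.0f, -1e8f, 1.0f};
        // 1e8f absorbs 1.0f (float has ~7 significant digits), so the
        // two orders give different results for the same input values.
        System.out.println(sequentialSum(lanes)); // 1.0
        System.out.println(pairwiseSum(lanes));   // 0.0
    }
}
```

Since neither order is wrong under IEEE 754 rules, the specification leaves the choice to the JVM rather than pinning down one result.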
On the AArch64 NEON platform, floating-point add reduction is not supported for auto-vectorization; it is only supported for the Vector API. We can therefore use pairwise vector add instructions to optimize the Vector API implementation of AddReduction for floating point.
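A minimal scalar sketch of the combination order that repeated pairwise-add steps produce: adjacent lanes are added, halving the lane count each round until one value remains. This is an illustration of the reduction tree, not the actual backend code.

```java
public class PairwiseReduction {
    // Reduce a power-of-two number of lanes by repeatedly adding adjacent
    // pairs, halving the count each round -- the order that a sequence of
    // pairwise vector add instructions computes.
    static float pairwiseReduce(float[] lanes) {
        float[] cur = lanes.clone();
        for (int n = cur.length; n > 1; n /= 2) {
            for (int i = 0; i < n / 2; i++) {
                cur[i] = cur[2 * i] + cur[2 * i + 1];
            }
        }
        return cur[0];
    }

    public static void main(String[] args) {
        // Combination order: (1+2) + (3+4)
        System.out.println(pairwiseReduce(new float[]{1f, 2f, 3f, 4f}));
    }
}
```

Because reduceLanes leaves the combination order undefined for floating-point ADD, this tree order is as valid as the sequential one, which is what permits the optimization.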