JDK / JDK-8298244

AArch64: Optimize vector implementation of AddReduction for floating point


Details

    • b03
    • aarch64

    Description

      Certain associative operations that apply to floating point vectors are not truly associative on the floating point lane values. This applies specifically to ADD and MUL when they are used with cross-lane reduction operations, such as FloatVector.reduceLanes(Associative). The result of such an operation is a function of both the input values (vector and mask) and the order of the scalar operations applied to combine lane values. In such cases the order is intentionally not defined. This allows the JVM to generate optimal machine code for the underlying platform at runtime. If the platform supports a vector instruction to add or multiply all values in the vector, or if there is some other efficient machine code sequence, then the JVM has the option of generating this machine code. Otherwise, the default implementation is applied, which adds vector elements sequentially from beginning to end. For this reason, the result of such an operation may vary for the same input values. See https://docs.oracle.com/en/java/javase/19/docs/api/jdk.incubator.vector/jdk/incubator/vector/VectorOperators.html#fp_assoc.
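
      To make the order-dependence concrete, here is a minimal sketch in plain Java (no Vector API required). The two helper names are hypothetical; they model the two reduction orders described above: the sequential default, and a pairwise order such as a vector instruction sequence would use. The chosen lane values show the two orders producing different float results:

      ```java
      public class FpAssocDemo {
          // Sequential order: ((v0 + v1) + v2) + v3, the default fallback
          // that adds vector elements from beginning to end.
          static float sequentialSum(float[] v) {
              float acc = 0f;
              for (float x : v) acc += x;
              return acc;
          }

          // Pairwise order: (v0 + v1) + (v2 + v3), repeatedly combining
          // adjacent lanes until one value remains.
          static float pairwiseSum(float[] v) {
              float[] cur = v.clone();
              for (int n = cur.length; n > 1; n /= 2)
                  for (int i = 0; i < n / 2; i++)
                      cur[i] = cur[2 * i] + cur[2 * i + 1];
              return cur[0];
          }

          public static void main(String[] args) {
              // ulp(1e8f) is 8, so 1e8f + 1f rounds back to 1e8f.
              float[] lanes = {1.0e8f, 1f, -1.0e8f, 1f};
              System.out.println("sequential: " + sequentialSum(lanes)); // 1.0
              System.out.println("pairwise:   " + pairwiseSum(lanes));   // 0.0
          }
      }
      ```

      Both results are valid under the Vector API specification, which is exactly why the JVM is free to pick whichever order maps to the fastest machine code.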

      On the AArch64 NEON platform, floating-point add reduction is not supported for auto-vectorization (where the sequential semantics of the scalar loop must be preserved), but it is supported for the Vector API, where the reduction order is intentionally unspecified. We can therefore make use of pairwise vector add instructions to optimize the vector implementation of AddReduction for floating point.
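
      The pairwise pattern can be sketched in plain Java as follows. This is a model only, under the assumption that each reduction step behaves like AArch64's pairwise floating-point add (FADDP), where lane i of the result is src[2i] + src[2i+1]; the method names are hypothetical. A 4-lane float vector then reduces in log2(4) = 2 pairwise steps instead of 3 ordered scalar adds:

      ```java
      public class PairwiseReduce {
          // Model of one pairwise-add step: halves the lane count by
          // adding adjacent lanes, dst[i] = src[2i] + src[2i+1].
          static float[] pairwiseStep(float[] src) {
              float[] dst = new float[src.length / 2];
              for (int i = 0; i < dst.length; i++)
                  dst[i] = src[2 * i] + src[2 * i + 1];
              return dst;
          }

          // AddReduction of a 4-lane vector via two pairwise steps.
          static float reduce4(float[] lanes) {
              return pairwiseStep(pairwiseStep(lanes))[0];
          }

          public static void main(String[] args) {
              System.out.println(reduce4(new float[]{1f, 2f, 3f, 4f})); // 10.0
          }
      }
      ```

      Because each step operates on all lanes at once, the dependency chain shrinks from n - 1 sequential adds to log2(n) vector instructions, which is where the speedup for reduceLanes(ADD) comes from.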

      Attachments

        Issue Links

          Activity

            People

              Assignee: fgao Fei Gao
              Reporter: fgao Fei Gao
              Votes: 0
              Watchers: 4

              Dates

                Created:
                Updated:
                Resolved: