Certain associative operations on floating-point vectors are not truly associative on the lane values; specifically, ADD and MUL when used with cross-lane reduction operations such as FloatVector.reduceLanes(VectorOperators.Associative). The result of such an operation is a function of both the input values (vector and mask) and the order in which the scalar operations are applied to combine the lane values. That order is intentionally left undefined, which allows the JVM to generate optimal machine code for the underlying platform at runtime: if the platform supports a vector instruction that adds or multiplies all values in the vector, or some other efficient machine-code sequence, the JVM has the option of generating it. Otherwise, the default implementation is used, which adds the vector elements sequentially from beginning to end. For this reason, the result of such an operation may vary for the same input values. See https://docs.oracle.com/en/java/javase/19/docs/api/jdk.incubator.vector/jdk/incubator/vector/VectorOperators.html#fp_assoc.
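A small self-contained sketch of why the combination order matters (plain Java, no Vector API needed): the method names sequentialSum and pairwiseSum are illustrative, standing in for the default sequential fallback and one alternative order a platform reduction might use.

```java
public class FpAssocDemo {
    // Left-to-right sequential sum, mirroring the default fallback order.
    static float sequentialSum(float[] lanes) {
        float acc = 0f;
        for (float v : lanes) {
            acc += v;
        }
        return acc;
    }

    // One possible alternative order: combine adjacent pairs first.
    static float pairwiseSum(float[] lanes) {
        return (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
    }

    public static void main(String[] args) {
        float[] lanes = {1e8f, 1.0f, -1e8f, 1.0f};
        // 1e8f absorbs 1.0f (float has ~7 significant digits), so the
        // two orders give different results for the same input values.
        System.out.println(sequentialSum(lanes)); // 1.0
        System.out.println(pairwiseSum(lanes));   // 0.0
    }
}
```

Since neither order is wrong under IEEE 754 rules, the specification leaves the choice to the JVM rather than pinning down one result.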
On the AArch64 NEON platform, floating-point add reduction is not supported for auto-vectorization; it is only supported for the Vector API. We can therefore use pairwise vector add instructions to optimize the Vector API implementation of AddReduction for floating point.
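A minimal scalar sketch of the combination order that repeated pairwise-add steps produce: adjacent lanes are added, halving the lane count each round until one value remains. This is an illustration of the reduction tree, not the actual backend code.

```java
public class PairwiseReduction {
    // Reduce a power-of-two number of lanes by repeatedly adding adjacent
    // pairs, halving the count each round -- the order that a sequence of
    // pairwise vector add instructions computes.
    static float pairwiseReduce(float[] lanes) {
        float[] cur = lanes.clone();
        for (int n = cur.length; n > 1; n /= 2) {
            for (int i = 0; i < n / 2; i++) {
                cur[i] = cur[2 * i] + cur[2 * i + 1];
            }
        }
        return cur[0];
    }

    public static void main(String[] args) {
        // Combination order: (1+2) + (3+4)
        System.out.println(pairwiseReduce(new float[]{1f, 2f, 3f, 4f}));
    }
}
```

Because reduceLanes leaves the combination order undefined for floating-point ADD, this tree order is as valid as the sequential one, which is what permits the optimization.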