- Enhancement
- Resolution: Unresolved
- P4
- 26
- x86_64
- generic
Because the vector reduction operations generated by auto-vectorization are strictly ordered, auto-vectorizing them may not yield any tangible performance uplift. The Vector API, by contrast, has relaxed precision requirements, which allows a divide-and-conquer (D&C) approach that builds a reduction tree.
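The difference between the two evaluation orders can be illustrated with a minimal scalar sketch. It uses plain `float` arrays rather than Float16 or the Vector API, and the class and method names (`ReductionOrderSketch`, `orderedSum`, `treeSum`) are hypothetical, chosen only for illustration. It is not the C2 implementation; it only shows why a reduction tree is permitted solely when the ordering requirement is relaxed.

```java
public class ReductionOrderSketch {

    // Strictly ordered reduction: ((((0 + a0) + a1) + a2) + ...).
    // Each addition depends on the previous one, so the chain is serial.
    static float orderedSum(float[] a) {
        float acc = 0.0f;
        for (float v : a) {
            acc += v;
        }
        return acc;
    }

    // Divide-and-conquer reduction over the half-open range [lo, hi):
    // the two halves are summed independently and then combined,
    // forming a reduction tree of depth about log2(n).
    static float treeSum(float[] a, int lo, int hi) {
        int n = hi - lo;
        if (n == 0) {
            return 0.0f;
        }
        if (n == 1) {
            return a[lo];
        }
        int mid = lo + n / 2;
        return treeSum(a, lo, mid) + treeSum(a, mid, hi);
    }

    public static void main(String[] args) {
        // Values chosen so that rounding makes the two orders disagree.
        float[] data = {1e8f, 1.0f, -1e8f, 1.0f};
        System.out.println("ordered: " + orderedSum(data));
        System.out.println("tree:    " + treeSum(data, 0, data.length));
    }
}
```

Floating-point addition is not associative, so the two methods can return different results for the same input. That is why a compiler that must preserve the scalar loop's semantics cannot reassociate the additions, whereas an API with relaxed ordering can.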
- relates to
  - JDK-8351488 Improve performance of floating point reduction kernels (Open)
  - JDK-8365967 C2 compiler support for HalffloatVector operations supported by auto-vectorization flow (Open)
  - JDK-8366444 Add support for add/mul reduction operations for Float16 (Open)