Enhancement
Resolution: Fixed
P4
21
b24
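Pseudocode of the current approach, where each iteration's vector is reduced into the scalar accumulator inside the loop: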
acc = init
For (i ...) {
vec = "some vector ops"; // vec holds vector of results from this iteration
vector_reduction(vec, acc); // reduces vector vec into scalar accumulator acc
}
// use acc
However, for integer reductions, and for some floating-point reductions that do not require a linear order (Min / Max), we can do better: use a vector accumulator inside the loop, and reduce that vector to a scalar only once, after the loop. This should significantly reduce the work per loop iteration.
v_acc = scalar_to_vector(init); // depends on reduction op how we would do this
For (i ...) {
vec = "some vector ops"; // vec holds vector of results from this iteration
v_acc = vector_element_wise_reduction(v_acc, vec);
}
acc = vector_reduction(v_acc);
// use acc
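To make the two loop shapes concrete, here is a minimal plain-Java sketch for an int add reduction. It only simulates vector lanes with an int[]; the class name ReductionSketch, the lane count of 8, the helper reduceLanes, and the choice to place init in lane 0 with zeros elsewhere are all assumptions made for illustration, not what C2 actually emits.

```java
import java.util.Arrays;

final class ReductionSketch {
    static final int LANES = 8; // hypothetical vector width, chosen for illustration

    // Horizontal reduction: fold all lanes of one vector into a scalar.
    static int reduceLanes(int[] vec) {
        int sum = 0;
        for (int lane : vec) sum += lane;
        return sum;
    }

    // Current shape: every iteration reduces its vector into the scalar accumulator.
    static int sumReduceInLoop(int[] data, int init) {
        int acc = init;
        int i = 0;
        for (; i + LANES <= data.length; i += LANES) {
            int[] vec = Arrays.copyOfRange(data, i, i + LANES); // stands in for "some vector ops"
            acc += reduceLanes(vec); // horizontal reduction in every iteration
        }
        for (; i < data.length; i++) acc += data[i]; // scalar tail
        return acc;
    }

    // Proposed shape: element-wise accumulation in the loop, one reduction after it.
    static int sumReduceAfterLoop(int[] data, int init) {
        int[] vAcc = new int[LANES]; // all lanes start at 0, the identity for add
        vAcc[0] = init;              // assumption: one possible scalar_to_vector for add
        int i = 0;
        for (; i + LANES <= data.length; i += LANES) {
            for (int lane = 0; lane < LANES; lane++) {
                vAcc[lane] += data[i + lane]; // element-wise, no cross-lane work
            }
        }
        int acc = reduceLanes(vAcc); // single horizontal reduction after the loop
        for (; i < data.length; i++) acc += data[i]; // scalar tail
        return acc;
    }

    public static void main(String[] args) {
        int[] data = new int[100];
        for (int i = 0; i < data.length; i++) data[i] = i;
        System.out.println(sumReduceInLoop(data, 7));
        System.out.println(sumReduceAfterLoop(data, 7));
    }
}
```

Both methods return the same value for any input, because int addition is associative and commutative even under wrap-around. For Min / Max the same structure would apply, with init broadcast to every lane instead of placed in a single lane.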
Note: we already have different reduction implementations: a "recursive folding" for ints (C2_MacroAssembler::reduce8I), and a "linear folding" for floats (C2_MacroAssembler::reduce8F).
https://github.com/openjdk/jdk/blob/db1b48ef3bb4f8f0fbb6879200c0655b7fe006eb/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp#L1895-L1941
https://github.com/openjdk/jdk/blob/db1b48ef3bb4f8f0fbb6879200c0655b7fe006eb/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp#L2096-L2120
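The difference between the two folding orders, and why float add needs the linear order while Min / Max do not, can be illustrated with a small plain-Java sketch. It is only loosely modeled on the idea behind reduce8I and reduce8F and does not reproduce the actual instruction sequences; the names FoldOrderSketch, FloatOp, linearFold and recursiveFold are hypothetical, and recursiveFold assumes a power-of-two lane count.

```java
final class FoldOrderSketch {
    // Hypothetical helper interface: a binary operation on floats.
    interface FloatOp { float apply(float a, float b); }

    // "Linear folding": combine lanes strictly left to right.
    static float linearFold(float[] lanes, FloatOp op) {
        float acc = lanes[0];
        for (int i = 1; i < lanes.length; i++) acc = op.apply(acc, lanes[i]);
        return acc;
    }

    // "Recursive folding": repeatedly combine the upper half into the lower half.
    // Assumes lanes.length is a power of two.
    static float recursiveFold(float[] lanes, FloatOp op) {
        float[] v = lanes.clone();
        for (int width = v.length / 2; width >= 1; width /= 2) {
            for (int i = 0; i < width; i++) v[i] = op.apply(v[i], v[i + width]);
        }
        return v[0];
    }

    public static void main(String[] args) {
        float[] lanes = {1e8f, 1f, -1e8f, 1f, 0f, 0f, 0f, 0f};
        FloatOp add = (a, b) -> a + b;
        FloatOp min = (a, b) -> Math.min(a, b);
        // Float add is not associative, so the two orders can disagree.
        System.out.println(linearFold(lanes, add));    // prints 1.0
        System.out.println(recursiveFold(lanes, add)); // prints 2.0
        // Min does not depend on the order, so both folds agree.
        System.out.println(linearFold(lanes, min) == recursiveFold(lanes, min)); // prints true
    }
}
```

With this input the linear order loses the first 1f when it is added to 1e8f, while the recursive order cancels 1e8f against -1e8f first and keeps both 1f values. That order sensitivity is why only order-insensitive reductions are candidates for the vector-accumulator form.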
I found this while working on JDK-8302139, where I implemented an IR test for SuperWord reductions and inspected the generated code.
is blocked by
- JDK-8302139 Speed up SuperWord reduction tests (In Progress)
relates to
- JDK-8310130 C2: assert(false) failed: scalar_input is neither phi nor a matchin reduction (Resolved)
- JDK-8302662 [SuperWord] Vectorize loop when value from last iteration is used after loop (Open)
- JDK-8307513 C2: intrinsify Math.max(long,long) and Math.min(long,long) (Open)
- JDK-8307516 C2 SuperWord: reconsider Reduction heuristic for UnorderedReduction (Open)
- JDK-8309647 [Vector API] Move Reduction outside loop when possible (Open)
- JDK-8345245 C2 SuperWord: further improve latency after PhaseIdealLoop::move_unordered_reduction_out_of_loop (Open)
- JDK-8314612 TestUnorderedReduction.java fails with -XX:MaxVectorSize=32 and -XX:+AlignVector (Resolved)