Details

Enhancement

Resolution: Fixed

P4

21

b24
Description
Pseudocode:
acc = init
For (i ...) {
    vec = "some vector ops";    // vec holds vector of results from this iteration
    vector_reduction(vec, acc); // reduces vector vec into scalar accumulator acc
}
// use acc
However, in integer reductions, and in some floating-point reductions that do not require the linear order (Min / Max), we can do better. We can use a vector accumulator in the loop, and reduce that vector to a scalar only once, after the loop. This should significantly reduce the work per loop iteration.
v_acc = scalar_to_vector(init); // depends on reduction op how we would do this
For (i ...) {
    vec = "some vector ops"; // vec holds vector of results from this iteration
    v_acc = vector_element_wise_reduction(v_acc, vec);
}
acc = vector_reduction(v_acc);
// use acc
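The two loop shapes above can be sketched in plain Java for an int add reduction. This is only an illustration, not C2's implementation: vector lanes are simulated with `int[]`, and the names `LANES`, `loadVector`, and `reduceLanes` are hypothetical. It assumes the input length is a multiple of the lane count, so no scalar tail loop is shown. For addition, `scalar_to_vector(init)` is modelled here by placing `init` in one lane and the identity 0 in the others.

```java
import java.util.Arrays;

final class ReductionAfterLoopSketch {
    static final int LANES = 8; // hypothetical vector width, for illustration only

    // Stand-in for "some vector ops": loads LANES consecutive elements.
    static int[] loadVector(int[] data, int offset) {
        return Arrays.copyOfRange(data, offset, offset + LANES);
    }

    // Cross-lane reduction: folds every lane of vec into the scalar acc.
    static int reduceLanes(int[] vec, int acc) {
        for (int lane : vec) {
            acc += lane;
        }
        return acc;
    }

    // First shape: a full cross-lane reduction in every iteration.
    static int sumReduceInLoop(int[] data, int init) {
        int acc = init;
        for (int i = 0; i + LANES <= data.length; i += LANES) {
            acc = reduceLanes(loadVector(data, i), acc);
        }
        return acc;
    }

    // Second shape: element-wise accumulation in the loop,
    // one cross-lane reduction after the loop.
    static int sumReduceAfterLoop(int[] data, int init) {
        int[] vAcc = new int[LANES]; // additive identity 0 in every lane
        vAcc[0] = init;              // assumed scalar_to_vector for add
        for (int i = 0; i + LANES <= data.length; i += LANES) {
            int[] vec = loadVector(data, i);
            for (int l = 0; l < LANES; l++) {
                vAcc[l] += vec[l];   // one element-wise add per lane
            }
        }
        return reduceLanes(vAcc, 0);
    }

    public static void main(String[] args) {
        int[] data = new int[64];
        for (int i = 0; i < data.length; i++) {
            data[i] = i * 31 - 7;
        }
        // Both shapes agree because int addition is associative and
        // commutative, even when it wraps around.
        System.out.println(sumReduceInLoop(data, 5) == sumReduceAfterLoop(data, 5)); // prints true
    }
}
```

For other operations the initial vector would differ. For Min, for example, one plausible choice is to broadcast `init` into every lane.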
Note: we already have different reduction implementations.
We already do a "recursive folding" for ints (C2_MacroAssembler::reduce8I), and a "linear folding" for floats (C2_MacroAssembler::reduce8F).
https://github.com/openjdk/jdk/blob/db1b48ef3bb4f8f0fbb6879200c0655b7fe006eb/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp#L1895-L1941
https://github.com/openjdk/jdk/blob/db1b48ef3bb4f8f0fbb6879200c0655b7fe006eb/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp#L2096-L2120
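To illustrate why the folding order matters, here is a small Java sketch of the two orders. It does not reproduce the code of reduce8I or reduce8F. The method names are hypothetical, and the lane pairing used in the recursive version is an assumption made for illustration. For int addition either order gives the same result. For float addition the result can depend on the order, which is why such reductions have to stay linear.

```java
final class FoldingOrderSketch {
    // "Recursive folding": add the upper half onto the lower half,
    // halving the width each step. Assumes a power-of-two lane count.
    static int recursiveFoldInt(int[] lanes) {
        int[] v = lanes.clone();
        for (int width = v.length / 2; width >= 1; width /= 2) {
            for (int l = 0; l < width; l++) {
                v[l] += v[l + width];
            }
        }
        return v[0];
    }

    // "Linear folding": strict left-to-right order, as in a scalar loop.
    static float linearFoldFloat(float acc, float[] lanes) {
        for (float lane : lanes) {
            acc += lane;
        }
        return acc;
    }

    // The recursive order applied to floats, for comparison only.
    static float recursiveFoldFloat(float[] lanes) {
        float[] v = lanes.clone();
        for (int width = v.length / 2; width >= 1; width /= 2) {
            for (int l = 0; l < width; l++) {
                v[l] += v[l + width];
            }
        }
        return v[0];
    }

    public static void main(String[] args) {
        int[] ints = {1, 2, 3, 4, 5, 6, 7, 8};
        System.out.println(recursiveFoldInt(ints)); // prints 36, same as a linear sum

        // 1e8f + 1.0f rounds back to 1e8f, so the order of additions matters.
        float[] floats = {1e8f, 1.0f, -1e8f, 1.0f, 0f, 0f, 0f, 0f};
        System.out.println(linearFoldFloat(0f, floats)); // prints 1.0
        System.out.println(recursiveFoldFloat(floats));  // prints 2.0
    }
}
```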
I found this while working on JDK-8302139, where I implemented an IR test for SuperWord reductions and inspected the generated code.
Issue Links
is blocked by
    JDK-8302139 Speed up SuperWord reduction tests (In Progress)
relates to
    JDK-8310130 C2: assert(false) failed: scalar_input is neither phi nor a matchin reduction (Resolved)
    JDK-8302662 [SuperWord] Vectorize loop when value from last iteration is used after loop (Open)
    JDK-8307513 C2: intrinsify Math.max(long,long) and Math.min(long,long) (Open)
    JDK-8307516 C2 SuperWord: reconsider Reduction heuristic for UnorderedReduction (Open)
    JDK-8309647 [Vector API] Move Reduction outside loop when possible (Open)
    JDK-8314612 TestUnorderedReduction.java fails with -XX:MaxVectorSize=32 and -XX:+AlignVector (Resolved)