Type: Enhancement
Resolution: Fixed
Priority: P4
Fix Version/s: 21
Affects Version/s: 21
Component/s: hotspot
Labels:
- c2-superword
- performance

Subcomponent: compiler
Resolved In Build: b24

Currently, we do all reductions inside the loop. This makes sense for floating-point Add and Mul, where the order of reduction must be strictly linear so as not to violate the IEEE specification: floating-point addition and multiplication are not associative, so any reordering would change the rounding ever so slightly and lead to wrong results.

Pseudocode:

acc = init
for (i ...) {
   vec = "some vector ops"; // vec holds vector of results from this iteration
   vector_reduction(vec, acc); // reduces vector vec into scalar accumulator acc
}
// use acc
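
The ordering constraint on floating-point Add can be seen in plain Java. The following is a minimal sketch, not code from C2: the class name, the method names, the lane count of 2, and the input values are all chosen for illustration. It compares a strictly linear sum with a sum that accumulates per lane and folds the lanes at the end.

```java
final class FloatReductionOrder {
    // Strictly linear left-to-right sum, the order the scalar loop defines.
    static float linearSum(float[] a) {
        float acc = 0.0f;
        for (float x : a) {
            acc += x;
        }
        return acc;
    }

    // Reassociated sum: one partial sum per lane, folded after the loop.
    // Assumes a.length is a multiple of lanes (hypothetical simplification).
    static float laneSum(float[] a, int lanes) {
        float[] vAcc = new float[lanes];
        for (int i = 0; i < a.length; i += lanes) {
            for (int l = 0; l < lanes; l++) {
                vAcc[l] += a[i + l];
            }
        }
        float acc = 0.0f;
        for (float x : vAcc) {
            acc += x;
        }
        return acc;
    }

    public static void main(String[] args) {
        // 1e8f + 1.0f rounds back to 1e8f, because floats near 1e8 are 8 apart.
        float[] a = {1e8f, 1.0f, -1e8f, 1.0f};
        System.out.println(linearSum(a));   // prints 1.0
        System.out.println(laneSum(a, 2));  // prints 2.0
    }
}
```

The two results differ only because the additions are grouped differently, which is why the linear order has to be preserved for these operations.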

However, for integer reductions, and for some floating-point reductions that do not require the linear order (Min / Max), we can do better. We can keep a vector accumulator in the loop and reduce that vector to a scalar only once, after the loop. This should significantly reduce the work per loop iteration.

v_acc = scalar_to_vector(init); // depends on reduction op how we would do this
for (i ...) {
   vec = "some vector ops"; // vec holds vector of results from this iteration
   v_acc = vector_element_wise_reduction(v_acc, vec);
}
acc = vector_reduction(v_acc);
// use acc
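
The two pseudocode shapes can be compared with a scalar simulation in Java, where an int array stands in for a vector register. This is only a sketch under assumptions: the class and method names, the lane count of 4, the "multiply by 3" stand-in for the vector ops, and placing init in lane 0 with the identity 0 in the other lanes are all hypothetical choices for an int Add reduction, not what C2 necessarily emits.

```java
final class UnorderedReductionSketch {
    static final int LANES = 4; // hypothetical vector width

    // Cross-lane reduction of one "vector" into a scalar.
    static int horizontalAdd(int[] v) {
        int s = 0;
        for (int x : v) {
            s += x;
        }
        return s;
    }

    // Current shape: a cross-lane reduction in every iteration.
    // Assumes a.length is a multiple of LANES.
    static int reduceInsideLoop(int init, int[] a) {
        int acc = init;
        int[] vec = new int[LANES];
        for (int i = 0; i < a.length; i += LANES) {
            for (int l = 0; l < LANES; l++) {
                vec[l] = a[i + l] * 3;      // stand-in for "some vector ops"
            }
            acc += horizontalAdd(vec);      // reduction inside the loop
        }
        return acc;
    }

    // Proposed shape: element-wise accumulate in the loop, reduce once after it.
    static int reduceAfterLoop(int init, int[] a) {
        int[] vAcc = new int[LANES];
        vAcc[0] = init;                     // assumed scalar_to_vector for Add
        for (int i = 0; i < a.length; i += LANES) {
            for (int l = 0; l < LANES; l++) {
                vAcc[l] += a[i + l] * 3;    // element-wise, no cross-lane work
            }
        }
        return horizontalAdd(vAcc);         // single reduction after the loop
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4, 5, 6, 7, 8};
        System.out.println(reduceInsideLoop(10, a)); // prints 118
        System.out.println(reduceAfterLoop(10, a));  // prints 118
    }
}
```

Both methods return the same value for any input, including on overflow, because int addition is associative and commutative. For Min or Max, the initial vector would presumably broadcast init to every lane instead.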

Note: we already have different reduction implementations: a "recursive folding" for ints (C2_MacroAssembler::reduce8I) and a "linear folding" for floats (C2_MacroAssembler::reduce8F).
https://github.com/openjdk/jdk/blob/db1b48ef3bb4f8f0fbb6879200c0655b7fe006eb/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp#L1895-L1941
https://github.com/openjdk/jdk/blob/db1b48ef3bb4f8f0fbb6879200c0655b7fe006eb/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp#L2096-L2120
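
To illustrate the difference between the two folding shapes, here is a small Java sketch. It is not a transcription of reduce8I or reduce8F; the class and method names are hypothetical, arrays stand in for vector registers, and the recursive variant assumes a power-of-two length.

```java
final class FoldingSketch {
    // Linear folding: every addition depends on the previous one,
    // so the element order is preserved (needed for float Add / Mul).
    static float linearFold(float acc, float[] v) {
        for (float x : v) {
            acc += x;
        }
        return acc;
    }

    // Recursive folding: add the upper half onto the lower half, then repeat
    // on the lower half. Takes log2(n) halving steps instead of n dependent
    // additions. Assumes v.length is a power of two.
    static int recursiveFold(int acc, int[] v) {
        int[] w = v.clone();
        for (int width = w.length / 2; width >= 1; width /= 2) {
            for (int l = 0; l < width; l++) {
                w[l] += w[l + width];
            }
        }
        return acc + w[0];
    }

    public static void main(String[] args) {
        System.out.println(recursiveFold(5, new int[] {1, 2, 3, 4, 5, 6, 7, 8})); // prints 41
        System.out.println(linearFold(0.0f, new float[] {1e8f, 1.0f, -1e8f, 1.0f})); // prints 1.0
    }
}
```

Recursive folding changes the grouping of the additions, which is harmless for ints but would change rounding for floats.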

I found this while working on ~~JDK-8302139~~, where I implemented an IR test for SuperWord reductions, and checked out the generated code.

Attachments:
- int-red-0-before.png (2023-03-14 05:01, 65 kB, Emanuel Peter)
- int-red-1-after.png (2023-03-14 05:01, 43 kB, Emanuel Peter)
- Test.java (2023-03-14 04:38, 0.6 kB, Emanuel Peter)

is blocked by:
- JDK-8302139 Speed up SuperWord reduction tests (Closed)

relates to:
- JDK-8310130 C2: assert(false) failed: scalar_input is neither phi nor a matchin reduction (Resolved)
- JDK-8302662 [SuperWord] Vectorize loop when value from last iteration is used after loop (Open)
- JDK-8307516 C2 SuperWord: reconsider Reduction heuristic for UnorderedReduction (Open)
- JDK-8309647 [Vector API] Move Reduction outside loop when possible (Open)
- JDK-8345245 C2 SuperWord: further improve latency after PhaseIdealLoop::move_unordered_reduction_out_of_loop (Open)
- JDK-8307513 C2: intrinsify Math.max(long,long) and Math.min(long,long) (Resolved)
- JDK-8314612 TestUnorderedReduction.java fails with -XX:MaxVectorSize=32 and -XX:+AlignVector (Resolved)

links to:
- Commit openjdk/jdk/06b0a5e0
- Review openjdk/jdk/13056

Assignee: Emanuel Peter
Reporter: Emanuel Peter
Votes: 0
Watchers: 3
Created: 2023-02-16 04:19
Updated: 2024-11-29 03:13
Resolved: 2023-05-23 01:07
