Type: Enhancement
Resolution: Unresolved
Priority: P4
24
This will help us decide when it is profitable to vectorize reductions and to add shuffle/pack/unpack operations.
This is needed because vectorization is not always profitable, especially when we add more operations to the loop.
There may also be an extra cost for subword conversions, see:
https://github.com/openjdk/jdk/pull/23413
--------------------------- PLAN ----------------------
I have a proof-of-concept patch here:
https://github.com/openjdk/jdk/pull/20964
Instead of pushing it as a whole (it would be quite unreviewable), I'll split it up into subtasks.
Here is a rough schedule towards cost modeling:
0. Smaller refactorings
1. Scalar node refactoring
- Finer resolution of scalar node kinds: mem, phi, data, cfg
- These will be needed when modeling the whole loop instead of just the basic block (step 3)
2. Vector node refactoring
- remove reliance on _nodes, so that it will be easier to model the whole loop (step 3)
- instead capture all relevant information in some sort of VTransformNodePrototype: opcode, vlen, basic_type, etc.
3. Model the whole loop instead of only the basic block (this allows VTransform optimizations like moving the reduction out of the loop)
- Instead of VTransformGraph::apply_memops_reordering_with_schedule, which reorders the old graph, I want to build the new loop body directly from the VTransform.
- That means we are less constrained by the old shape of the loop.
4. Optimize, e.g. move the reduction out of the loop
- Refactor move_unordered_reduction_out_of_loop
- Once the reduction is moved out of the loop, it is no longer counted in the loop cost, so vectorization becomes more profitable (see step 5)
5. Cost model
- count the scalar loop cost (via scalar opcodes)
- count the vector loop cost (via scalar opcodes, and vector opcodes + vlen)
- keep track of live nodes (optimizations might kill some)
- keep track of nodes inside the loop (optimizations might float some nodes out of the loop; don't count their cost)
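The bookkeeping in steps 4 and 5 can be sketched in a few lines of C++. This is a minimal sketch under stated assumptions, not HotSpot code: CostNode, loop_body_cost, is_profitable, make_scalar_sum_body, make_vector_sum_body and move_reduction_out_of_loop are hypothetical names, the opcode strings only borrow C2 node names as labels, and the per-node costs (1 for most nodes, 8 for an across-lane reduction kept inside the loop) are made up. It compares one vector iteration against vlen scalar iterations and skips nodes that are dead or no longer inside the loop.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical node for cost modeling (needs C++14 or later). Names and costs
// are illustrative assumptions, not HotSpot's VTransform classes or a real
// platform cost table.
struct CostNode {
  std::string opcode;
  int cost;
  bool is_alive = true;    // an optimization might kill the node
  bool is_in_loop = true;  // an optimization might float the node out of the loop
};

// Sum the cost of the live nodes that are still inside the loop body.
int loop_body_cost(const std::vector<CostNode>& body) {
  int total = 0;
  for (const CostNode& n : body) {
    if (n.is_alive && n.is_in_loop) {
      total += n.cost;
    }
  }
  return total;
}

// One vector iteration does the work of vlen scalar iterations, so compare it
// against vlen copies of the scalar loop body.
bool is_profitable(const std::vector<CostNode>& scalar_body,
                   const std::vector<CostNode>& vector_body, int vlen) {
  return loop_body_cost(vector_body) < vlen * loop_body_cost(scalar_body);
}

// Hypothetical scalar int sum reduction `sum += a[i]`, plus loop control.
std::vector<CostNode> make_scalar_sum_body() {
  return {{"LoadI", 1}, {"AddI", 1}, {"AddI (iv)", 1}, {"CmpI+If", 1}};
}

// Hypothetical vectorized body with the across-lane reduction still inside
// the loop (assumed to be expensive).
std::vector<CostNode> make_vector_sum_body() {
  return {{"LoadVector", 1}, {"AddReductionVI", 8}, {"AddI (iv)", 1}, {"CmpI+If", 1}};
}

// Float the reduction out of the loop: a cheap lane-wise AddVI accumulates
// inside the loop, and the AddReductionVI runs once after the loop.
void move_reduction_out_of_loop(std::vector<CostNode>& body) {
  bool found = false;
  for (CostNode& n : body) {
    if (n.opcode == "AddReductionVI" && n.is_in_loop) {
      n.is_in_loop = false;
      found = true;
    }
  }
  assert(found && "expected a reduction inside the loop");
  (void)found;  // avoid an unused-variable warning when NDEBUG is set
  body.push_back({"AddVI", 1});
}
```

With vlen = 2, the in-loop reduction makes the vector body cost 11 against 2 × 4 = 8 for the scalar loop, so vectorization is rejected; after moving the reduction out, the body costs 4 and vectorization pays off. The sketch ignores the one-time cost of the reduction after the loop and of any pre/post loops, which a real model would have to weigh against the trip count.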
blocks:
- JDK-8347116 C2 SuperWord: If-Conversion (Open)
is blocked by:
- JDK-8349139 C2: Div looses dependency on condition that guarantees divisor not zero in counted loop (Resolved)
relates to:
- JDK-8336000 C2 SuperWord: report that 2-element reductions do not vectorize (Open)
- JDK-8307516 C2 SuperWord: reconsider Reduction heuristic for UnorderedReduction (Open)
- JDK-8357530 C2 SuperWord: Diagnostic flag AutoVectorizationOverrideProfitability (Resolved)