Enhancement
Resolution: Unresolved
P4
24
A cost model will improve the profitability of vectorizing reductions and of adding shuffle/pack/unpack operations.
We need one because vectorization is not always profitable, especially if we add more operations to the loop.
There may also be extra cost to subword conversion, see:
https://github.com/openjdk/jdk/pull/23413
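To make the profitability concern concrete, here is a minimal sketch (plain C++, not HotSpot code) that emulates a 4-lane integer sum reduction. The names `sum_reduce_in_loop`, `sum_reduce_after_loop`, `VLEN`, and the `horizontal_reduces` counter are hypothetical and exist only for illustration. It counts how many cross-lane reductions each loop shape executes: one per vector iteration if the reduction stays in the loop, versus a single one if the lanes are accumulated element-wise and reduced once after the loop.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration only: emulates a 4-lane vector with std::array.
constexpr std::size_t VLEN = 4;
using Vec = std::array<int, VLEN>;

struct Result {
  int sum;                 // value of the reduction
  int horizontal_reduces;  // number of cross-lane reductions executed
};

// Cross-lane reduction: adds up all lanes of one vector.
static int horizontal_add(const Vec& v) {
  int s = 0;
  for (int lane : v) { s += lane; }
  return s;
}

// Shape 1: the reduction stays in the loop, one horizontal add per iteration.
Result sum_reduce_in_loop(const std::vector<int>& a) {
  assert(a.size() % VLEN == 0);  // sketch assumes no remainder iterations
  Result r{0, 0};
  for (std::size_t i = 0; i < a.size(); i += VLEN) {
    Vec v{a[i], a[i + 1], a[i + 2], a[i + 3]};
    r.sum += horizontal_add(v);
    r.horizontal_reduces++;
  }
  return r;
}

// Shape 2: accumulate lane-wise in the loop, reduce once after the loop.
Result sum_reduce_after_loop(const std::vector<int>& a) {
  assert(a.size() % VLEN == 0);  // sketch assumes no remainder iterations
  Vec acc{0, 0, 0, 0};
  for (std::size_t i = 0; i < a.size(); i += VLEN) {
    for (std::size_t l = 0; l < VLEN; l++) { acc[l] += a[i + l]; }
  }
  return Result{horizontal_add(acc), 1};
}
```

Both shapes compute the same sum for integer addition, but the first pays for a cross-lane reduction on every iteration. That per-iteration cost is the kind of extra in-loop work that can make a vectorized loop slower than the scalar one.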
--------------------------- PLAN ----------------------
I have a proof-of-concept patch here:
https://github.com/openjdk/jdk/pull/20964
Instead of pushing it as a whole (quite unreviewable), I'll split it up into subtasks.
Here is a rough schedule towards Cost-Modeling:
0. Smaller refactorings
1. Scalar node refactoring
- Finer resolution: mem, phi, data, cfg
- These will be needed when modeling the whole loop instead of just the basic block (step 3)
2. Vector node refactoring
- Remove reliance on _nodes, so that it will be easier to model the whole loop (step 3)
- Instead, capture all relevant information in some sort of VTransformNodePrototype: opcode, vlen, basic_type, etc.
3. Model whole loop instead of only basic block (allows VTransform optimizations like moving reduction out of loop)
- Instead of using VTransformGraph::apply_memops_reordering_with_schedule, which reorders the old graph, I want to build the new loop body from the VTransform directly.
- That means we are less constrained by the old shape of the loop.
4. Optimize: e.g. move reduction out of loop
- Refactor move_unordered_reduction_out_of_loop
- Moving the reduction out of the loop means it is no longer counted in the loop cost, which makes vectorization more profitable (see step 5)
5. Cost-model
- count scalar loop cost (via scalar opcodes)
- count vector loop cost (via scalar opcodes, and vector opcodes + vlen)
- keep track of live nodes (optimization might kill some)
- keep track of nodes inside loop (optimizations might float some nodes out of the loop, don't count their cost)
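The bookkeeping in step 5 could be sketched roughly as follows. This is a toy model, not the proof-of-concept patch: `ToyNode`, `toy_op_cost`, `loop_cost`, `is_profitable`, and all cost numbers are hypothetical placeholders, and real costs would depend on the platform. It sums per-node costs over nodes that are both alive and inside the loop, so floating a reduction out of the loop (step 4) lowers the vector loop cost.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy node: not the real VTransform node classes.
struct ToyNode {
  std::string opcode;  // e.g. "LoadI", "AddVI", "AddReductionVI"
  int vlen;            // 1 for scalar nodes, number of lanes for vector nodes
  bool alive;          // false if an optimization removed the node
  bool in_loop;        // false if the node was floated out of the loop
};

// Made-up cost table, for illustration only.
int toy_op_cost(const ToyNode& n) {
  if (n.opcode == "AddReductionVI") {
    // Assumption: a cross-lane reduction costs two units per lane.
    return 2 * n.vlen;
  }
  return 1;  // every other operation: one unit
}

// Sum the cost of all nodes that are alive and still inside the loop.
int loop_cost(const std::vector<ToyNode>& nodes) {
  int cost = 0;
  for (const ToyNode& n : nodes) {
    if (n.alive && n.in_loop) { cost += toy_op_cost(n); }
  }
  return cost;
}

// One vector iteration replaces vlen scalar iterations, so vectorization
// pays off if it is cheaper than vlen times the scalar iteration cost.
bool is_profitable(int scalar_cost, int vector_cost, int vlen) {
  assert(vlen > 0);
  return vector_cost < scalar_cost * vlen;
}
```

With these made-up numbers, a 2-lane loop consisting of a vector load and an in-loop `AddReductionVI` costs 5 against a scalar budget of 2 * 2 = 4, so it is rejected. Once the reduction is marked `in_loop = false` and the loop keeps only a lane-wise `AddVI`, the cost drops to 2 and the loop is accepted.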
A later task could be to also do:
VTransformLongToIntVectorNode::optimize
blocks
- JDK-8347116 C2 SuperWord: If-Conversion (Open)
is blocked by
- JDK-8349139 C2: Div looses dependency on condition that guarantees divisor not zero in counted loop (Resolved)
relates to
- JDK-8336000 C2 SuperWord: report that 2-element reductions do not vectorize (Open)
- JDK-8307516 C2 SuperWord: reconsider Reduction heuristic for UnorderedReduction (Open)
- JDK-8357530 C2 SuperWord: Diagnostic flag AutoVectorizationOverrideProfitability (Resolved)
There are no Sub-Tasks for this issue.