Loading...

Type: Enhancement
Resolution: Fixed
Priority: P4
Fix Version/s: 21
Affects Version/s: 19
Component/s: hotspot
Labels:
- c2
- c2-superword

Subcomponent:
compiler
Resolved In Build:
b21

Currently, SLP reduction vectorization follows a two-step approach:

(1) identify and mark reduction nodes and their corresponding loops early (to inform the unrolling policy),
(2) reuse later this information (which nodes are marked as reductions) to guide SLP vectorization.

By performing (1) before loop unrolling, this approach can identify reductions in a simple and efficient manner. However, it has a serious drawback that makes it hard to maintain: many different high-level loop transformations and low-level Ideal transformations can invalidate the node and loop reduction marks (flags) within the loop optimization iterations between steps (1) and (2) above, leading in the worst case to SLP miscompilations based on inconsistent reduction information. ~~JDK-8261147~~ and ~~JDK-8279622~~ are actual miscompilations caused by this design problem.

A potential solution is to run step (1) on-demand: first to inform the unrolling policy, and then again to guide SLP vectorization. This would avoid this source of bugs by construction, because there would not be analysis information "on the side" that could be potentially invalidated. If a node found to be a reduction before unrolling stops being a reduction due to some loop or Ideal transformation, it will simply not be identified and treated as a reduction by SLP vectorization.

The solution suggested in this RFE requires generalizing reduction analysis to also identify reductions on unrolled loops. For example, the analysis should be able to detect that node (166 AddI) in reduction-before-unrolling.pdf (attached) is a reduction, and also that nodes {(166 AddI, 504 AddI, 579 AddI, 591 AddI)} form a reduction chain in the corresponding unrolled loop reduction-unrolled-x4.pdf. While detecting reduction chains in unrolled loops is necessarily more costly, the cost is never worse than ("number of phi nodes in the loop" x LoopMaxUnroll), under the assumption that all nodes in an unrolled reduction chain are connected via the same input number. This assumption is key to avoid a combinatorial explosion, and seems to hold in practice, since reduction chains are formed by cloning the same reduction node a number of times.

A prototype of this alternative design is available at https://github.com/robcasloz/jdk/tree/JDK-8287087.

Work left to do:

- Evaluate the overhead of doing general reduction analysis on unrolled loops and its contribution to total C2 execution time (https://github.com/robcasloz/jdk/blob/2171ced27b6a99651b194ab51919c68ade5afb23/src/hotspot/share/opto/superword.cpp#L115).

- Evaluate the overhead of doing per-node general reduction analysis for x86-64 min/max floating-point intrinsic selection (https://github.com/robcasloz/jdk/blob/2171ced27b6a99651b194ab51919c68ade5afb23/src/hotspot/share/opto/node.cpp#L3236).

- Test that the same reduction loops are vectorized as before. As part of this effort, IR test framework checks could be added to the reduction vectorization test cases for more robust regression testing. Some examples can be found in the prototype, e.g. https://github.com/robcasloz/jdk/blob/general-reduction-analysis/test/hotspot/jtreg/compiler/loopopts/superword/RedTest_int_x64.java. This would require additional support IR test framework for multi-target IR matching, see ~~JDK-8280120~~.

- Test that the x86-64 min/max floating-point intrinsic selection works at least as well as (and possibly better than) the original implementation. The potential improvement over the original implementation comes from the fact that on-demand reduction analysis would be performed on *every* min/max floating-point node visited during instruction selection, not just on those visited in the context of SLP analysis.

- - Sort By Name
  - Sort By Date
  - Ascending
  - Descending
  - Thumbnails
  - List
  - Download All

reduction-before-unrolling.pdf
2 kB
2022-05-20 06:04
reduction-unrolled-x4.pdf
3 kB
2022-05-20 06:04

is blocked by

JDK-8294715 Add IR checks to the reduction vectorization tests

Resolved

relates to

JDK-8279622 C2: miscompilation of map pattern as a vector reduction

Resolved

JDK-8286177 C2: "failed: non-reduction loop contains reduction nodes" assert failure

Resolved

JDK-8308746 C2 IR test failures for TestFpMinMaxReductions.java with SSE2

Resolved

JDK-8261147 C2: Node is wrongly marked as reduction resulting in a wrong execution due to wrong vector instructions

Closed

JDK-8305707 SuperWord should vectorize reverse-order reduction loops

Open

JDK-8306989 C2: generalize search in SLP reduction analysis

Open

JDK-8074981 Integer/FP scalar reduction optimization

Resolved

JDK-8217561 x86: Add floating-point Math.min/max intrinsics

Resolved

JDK-8290964 C2 compilation fails with assert "non-reduction loop contains reduction nodes"

Resolved

JDK-8317507 C2 compilation fails with "Exceeded _node_regs array"

Closed

links to

Commit openjdk/jdk/1be80a44

Review openjdk/jdk/13120

(6 relates to, 2 links to)

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates