
Type: Enhancement
Resolution: Unresolved
Priority: P4
Fix Version/s: tbd
Affects Version/s: 26
Component/s: hotspot
Labels:

Subcomponent: compiler

The flag LoopMaxUnroll currently behaves somewhat unexpectedly, and does not allow explicit control over unrolling before vectorization and after vectorization.

Such control would be quite helpful for debugging performance issues, such as those encountered during JDK-8367158, where we want to compare different unrolling factors of vectorized code. For example, we may want to compare auto-vectorized code with different super-unrolling factors against fill/copy intrinsics.

--------------------- How LoopMaxUnroll currently works ----------------

The description of the flag is a bit weak:

  product(intx, LoopMaxUnroll, 16, \
          "Maximum number of unrolls for main loop") \
          range(0, max_jint) \

It seems to suggest it might cover the factor of "pre-vec-unroll * super-unroll", i.e. total unroll. But that does not seem to be the case.

In IdealLoopTree::policy_unroll:

  _local_loop_unroll_limit = LoopUnrollLimit;
  _local_loop_unroll_factor = 4;
  int future_unroll_cnt = cl->unrolled_count() * 2;
  if (!cl->is_vectorized_loop()) {
    if (future_unroll_cnt > LoopMaxUnroll) return false;
  } else {
    // obey user constraints on vector mapped loops with additional unrolling applied
    int unroll_constraint = (cl->slp_max_unroll()) ? cl->slp_max_unroll() : 1;
    if ((future_unroll_cnt / unroll_constraint) > LoopMaxUnroll) return false;
  }

So when we are checking if we can unroll, there are these cases:
- scalar loop -> do not unroll more than LoopMaxUnroll
- vector main loop -> do not unroll more than LoopMaxUnroll*slp_max_unroll
- vector drain loop -> do not unroll more than LoopMaxUnroll. Q: can that lead to super-unrolling? Probably not...?
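The three cases above can be sketched as a small stand-alone model. This is not HotSpot code: LoopModel, passes_max_unroll_check and max_reachable_unroll are hypothetical names introduced only for illustration, and the drain-loop case assumes that slp_max_unroll() is unset (0) there, so the constraint falls back to 1.

```cpp
#include <cassert>

// Hypothetical stand-alone model of the LoopMaxUnroll check quoted above.
// Field names mirror the excerpt, but nothing here is HotSpot code.
struct LoopModel {
  int  unrolled_count;  // models cl->unrolled_count()
  bool is_vectorized;   // models cl->is_vectorized_loop()
  int  slp_max_unroll;  // models cl->slp_max_unroll(), 0 if unset (assumption)
};

// Returns true if one more doubling of the unroll count would be allowed.
inline bool passes_max_unroll_check(const LoopModel& loop, int loop_max_unroll) {
  assert(loop.unrolled_count >= 1);
  int future_unroll_cnt = loop.unrolled_count * 2;
  if (!loop.is_vectorized) {
    return future_unroll_cnt <= loop_max_unroll;
  }
  // Vectorized loop: the limit is applied to the super-unroll factor only.
  int unroll_constraint = (loop.slp_max_unroll != 0) ? loop.slp_max_unroll : 1;
  return (future_unroll_cnt / unroll_constraint) <= loop_max_unroll;
}

// Doubles the unroll count until the check rejects the next doubling.
// The (1 << 29) guard only protects this sketch against int overflow.
inline int max_reachable_unroll(LoopModel loop, int loop_max_unroll) {
  while (loop.unrolled_count < (1 << 29) &&
         passes_max_unroll_check(loop, loop_max_unroll)) {
    loop.unrolled_count *= 2;
  }
  return loop.unrolled_count;
}
```

Under this model, with LoopMaxUnroll = 16, a scalar loop starting at an unroll count of 1 stops at 16, while a vectorized main loop with slp_max_unroll = 8 keeps doubling up to 128, i.e. LoopMaxUnroll * slp_max_unroll. Only this one check is modeled; the other limits in policy_unroll (node counts, trip counts, and so on) are ignored.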

It seems we also always set _local_loop_unroll_factor = 4, which may be relevant below.

  if (phase->C->do_superword()) {
    // Only attempt slp analysis when user controls do not prohibit it
    if (!range_checks_present() && (LoopMaxUnroll > _local_loop_unroll_factor)) {
      // Once policy_slp_analysis succeeds, mark the loop with the
      // maximal unroll factor so that we minimize analysis passes
      if (future_unroll_cnt >= _local_loop_unroll_factor) {
        policy_unroll_slp_analysis(cl, phase, future_unroll_cnt);
      }
    }
  }

So LoopMaxUnroll must be greater than 4 (i.e. 8 or larger, if we only consider powers of 2), otherwise we don't do the policy_unroll_slp_analysis. Curious!
-> Investigate!
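To help with that investigation, the gate around policy_unroll_slp_analysis can be modeled in isolation. The function below is a hypothetical sketch, not HotSpot code: it assumes do_superword() is true, that _local_loop_unroll_factor is still 4 at this point, and that nothing between the two excerpts returns early.

```cpp
#include <cassert>

// Hypothetical model of the condition guarding policy_unroll_slp_analysis.
// Parameter names mirror the excerpt; the defaults are assumptions.
inline bool slp_analysis_attempted(int loop_max_unroll,
                                   int unrolled_count,
                                   bool range_checks_present = false,
                                   int local_loop_unroll_factor = 4) {
  assert(unrolled_count >= 1);
  int future_unroll_cnt = unrolled_count * 2;
  if (range_checks_present) {
    return false;  // range checks block the analysis
  }
  if (!(loop_max_unroll > local_loop_unroll_factor)) {
    return false;  // strict comparison: LoopMaxUnroll == 4 is rejected
  }
  return future_unroll_cnt >= local_loop_unroll_factor;
}
```

In this model the comparison is strict, so LoopMaxUnroll = 4 never reaches the analysis, while any larger value does once the loop has been unrolled twice (future_unroll_cnt = 4).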

  int slp_max_unroll_factor = cl->slp_max_unroll();
  if ((LoopMaxUnroll < slp_max_unroll_factor) && FLAG_IS_DEFAULT(LoopMaxUnroll) && UseSubwordForMaxVector) {
    LoopMaxUnroll = slp_max_unroll_factor;
  }

We may now update the flag, if it is still marked as default. But this is a global update... so the next time we come through here we cannot update it any more, right? Does this look good at all?
-> Investigation: ok, it does get updated repeatedly. I played around with an example like TestLoopMaxUnrollIncreasing.java
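One possible explanation for the repeated update, sketched here as an assumption rather than a confirmed reading of the flag machinery: if the "is default" status is tracked separately from the flag value, a plain assignment changes the value but leaves the status untouched. FlagModel, FlagOrigin and maybe_raise_loop_max_unroll are hypothetical names used only in this model.

```cpp
#include <cassert>

// Hypothetical model: the flag value and its origin are stored separately.
enum class FlagOrigin { Default, CommandLine };

struct FlagModel {
  int        value;
  FlagOrigin origin;
};

// Models the update quoted above. The assignment writes the value only,
// so a flag that started as Default stays Default (assumption).
inline void maybe_raise_loop_max_unroll(FlagModel& loop_max_unroll,
                                        int slp_max_unroll_factor,
                                        bool use_subword_for_max_vector = true) {
  assert(slp_max_unroll_factor >= 0);
  if (loop_max_unroll.value < slp_max_unroll_factor &&
      loop_max_unroll.origin == FlagOrigin::Default &&
      use_subword_for_max_vector) {
    loop_max_unroll.value = slp_max_unroll_factor;  // origin is not modified
  }
}
```

Under this model the limit only ever grows: a loop with a large slp_max_unroll raises the global value for every loop compiled afterwards, and a later loop with a smaller factor does not lower it again. A value set on the command line is never touched.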

We also use the flag to limit the search for reduction chains in SuperWord:
src/hotspot/share/opto/superword.cpp: PathEnd path_to_phi = find_in_path(n, input, LoopMaxUnroll, has_my_opcode,
src/hotspot/share/opto/superword.cpp: PathEnd path_from_phi = find_in_path(first, input, LoopMaxUnroll, has_my_opcode,

policy_unroll_slp_analysis
- unrolling_analysis sets _local_loop_unroll_factor, mark_passed_slp, and set_slp_max_unroll.
- Maybe we can refactor the code, so we don't set random states all over the place?
- eventually, we then set the _local_loop_unroll_limit, which seems to be the real limit... ah but that is a node limit??? a bit strange... and could possibly lead to inaccurate super-unrolling. The heuristic is also odd.
   -> this probably leads to the horrible over-unrolling we have seen when we pass slp analysis but fail to vectorize!

TODO: policy_unroll_slp_analysis - meaning of related fields
TODO: consider deprecating UseSubwordForMaxVector - I see no reason to disable it ever. We also have no tests for its correctness if disabled.
TODO: consider deprecating SuperWordLoopUnrollAnalysis - ah but it is false on arm only... strange!


TestLoopMaxUnrollIncreasing.java
0.7 kB
2025-09-19 00:03

relates to:
- JDK-8129920 Vectorized loop unrolling (Resolved)
- JDK-8187601 Unrolling more when SLP auto-vectorization failed (Resolved)
- JDK-8367158 C2: create better fill and copy benchmarks, taking alignment into account (Resolved)
