JDK-8368061

C2 SuperWord: allow more control over loop unrolling and super-unrolling

      The flag LoopMaxUnroll currently behaves somewhat unexpectedly, and does not allow explicit control over unrolling before vectorization and after vectorization.

      Such control would be quite helpful for debugging performance issues, such as those encountered in JDK-8367158, where we want to compare different unrolling factors of vectorized code. For example, we may want to compare auto-vectorized code with different super-unrolling factors against the fill/copy intrinsics.

      --------------------- How LoopMaxUnroll currently works ----------------

      The description of the flag is a bit weak:

        product(intx, LoopMaxUnroll, 16, \
                "Maximum number of unrolls for main loop") \
                range(0, max_jint) \

      It seems to suggest that the flag limits the total unroll factor, i.e. "pre-vec-unroll * super-unroll". But that does not seem to be the case.

      In IdealLoopTree::policy_unroll:

        _local_loop_unroll_limit = LoopUnrollLimit;
        _local_loop_unroll_factor = 4;
        int future_unroll_cnt = cl->unrolled_count() * 2;
        if (!cl->is_vectorized_loop()) {
          if (future_unroll_cnt > LoopMaxUnroll) return false;
        } else {
          // obey user constraints on vector mapped loops with additional unrolling applied
          int unroll_constraint = (cl->slp_max_unroll()) ? cl->slp_max_unroll() : 1;
          if ((future_unroll_cnt / unroll_constraint) > LoopMaxUnroll) return false;
        }

      So when we are checking if we can unroll, there are these cases:
      - scalar loop -> do not unroll more than LoopMaxUnroll
      - vector main loop -> do not unroll more than LoopMaxUnroll*slp_max_unroll
      - vector drain loop -> do not unroll more than LoopMaxUnroll. Q: can that lead to super-unrolling? Probably not...?
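
      The cases above can be sketched as a small stand-alone model. This is not HotSpot code: the function name and parameters are hypothetical, and it covers only the LoopMaxUnroll comparison quoted above, not the later checks in policy_unroll.

```cpp
#include <cassert>

// Hypothetical stand-alone model of the LoopMaxUnroll check; not HotSpot code.
// unrolled_count: how often the body is already unrolled (1 = not unrolled).
// slp_max_unroll: value recorded by the SLP analysis, 0 if none was recorded.
bool passes_max_unroll_check(int unrolled_count, bool is_vectorized,
                             int slp_max_unroll, int loop_max_unroll) {
  assert(unrolled_count >= 1 && slp_max_unroll >= 0);
  // Each unrolling step doubles the body, so this is the count after the step.
  int future_unroll_cnt = unrolled_count * 2;
  if (!is_vectorized) {
    // Scalar loop: the flag bounds the unroll count directly.
    return future_unroll_cnt <= loop_max_unroll;
  }
  // Vectorized loop: the count is first divided by slp_max_unroll
  // (or by 1 if that value is 0), then compared against the flag.
  int unroll_constraint = (slp_max_unroll != 0) ? slp_max_unroll : 1;
  return (future_unroll_cnt / unroll_constraint) <= loop_max_unroll;
}
```

      With LoopMaxUnroll = 16 and slp_max_unroll = 8, the model accepts a vectorized loop going from 64 to 128 copies of the original body, but rejects the next doubling. A vectorized loop with slp_max_unroll = 0 is bounded by LoopMaxUnroll itself.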

      It seems we also always set _local_loop_unroll_factor = 4, which may be relevant below.

        if (phase->C->do_superword()) {
          // Only attempt slp analysis when user controls do not prohibit it
          if (!range_checks_present() && (LoopMaxUnroll > _local_loop_unroll_factor)) {
            // Once policy_slp_analysis succeeds, mark the loop with the
            // maximal unroll factor so that we minimize analysis passes
            if (future_unroll_cnt >= _local_loop_unroll_factor) {
              policy_unroll_slp_analysis(cl, phase, future_unroll_cnt);
            }
          }
        }

      So LoopMaxUnroll must be greater than _local_loop_unroll_factor = 4 (8 or larger, if we only consider powers of two), otherwise we don't do the policy_unroll_slp_analysis. Curious!
      -> Investigate!
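
      To make the gate easier to reason about, here is a minimal model of the two nested conditions. It is a sketch, not HotSpot code: the function name and parameters are hypothetical, and the default of 4 mirrors the assignment to _local_loop_unroll_factor quoted earlier. Read literally, the comparison against 4 is strict, so any LoopMaxUnroll above 4 passes, and 8 is the smallest power of two that does.

```cpp
#include <cassert>

// Hypothetical model of the SLP-analysis gate; not HotSpot code.
// local_unroll_factor defaults to 4, as assigned in policy_unroll.
bool runs_slp_analysis(int loop_max_unroll, int future_unroll_cnt,
                       bool range_checks_present,
                       int local_unroll_factor = 4) {
  assert(future_unroll_cnt >= 2);  // always twice some unrolled count >= 1
  // Outer condition: no range checks, and the flag strictly exceeds the factor.
  if (range_checks_present) return false;
  if (!(loop_max_unroll > local_unroll_factor)) return false;
  // Inner condition: the loop must be about to reach the factor.
  return future_unroll_cnt >= local_unroll_factor;
}
```

      In this model, LoopMaxUnroll = 4 never reaches the analysis, while LoopMaxUnroll = 5 does once future_unroll_cnt is 4.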

        int slp_max_unroll_factor = cl->slp_max_unroll();
        if ((LoopMaxUnroll < slp_max_unroll_factor) && FLAG_IS_DEFAULT(LoopMaxUnroll) && UseSubwordForMaxVector) {
          LoopMaxUnroll = slp_max_unroll_factor;
        }

      We may now update the flag, if it is still marked as default. But this is a global update... so the next time we come through here, we cannot update it any more, right? Does this look good at all?
      -> Investigation: ok, it does get updated repeatedly. I played around with an example like TestLoopMaxUnrollIncreasing.java
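
      One way to picture the repeated update is the following sketch. It rests on an assumption that is not confirmed in this report: that a plain assignment to the flag variable leaves the "default" marker read by FLAG_IS_DEFAULT untouched. The struct and function names are hypothetical, not HotSpot code.

```cpp
#include <cassert>

// Hypothetical model of a flag: its value plus a "still default" marker.
struct FlagModel {
  int value;
  bool is_default;  // ASSUMPTION: a plain assignment to value never clears this
};

// Models the quoted update: raise the flag to slp_max_unroll_factor, but only
// if the flag is smaller, still marked default, and the subword option is on.
void maybe_raise(FlagModel& loop_max_unroll, int slp_max_unroll_factor,
                 bool use_subword_for_max_vector) {
  assert(slp_max_unroll_factor >= 0);
  if (loop_max_unroll.value < slp_max_unroll_factor &&
      loop_max_unroll.is_default && use_subword_for_max_vector) {
    loop_max_unroll.value = slp_max_unroll_factor;  // marker left unchanged
  }
}
```

      Under that assumption the global value can only grow: starting from the default 16, loops with factors 32, 64, and 8 leave it at 64, while a value set explicitly by the user is never changed.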

      We also use the flag to limit the search for reduction chains in SuperWord:
      src/hotspot/share/opto/superword.cpp: PathEnd path_to_phi = find_in_path(n, input, LoopMaxUnroll, has_my_opcode,
      src/hotspot/share/opto/superword.cpp: PathEnd path_from_phi = find_in_path(first, input, LoopMaxUnroll, has_my_opcode,

      policy_unroll_slp_analysis
       - unrolling_analysis sets _local_loop_unroll_factor, mark_passed_slp, and set_slp_max_unroll.
       - Maybe we can refactor the code, so we don't set random states all over the place?
       - eventually, we then set the _local_loop_unroll_limit, which seems to be the real limit... ah but that is a node limit??? a bit strange... and could possibly lead to inaccurate super-unrolling. The heuristic is also odd.
         -> this probably leads to the horrible over-unrolling we have seen when we pass slp analysis but fail to vectorize!

      TODO: policy_unroll_slp_analysis - meaning of related fields
      TODO: consider deprecating UseSubwordForMaxVector - I see no reason to disable it ever. We also have no tests for its correctness if disabled.
      TODO: consider deprecating SuperWordLoopUnrollAnalysis - ah but it is false on arm only... strange!

            epeter Emanuel Peter