JDK-8343597

C2 SuperWord: RelaxedMath for faster float reductions


    • Type: Enhancement
    • Resolution: Unresolved
    • Priority: P4
    • tbd
    • 24
    • hotspot

      For a while, I have been bothered by the fact that float reductions cannot be vectorized. They require a strict order of reduction: the elements must be added/multiplied sequentially, because any other order can produce different rounding errors. That strict ordering prevents parallelization.
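To make the ordering problem concrete, here is a small sketch in plain Java. The class and method names are made up for illustration. It sums the same four floats once strictly left to right, and once with two independent partial sums, which is roughly the order a 2-lane vectorized reduction would use.

```java
// Illustrative only: class and method names are invented for this sketch.
class FloatReductionOrder {
    // Strict left-to-right reduction, the order Java semantics require for this loop.
    static float sequentialSum(float[] a) {
        float sum = 0.0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    // Reordered reduction: two independent partial sums (even and odd indices),
    // combined at the end. This mimics a 2-lane vectorized reduction.
    static float twoLaneSum(float[] a) {
        float lane0 = 0.0f;
        float lane1 = 0.0f;
        int i = 0;
        for (; i + 1 < a.length; i += 2) {
            lane0 += a[i];
            lane1 += a[i + 1];
        }
        if (i < a.length) {
            lane0 += a[i]; // leftover element for odd-length arrays
        }
        return lane0 + lane1;
    }

    public static void main(String[] args) {
        float[] a = {1e8f, 1.0f, -1e8f, 1.0f};
        System.out.println(sequentialSum(a)); // prints 1.0
        System.out.println(twoLaneSum(a));    // prints 2.0
    }
}
```

In the sequential order the first `1.0f` is absorbed by `1e8f` (the spacing between adjacent floats at that magnitude is 8), while in the two-lane order both small values are added to each other first and survive.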

      With [~jrose] and [~darcy] we have been discussing how best to allow faster reductions for floats/doubles. I just want to allow faster reductions, for example for a fast sum or dot-product. My approach comes from an HPC/ML background, where the exact precision of floats is not super important. Joe's background leans more towards reproducibility: by default a sum should always return the exact same value, and not different values depending on whether the compiler decided to optimize or not. Hence we agreed on this plan for now:

      The 3 levels of work:
      - Internal class "RelaxedMath", with static methods. Optimizations at the VM level that exploit their relaxed semantics. The relaxed semantics apply across all "similar" ops, so that we can reorder sums/reductions. Maybe we also experiment with a version that allows combining add and mul into fma.
      - Public API extensions to Collector and maybe Array. These are easier to write a clean spec for (sum with arbitrary reordering of inputs).
      - Application in project Babylon: allow expression transformation of regular float-ops to relaxed float-ops for speedup at the price of reproducibility of rounding errors.
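The possible add/mul to fma combination mentioned in the first level also changes rounding, which is why it needs relaxed semantics too. A small sketch using the existing `Math.fma(float, float, float)`. The class and method names are invented for illustration, and the inputs are chosen so that the product `a * a` falls exactly between two floats.

```java
// Illustrative only: class and method names are invented for this sketch.
class FmaRounding {
    // Two roundings: once after the multiply, once after the add.
    static float unfused(float a, float b, float c) {
        return a * b + c;
    }

    // One rounding: the exact product a * b is added to c, then rounded.
    static float fused(float a, float b, float c) {
        return Math.fma(a, b, c);
    }

    public static void main(String[] args) {
        float a = 1.0f + 0x1.0p-12f;    // 1 + 2^-12, exactly representable
        float c = -(1.0f + 0x1.0p-11f); // -(1 + 2^-11), exactly representable
        // Exact a * a = 1 + 2^-11 + 2^-24; the 2^-24 term is lost when rounded to float.
        System.out.println(unfused(a, a, c) == 0.0f);      // prints true
        System.out.println(fused(a, a, c) == 0x1.0p-24f);  // prints true
    }
}
```

Neither result is wrong as an approximation, but a program whose output flips between them depending on whether the compiler fused the operations is no longer reproducible.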

      My goal now is the first step:
      - Introduce the class "RelaxedMath"
      - Define methods like "RelaxedMath.add(float, float)"
      - Intrinsify these methods: i.e. capture them as special IR nodes (e.g. RelaxedAddF).
      - Optimize based on relaxed semantics (first just for SuperWord/AutoVectorization non-strict reductions)
      - Lower any scalar relaxed ops to strict ops -> use the same backend operations.
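As a rough sketch of what the first step could look like from the Java side: the nested `RelaxedMath` class below is a hypothetical stand-in, not the proposed internal JDK class. Only `add(float, float)` is named in this issue; `mul` and the `dot` helper are assumptions added for illustration. The method bodies are ordinary strict operations, matching the idea that scalar relaxed ops are lowered to strict ops. Any reordering would happen only in the JIT once the calls are intrinsified.

```java
// Hypothetical sketch: none of these names are confirmed JDK API.
class RelaxedMathSketch {
    // Stand-in for the proposed internal class. Without intrinsification,
    // each method simply behaves like the strict operation.
    static final class RelaxedMath {
        static float add(float a, float b) {
            return a + b;
        }

        // Assumption: a mul counterpart, not spelled out in the issue.
        static float mul(float a, float b) {
            return a * b;
        }
    }

    // A dot-product reduction written against the relaxed ops. With relaxed
    // semantics, a vectorizer would be allowed to reorder the additions.
    static float dot(float[] x, float[] y) {
        float sum = 0.0f;
        for (int i = 0; i < x.length; i++) {
            sum = RelaxedMath.add(sum, RelaxedMath.mul(x[i], y[i]));
        }
        return sum;
    }

    public static void main(String[] args) {
        float[] x = {1.0f, 2.0f, 3.0f};
        float[] y = {4.0f, 5.0f, 6.0f};
        System.out.println(dot(x, y)); // prints 32.0
    }
}
```

Writing the loop with explicit method calls keeps the opt-in visible at the call site: code that uses plain `+` keeps its strict, reproducible ordering.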

            epeter Emanuel Peter
            epeter Emanuel Peter