Improve auto‑vectorization of Float16 with Float16 object

    • Type: Enhancement
    • Resolution: Unresolved
    • Priority: P4
    • Fix Version/s: tbd
    • Affects Version/s: 27
    • Component/s: hotspot
    • CPU: generic
    • OS: generic

      This is inspired by the discussion in: https://github.com/openjdk/jdk/pull/27526#discussion_r2681318247

      FP16 reductions currently auto‑vectorize only if the code is written in terms of raw short bit patterns, with explicit shortBitsToFloat16 / float16ToRawShortBits conversions inside the loop. That is a fairly convoluted style. A typical user would reasonably write the code with Float16 values end‑to‑end and expect the compiler to handle the boxing/unboxing, but such loops are not currently vectorized.

      The following case can be vectorized:

      short acc = float16ToRawShortBits(Float16.POSITIVE_INFINITY);
      for (int i = 0; i < LEN; ++i) {
          acc = float16ToRawShortBits(
                    min(shortBitsToFloat16(input[i]),
                        shortBitsToFloat16(acc)));
      }

      In contrast, the following case CANNOT be vectorized:

      Float16 acc = Float16.POSITIVE_INFINITY;
      for (int i = 0; i < LEN; ++i) {
          acc = min(inputFP16[i], acc);
      }
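
      For reference, the two loop shapes above can be put side by side in a self-contained sketch. To keep it runnable without the jdk.incubator.vector module, it uses a hypothetical stand-in class Half and hand-written helpers (toFloat, minBits, reduceRaw, reduceBoxed) instead of the real Float16 API; all of those names are assumptions made for illustration. It only shows that both shapes compute the same reduction and says nothing about what C2 does with either loop.

```java
public class Fp16MinShapes {
    // Hypothetical stand-in for jdk.incubator.vector.Float16:
    // an object wrapping the raw IEEE 754 binary16 bit pattern.
    static final class Half {
        final short bits;
        Half(short bits) { this.bits = bits; }
    }

    static final short POSITIVE_INFINITY_BITS = (short) 0x7C00;
    static final short NAN_BITS = (short) 0x7E00;

    // Decode a binary16 bit pattern to float (exact, since float is wider).
    static float toFloat(short h) {
        int sign = (h >> 15) & 1;
        int exp = (h >> 10) & 0x1F;
        int man = h & 0x3FF;
        float mag;
        if (exp == 0) {
            mag = man * 0x1p-24f;                       // zero or subnormal
        } else if (exp == 31) {
            mag = (man == 0) ? Float.POSITIVE_INFINITY : Float.NaN;
        } else {
            mag = (1024 + man) * Math.scalb(1.0f, exp - 25);
        }
        return sign == 0 ? mag : -mag;
    }

    // min on raw bit patterns with Math.min semantics (NaN wins, -0.0 < +0.0).
    static short minBits(short a, short b) {
        float fa = toFloat(a);
        float m = Math.min(fa, toFloat(b));
        if (Float.isNaN(m)) {
            return NAN_BITS;
        }
        return Float.compare(fa, m) == 0 ? a : b;
    }

    static Half min(Half a, Half b) {
        return new Half(minBits(a.bits, b.bits));
    }

    // Shape 1: accumulator and input are raw short bit patterns.
    static short reduceRaw(short[] input) {
        short acc = POSITIVE_INFINITY_BITS;
        for (int i = 0; i < input.length; ++i) {
            acc = minBits(input[i], acc);
        }
        return acc;
    }

    // Shape 2: accumulator and input are objects end-to-end.
    static short reduceBoxed(Half[] input) {
        Half acc = new Half(POSITIVE_INFINITY_BITS);
        for (int i = 0; i < input.length; ++i) {
            acc = min(input[i], acc);
        }
        return acc.bits;
    }

    public static void main(String[] args) {
        short[] raw = { 0x3C00, (short) 0xC000, 0x0000, 0x4200 }; // 1.0, -2.0, 0.0, 3.0
        Half[] boxed = new Half[raw.length];
        for (int i = 0; i < raw.length; ++i) {
            boxed[i] = new Half(raw[i]);
        }
        System.out.println(reduceRaw(raw) == reduceBoxed(boxed)); // prints "true"
        System.out.println(toFloat(reduceRaw(raw)));              // prints "-2.0"
    }
}
```

      With the real API, the first shape corresponds to the shortBitsToFloat16 / float16ToRawShortBits loop and the second to the Float16 loop that this issue asks to be vectorized.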

            Assignee:
            Unassigned
            Reporter:
            Fei Gao
