Type: Enhancement
Resolution: Unresolved
Priority: P4
Fix Version/s: tbd
Affects Version/s: None
Component/s: hotspot
Labels:

Subcomponent: compiler
CPU: generic
OS: generic

This is inspired by the discussion in: https://github.com/openjdk/jdk/pull/27526#discussion_r2681318247

FP16 reductions currently auto-vectorize only when the loop is written in terms of raw short bit patterns, with explicit shortBitsToFloat16 / float16ToRawShortBits conversions inside the loop. That is a fairly convoluted style. A typical user would reasonably write the code with Float16 values end-to-end and expect the compiler to handle the boxing/unboxing, but such loops are not currently vectorized.

The following case can be vectorized:

short acc = float16ToRawShortBits(Float16.POSITIVE_INFINITY);
for (int i = 0; i < LEN; ++i) {
    acc = float16ToRawShortBits(
              min(shortBitsToFloat16(input[i]),
                  shortBitsToFloat16(acc)));
}

In contrast, the following case CANNOT be vectorized:

Float16 acc = Float16.POSITIVE_INFINITY;
for (int i = 0; i < LEN; ++i) {
    acc = min(inputFP16[i], acc);
}
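
For illustration, here is a minimal, self-contained sketch of why the two loops above express the same reduction. It does not use jdk.incubator.vector.Float16 and says nothing about what C2 actually does. The Half class and the halfToFloat / minBits helpers are hypothetical stand-ins written for this example, assuming min follows Math.min-style semantics (NaN propagates, -0.0 orders below +0.0). The only point is that the boxed loop is the raw-bits loop plus a wrap/unwrap on every iteration.

```java
// Hypothetical stand-in for a boxed FP16 value; NOT the real Float16 class.
final class Half {
    static final short POSITIVE_INFINITY_BITS = (short) 0x7C00;
    static final Half POSITIVE_INFINITY = new Half(POSITIVE_INFINITY_BITS);

    final short bits; // raw IEEE 754 binary16 bit pattern

    Half(short bits) { this.bits = bits; }

    // Boxed min: unwrap both operands, reduce on raw bits, wrap the result.
    static Half min(Half a, Half b) {
        return new Half(Fp16MinSketch.minBits(a.bits, b.bits));
    }
}

final class Fp16MinSketch {
    // Decode a binary16 bit pattern to float (exact, every half fits in a float).
    static float halfToFloat(short h) {
        int bits = h & 0xFFFF;
        int sign = (bits & 0x8000) << 16;
        int exp = (bits >>> 10) & 0x1F;
        int man = bits & 0x3FF;
        if (exp == 0x1F) {                       // infinity or NaN
            return Float.intBitsToFloat(sign | 0x7F800000 | (man << 13));
        }
        if (exp == 0) {                          // zero or subnormal: man * 2^-24
            float v = man * (1.0f / (1 << 24));
            return sign != 0 ? -v : v;
        }
        // Normal number: rebias the exponent from 15 to 127.
        return Float.intBitsToFloat(sign | ((exp + 112) << 23) | (man << 13));
    }

    // min on raw bit patterns, returning the bit pattern of the smaller value.
    static short minBits(short a, short b) {
        float fa = halfToFloat(a);
        float fb = halfToFloat(b);
        if (Float.isNaN(fa)) return a;           // NaN propagates
        if (Float.isNaN(fb)) return b;
        return Float.compare(fa, fb) <= 0 ? a : b;
    }

    // Shape of the loop that is reported to vectorize: short accumulator.
    static short minRawBits(short[] input) {
        short acc = Half.POSITIVE_INFINITY_BITS;
        for (int i = 0; i < input.length; ++i) {
            acc = minBits(input[i], acc);
        }
        return acc;
    }

    // Shape of the loop that is reported not to vectorize: boxed accumulator.
    static Half minBoxed(Half[] input) {
        Half acc = Half.POSITIVE_INFINITY;
        for (int i = 0; i < input.length; ++i) {
            acc = Half.min(input[i], acc);
        }
        return acc;
    }

    public static void main(String[] args) {
        short[] raw = {0x3C00, (short) 0xC000, 0x4200}; // 1.0, -2.0, 3.0
        Half[] boxed = new Half[raw.length];
        for (int i = 0; i < raw.length; ++i) {
            boxed[i] = new Half(raw[i]);
        }
        System.out.println(minRawBits(raw) == minBoxed(boxed).bits);
    }
}
```

Both methods return the same bit pattern for the same inputs, so in this sketch the only difference between the two loop shapes is the allocation and field load around each step, which is the part the enhancement asks the compiler to see through.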

Assignee: Unassigned
Reporter: Fei Gao
Votes: 0
Watchers: 5

Created: 2026-01-14 07:43
Updated: 2026-02-13 12:52
