Type: Enhancement
Resolution: Unresolved
Priority: P4
Affects Version/s: 27
Component/s: hotspot
generic
generic
This is inspired by the discussion in: https://github.com/openjdk/jdk/pull/27526#discussion_r2681318247
FP16 reductions currently auto-vectorize only if the loop is written in terms of raw short bit patterns, with explicit shortBitsToFloat16 / float16ToRawShortBits conversions inside the loop. That is a fairly convoluted style. A typical user would reasonably write the loop with Float16 values end to end and expect the compiler to eliminate the boxing/unboxing, but such loops are not currently vectorized.
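The following loop can be vectorized (input is a short[] holding binary16 bit patterns):
import jdk.incubator.vector.Float16;
import static jdk.incubator.vector.Float16.*;

short acc = float16ToRawShortBits(Float16.POSITIVE_INFINITY);
for (int i = 0; i < LEN; ++i) {
    acc = float16ToRawShortBits(
            min(shortBitsToFloat16(input[i]),
                shortBitsToFloat16(acc)));
}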
In contrast, the following loop, which uses Float16 values throughout (inputFP16 is a Float16[]), CANNOT currently be vectorized:
Float16 acc = Float16.POSITIVE_INFINITY;
for (int i = 0; i < LEN; ++i) {
    acc = min(inputFP16[i], acc);
}
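When comparing results from the two loop shapes, a plain scalar reference can be handy. The sketch below is not the vectorized code path and says nothing about what C2 does. It assumes JDK 20+, where java.lang.Float provides float16ToFloat / floatToFloat16, so it does not need the jdk.incubator.vector module. The class name Fp16MinReference and the constant POSITIVE_INFINITY_BITS are hypothetical. It mirrors the raw-bits accumulator shape: acc stays a short, and each step widens to float, takes Math.min, and narrows back.

```java
// Hypothetical scalar reference for an FP16 min-reduction over raw binary16 bits.
// Assumes JDK 20+ (java.lang.Float.float16ToFloat / floatToFloat16). It does not use
// jdk.incubator.vector.Float16 and makes no claim about auto-vectorization.
public final class Fp16MinReference {

    // IEEE 754 binary16 +Infinity: sign 0, exponent all ones, mantissa 0.
    static final short POSITIVE_INFINITY_BITS = (short) 0x7C00;

    // Same accumulator shape as the vectorizable loop: acc is kept as raw bits.
    static short minReduce(short[] input) {
        short acc = POSITIVE_INFINITY_BITS;
        for (int i = 0; i < input.length; ++i) {
            float m = Math.min(Float.float16ToFloat(input[i]),
                               Float.float16ToFloat(acc));
            // For non-NaN values this is exact: every binary16 value fits in a float,
            // and Math.min returns one of its operands, so narrowing loses nothing.
            acc = Float.floatToFloat16(m);
        }
        return acc;
    }

    public static void main(String[] args) {
        short[] input = {
            Float.floatToFloat16(3.5f),
            Float.floatToFloat16(-2.0f),
            Float.floatToFloat16(0.25f)
        };
        System.out.println(Float.float16ToFloat(minReduce(input))); // prints -2.0
    }
}
```

Because Math.min orders -0.0 below +0.0, reducing {+0.0, -0.0} yields the -0.0 bit pattern (0x8000). Any vectorized min-reduction has to preserve this ordering as well.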