# Conversions between bit representations of half precision values and floats


#### Details

• Type: CSR
• Status: Closed
• Priority: P4
• Resolution: Approved
• None
• Compatibility Risk: minimal
• Compatibility Risk Description: Add new static methods to an existing final class.
• Interface Kind: Java API
• Scope: SE

## Summary

Add methods to convert between the binary16 format of IEEE 754 (stored as a `short`) and `float`.

## Problem

The 16-bit binary16 floating-point format is used in some computing contexts, such as machine learning and graphics, but is not natively supported by the Java platform. The two conversion methods proposed here provide a minimal level of support and would allow the conversions to be intrinsified to hardware instructions where those are available.

## Solution

Add two methods to `java.lang.Float` to support conversion in both directions between `float` and binary16.
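As a rough illustration of how the two proposed methods might be used together, the sketch below round-trips a few `float` values through binary16. It assumes a JDK in which `Float.floatToFloat16` and `Float.float16ToFloat` are available (the specification marks them `@since 20`); the class name `Float16RoundTrip` and the helper `roundTrip` are hypothetical and exist only for this example.

```java
// Hypothetical demo class; requires JDK 20 or later for the two Float methods.
final class Float16RoundTrip {

    /** Narrows a float to binary16 and widens it back, exposing any rounding loss. */
    public static float roundTrip(float f) {
        short half = Float.floatToFloat16(f);   // narrowing, round to nearest even
        return Float.float16ToFloat(half);      // widening, always exact
    }

    public static void main(String[] args) {
        float[] samples = {1.0f, 0.1f, 65504.0f, 100000.0f, -0.0f, Float.NaN};
        for (float f : samples) {
            short half = Float.floatToFloat16(f);
            // Mask with 0xFFFF so the 16 bits print as an unsigned hex pattern.
            System.out.printf("%-10s -> 0x%04X -> %s%n",
                    f, half & 0xFFFF, Float.float16ToFloat(half));
        }
    }
}
```

Values that binary16 can represent, such as `1.0f` and `65504.0f`, survive the round trip unchanged. A value like `0.1f` comes back as its nearest binary16 neighbor, and a magnitude beyond the binary16 range, such as `100000.0f`, comes back as an infinity.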

## Specification

``````
+    /**
+     * {@return the {@code float} value closest to the numerical value
+     * of the argument, a floating-point binary16 value encoded in a
+     * {@code short}} The conversion is exact; all binary16 values can
+     * be exactly represented in {@code float}.
+     *
+     * Special cases:
+     * <ul>
+     * <li> If the argument is zero, the result is a zero with the
+     * same sign as the argument.
+     * <li> If the argument is infinite, the result is an infinity
+     * with the same sign as the argument.
+     * <li> If the argument is a NaN, the result is a NaN.
+     * </ul>
+     *
+     * <h4><a id=binary16Format>IEEE 754 binary16 format</a></h4>
+     * The IEEE 754 standard defines binary16 as a 16-bit format, along
+     * with the 32-bit binary32 format (corresponding to the {@code
+     * float} type) and the 64-bit binary64 format (corresponding to
+     * the {@code double} type). The binary16 format is similar to the
+     * other IEEE 754 formats, except smaller, having all the usual
+     * IEEE 754 values such as NaN, signed infinities, signed zeros,
+     * and subnormals. The parameters (JLS {@jls 4.2.3}) for the
+     * binary16 format are N = 11 precision bits, K = 5 exponent bits,
+     * <i>E</i><sub><i>max</i></sub> = 15, and
+     * <i>E</i><sub><i>min</i></sub> = -14.
+     *
+     * @apiNote
+     * This method corresponds to the convertFormat operation defined
+     * in IEEE 754 from the binary16 format to the binary32 format.
+     * The operation of this method is analogous to a primitive
+     * widening conversion (JLS {@jls 5.1.2}).
+     *
+     * @param floatBinary16 the binary16 value to convert to {@code float}
+     * @since 20
+     */
+    public static float float16ToFloat(short floatBinary16)
+    ....
+
+    /**
+     * {@return the floating-point binary16 value, encoded in a {@code
+     * short}, closest in value to the argument}
+     * The conversion is computed under the {@linkplain
+     * java.math.RoundingMode#HALF_EVEN round to nearest even rounding
+     * mode}.
+     *
+     * Special cases:
+     * <ul>
+     * <li> If the argument is zero, the result is a zero with the
+     * same sign as the argument.
+     * <li> If the argument is infinite, the result is an infinity
+     * with the same sign as the argument.
+     * <li> If the argument is a NaN, the result is a NaN.
+     * </ul>
+     *
+     * The <a href="#binary16Format">binary16 format</a> is discussed in
+     * more detail in the {@link #float16ToFloat} method.
+     *
+     * @apiNote
+     * This method corresponds to the convertFormat operation defined
+     * in IEEE 754 from the binary32 format to the binary16 format.
+     * The operation of this method is analogous to a primitive
+     * narrowing conversion (JLS {@jls 5.1.3}).
+     *
+     * @param f the {@code float} value to convert to binary16
+     * @since 20
+     */
+    public static short floatToFloat16(float f)
``````
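To make the format parameters and special cases in the specification concrete, here is a minimal pure-Java sketch of both conversions using bit manipulation. It is not the JDK implementation. The class `Binary16Sketch` and its methods `binary16ToFloat` and `floatToBinary16` are hypothetical names. The sketch also assumes that any NaN input may be mapped to a single quiet NaN, which the specification permits ("the result is a NaN") but does not require.

```java
// Illustrative only: hypothetical helpers, not the java.lang.Float implementation.
final class Binary16Sketch {

    /** Decodes a binary16 bit pattern; every binary16 value is exact in float. */
    public static float binary16ToFloat(short h) {
        int sign = (h & 0x8000) << 16;       // sign bit moved to bit 31
        int exp  = (h >>> 10) & 0x1F;        // K = 5 biased exponent bits
        int frac = h & 0x03FF;               // 10 stored bits (N = 11 with implicit bit)
        if (exp == 0x1F) {                   // infinity (frac == 0) or NaN
            return Float.intBitsToFloat(sign | 0x7F80_0000 | (frac << 13));
        }
        if (exp == 0) {                      // signed zero or subnormal: frac * 2^-24
            float magnitude = frac * 0x1.0p-24f;
            return sign == 0 ? magnitude : -magnitude;
        }
        // Normal value: rebias the exponent from 15 to 127 and widen the fraction.
        return Float.intBitsToFloat(sign | ((exp - 15 + 127) << 23) | (frac << 13));
    }

    /** Encodes a float as binary16, rounding to nearest with ties to even. */
    public static short floatToBinary16(float f) {
        int bits = Float.floatToRawIntBits(f);
        int sign = (bits >>> 16) & 0x8000;
        int abs  = bits & 0x7FFF_FFFF;
        if (abs > 0x7F80_0000) {             // NaN: assumption, return one quiet NaN
            return (short) (sign | 0x7E00);
        }
        if (abs >= 0x477F_F000) {            // |f| >= 65520 rounds to infinity
            return (short) (sign | 0x7C00);
        }
        int exp = (abs >>> 23) - 127;        // unbiased float exponent
        if (exp < -25) {                     // |f| < 2^-25 rounds to a signed zero
            return (short) sign;
        }
        int significand = (abs & 0x007F_FFFF) | 0x0080_0000; // 24 bits, leading 1
        // Normal results drop 13 bits; subnormal results drop more (E_min = -14).
        int shift = (exp >= -14) ? 13 : (-exp - 1);
        int kept = significand >>> shift;
        int rem  = significand & ((1 << shift) - 1);
        int half = 1 << (shift - 1);
        if (rem > half || (rem == half && (kept & 1) != 0)) {
            kept++;                          // round up; ties go to the even neighbor
        }
        // For normal results, kept still holds the implicit bit at bit 10, so
        // adding (exp + 14) << 10 yields the biased exponent exp + 15. A rounding
        // carry propagates into the exponent field in both branches.
        int magnitude = (exp >= -14) ? ((exp + 14) << 10) + kept : kept;
        return (short) (sign | magnitude);
    }

    public static void main(String[] args) {
        System.out.println(binary16ToFloat((short) 0x7BFF));                    // largest finite binary16
        System.out.println(Integer.toHexString(floatToBinary16(1.0f) & 0xFFFF)); // bit pattern of 1.0
    }
}
```

The decoder never rounds, which matches the specification's statement that the widening direction is exact. All of the rounding logic sits in the encoder, where the 24-bit `float` significand has to be cut down to the 11 bits of precision that binary16 provides.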

#### People

Joe Darcy
Paul Sandoz
Paul Sandoz