CSR
Resolution: Approved
P4
None
low
Java API
SE
Summary
Add UTF-32 based Charset constants to the java.nio.charset.StandardCharsets class.
Problem
StandardCharsets defines Charset constants for the UTF encoding schemes specified by ISO 10646/Unicode. However, it defines constants only for the UTF-8 and UTF-16 based charsets, not for the UTF-32 based ones. Adding the UTF-32 based charsets would make the class consistent.
Solution
Add public fields for UTF-32BE, UTF-32LE, and UTF-32 to the StandardCharsets class. Also remove the "Amendment 1/2" references from the existing UTF-8/16 explanations in the java.nio.charset.Charset class description, since the descriptions of UTF-8/16 now appear to be in the main document (https://www.iso.org/obp/ui/en/#iso:std:iso-iec:10646:ed-6:v1:en).
Specification
Add the following public fields to the java.nio.charset.StandardCharsets class:
/**
* Thirty-two-bit UCS Transformation Format, big-endian byte order.
* @since 22
*/
public static final Charset UTF_32BE;
/**
* Thirty-two-bit UCS Transformation Format, little-endian byte order.
* @since 22
*/
public static final Charset UTF_32LE;
/**
* Thirty-two-bit UCS Transformation Format, byte order identified by an
* optional byte-order mark.
* @since 22
*/
public static final Charset UTF_32;
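A minimal sketch of how the fixed-endian charsets behave when encoding. It looks the charsets up by name with Charset.forName so that it does not depend on the new fields; this assumes the runtime ships the UTF-32 charsets, which the standard fields would guarantee from release 22 onward. The class name Utf32Lookup and the helper encode are hypothetical names used only for illustration.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class Utf32Lookup {
    // Assumption: the runtime provides these charsets by name. On JDK 22 or
    // later, StandardCharsets.UTF_32BE and StandardCharsets.UTF_32LE are
    // intended to be usable in place of these lookups.
    static final Charset UTF_32BE = Charset.forName("UTF-32BE");
    static final Charset UTF_32LE = Charset.forName("UTF-32LE");

    // Encodes a string with the given charset (illustrative helper).
    static byte[] encode(String s, Charset cs) {
        return s.getBytes(cs);
    }

    public static void main(String[] args) {
        // Each code point occupies four bytes; only the byte order differs.
        System.out.println(Arrays.toString(encode("A", UTF_32BE))); // [0, 0, 0, 65]
        System.out.println(Arrays.toString(encode("A", UTF_32LE))); // [65, 0, 0, 0]
    }
}
```

Using a constant instead of a name lookup avoids both the string literal and the possibility of an UnsupportedCharsetException at the call site.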
Modify the "Standard charsets" section of the class description of the java.nio.charset.Charset class as follows:
* <td>Sixteen-bit UCS Transformation Format,
* little-endian byte order</td></tr>
* <tr><th scope="row" style="vertical-align:top">{@code UTF-16}</th>
* <td>Sixteen-bit UCS Transformation Format,
* byte order identified by an optional byte-order mark</td></tr>
+ * <tr><th scope="row" style="vertical-align:top">{@code UTF-32BE}</th>
+ * <td>Thirty-two-bit UCS Transformation Format,
+ * big-endian byte order</td></tr>
+ * <tr><th scope="row" style="vertical-align:top">{@code UTF-32LE}</th>
+ * <td>Thirty-two-bit UCS Transformation Format,
+ * little-endian byte order</td></tr>
+ * <tr><th scope="row" style="vertical-align:top">{@code UTF-32}</th>
+ * <td>Thirty-two-bit UCS Transformation Format,
+ * byte order identified by an optional byte-order mark</td></tr>
* </tbody>
* </table></blockquote>
*
* <p> The {@code UTF-8} charset is specified by <a
* href="http://www.ietf.org/rfc/rfc2279.txt"><i>RFC 2279</i></a>; the
* transformation format upon which it is based is specified in
- * Amendment 2 of ISO 10646-1 and is also described in the <a
+ * ISO 10646-1 and is also described in the <a
* href="http://www.unicode.org/standard/standard.html"><i>Unicode
* Standard</i></a>.
*
* <p> The {@code UTF-16} charsets are specified by <a
* href="http://www.ietf.org/rfc/rfc2781.txt"><i>RFC 2781</i></a>; the
* transformation formats upon which they are based are specified in
- * Amendment 1 of ISO 10646-1 and are also described in the <a
+ * ISO 10646-1 and are also described in the <a
+ * href="http://www.unicode.org/standard/standard.html"><i>Unicode
+ * Standard</i></a>.
+ *
+ * <p> The {@code UTF-32} charsets are based upon transformation formats
+ * which are specified in
+ * ISO 10646-1 and are also described in the <a
* href="http://www.unicode.org/standard/standard.html"><i>Unicode
* Standard</i></a>.
*
- * <p> The {@code UTF-16} charsets use sixteen-bit quantities and are
+ * <p> The {@code UTF-16} and {@code UTF-32} charsets use sixteen-bit and thirty-two-bit
+ * quantities respectively, and are
* therefore sensitive to byte order. In these encodings the byte order of a
* stream may be indicated by an initial <i>byte-order mark</i> represented by
- * the Unicode character <code>'\uFEFF'</code>. Byte-order marks are handled
+ * the Unicode character {@code U+FEFF}. Byte-order marks are handled
* as follows:
*
* <ul>
*
- * <li><p> When decoding, the {@code UTF-16BE} and {@code UTF-16LE}
+ * <li><p> When decoding, the {@code UTF-16BE}, {@code UTF-16LE},
+ * {@code UTF-32BE}, and {@code UTF-32LE}
* charsets interpret the initial byte-order marks as a <small>ZERO-WIDTH
* NON-BREAKING SPACE</small>; when encoding, they do not write
* byte-order marks. </p></li>
*
- * <li><p> When decoding, the {@code UTF-16} charset interprets the
+ * <li><p> When decoding, the {@code UTF-16} and {@code UTF-32} charsets interpret the
* byte-order mark at the beginning of the input stream to indicate the
* byte-order of the stream but defaults to big-endian if there is no
* byte-order mark; when encoding, it uses big-endian byte order and writes
* a big-endian byte-order mark. </p></li>
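The byte-order mark rules above can be checked with a short sketch. It exercises the long-standing UTF-16 charsets for both directions, and the UTF-32 charset only for decoding input that carries a byte-order mark. It deliberately makes no claim about whether a given runtime's UTF-32 encoder writes a mark, which is worth verifying on the JDK in use. The class name BomHandling and its helpers are hypothetical, and looking up "UTF-32" by name assumes the runtime provides that charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomHandling {
    // Builds a byte array from int literals (illustrative helper).
    static byte[] bytes(int... values) {
        byte[] out = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = (byte) values[i];
        }
        return out;
    }

    // Decodes bytes with the given charset (illustrative helper).
    static String decode(byte[] input, Charset cs) {
        return new String(input, cs);
    }

    public static void main(String[] args) {
        // UTF-16 writes a big-endian byte-order mark; UTF-16BE does not.
        System.out.println("a".getBytes(StandardCharsets.UTF_16).length);   // 4
        System.out.println("a".getBytes(StandardCharsets.UTF_16BE).length); // 2

        byte[] littleEndianWithMark = bytes(0xFF, 0xFE, 0x61, 0x00);
        // UTF-16 consumes the mark and uses it to select the byte order.
        System.out.println(decode(littleEndianWithMark, StandardCharsets.UTF_16).length());   // 1
        // UTF-16LE keeps the mark as the character U+FEFF.
        System.out.println(decode(littleEndianWithMark, StandardCharsets.UTF_16LE).length()); // 2

        // UTF-32 likewise uses an initial mark to select the byte order.
        byte[] utf32WithMark = bytes(0xFF, 0xFE, 0x00, 0x00, 0x61, 0x00, 0x00, 0x00);
        System.out.println(decode(utf32WithMark, Charset.forName("UTF-32"))); // a
    }
}
```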
csr of: JDK-8310047 Add UTF-32 based Charsets into StandardCharsets (Resolved)