JDK-8310251

Add UTF-32 based Charsets into StandardCharsets


    • Type: CSR
    • Resolution: Approved
    • Priority: P4
    • 22
    • core-libs
    • None
    • low
    • This enhancement does not modify existing behavior; it is purely an addition. Although adding the constants to StandardCharsets requires every Java SE implementation to support the UTF-32 encodings, this should not cause any issues, as they are already implemented in OpenJDK.
    • Java API
    • SE

      Summary

      Add UTF-32 based Charset constants to the java.nio.charset.StandardCharsets class.

      Problem

      StandardCharsets defines Charset constants for the UTF encoding schemes specified by ISO 10646/Unicode. However, only the UTF-8 and UTF-16 based charsets are defined; the UTF-32 based charsets are missing. Adding them would make the class consistent.
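
To illustrate the gap, here is a minimal sketch of how code obtains these charsets without the proposed constants. The class and method names (CharsetLookupDemo, utf32FamilySupported) are hypothetical; only Charset.forName, Charset.isSupported, and StandardCharsets.UTF_8 are existing APIs.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetLookupDemo {
    // Checks by name whether the three UTF-32 charsets are available at run time.
    static boolean utf32FamilySupported() {
        return Charset.isSupported("UTF-32")
            && Charset.isSupported("UTF-32BE")
            && Charset.isSupported("UTF-32LE");
    }

    public static void main(String[] args) {
        // UTF-8 is available as a constant: no string lookup is involved.
        Charset utf8 = StandardCharsets.UTF_8;

        // UTF-32 has to be looked up by name; a misspelled name would only
        // fail at run time, with an UnsupportedCharsetException.
        Charset utf32 = Charset.forName("UTF-32");

        System.out.println(utf8.name());
        System.out.println(utf32.name());
        System.out.println(utf32FamilySupported());
    }
}
```

With constants in StandardCharsets, the name lookup and its run-time failure mode would no longer be needed for the UTF-32 family.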

      Solution

      Add public fields for UTF-32BE, UTF-32LE, and UTF-32 to the StandardCharsets class. Also, remove "Amendment 1/2" from the existing UTF-8/16 explanations in the java.nio.charset.Charset class description, as the descriptions of UTF-8/16 now appear to be in the main document (https://www.iso.org/obp/ui/en/#iso:std:iso-iec:10646:ed-6:v1:en).

      Specification

      Add the following public fields to the java.nio.charset.StandardCharsets class:

      /**
       * Thirty-two-bit UCS Transformation Format, big-endian byte order.
       * @since 22
       */
      public static final Charset UTF_32BE;
      
      /**
       * Thirty-two-bit UCS Transformation Format, little-endian byte order.
       * @since 22
       */
      public static final Charset UTF_32LE;
      
      /**
       * Thirty-two-bit UCS Transformation Format, byte order identified by an
       * optional byte-order mark.
       * @since 22
       */
      public static final Charset UTF_32;
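
As a rough usage sketch, the following shows the byte layouts produced by the two fixed-endianness charsets. It looks the charsets up by name so that it also runs on releases before 22; on JDK 22 or later the proposed StandardCharsets.UTF_32BE and StandardCharsets.UTF_32LE constants should be interchangeable with these lookups. The class and method names (Utf32EncodeDemo, encodeBE, encodeLE) are hypothetical.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

final class Utf32EncodeDemo {
    // Assumption: stand-ins for the proposed StandardCharsets.UTF_32BE / UTF_32LE.
    static final Charset UTF_32BE = Charset.forName("UTF-32BE");
    static final Charset UTF_32LE = Charset.forName("UTF-32LE");

    // Encodes each code point as four bytes, most significant byte first.
    static byte[] encodeBE(String s) {
        return s.getBytes(UTF_32BE);
    }

    // Encodes each code point as four bytes, least significant byte first.
    static byte[] encodeLE(String s) {
        return s.getBytes(UTF_32LE);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encodeBE("A"))); // [0, 0, 0, 65]
        System.out.println(Arrays.toString(encodeLE("A"))); // [65, 0, 0, 0]

        // U+1F600 is two chars (a surrogate pair) in a Java String,
        // but a single four-byte unit in UTF-32.
        System.out.println(encodeBE("\uD83D\uDE00").length); // 4
    }
}
```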

      Modify the "Standard charsets" section of the class description for the java.nio.charset.Charset class as follows:

         *     <td>Sixteen-bit UCS Transformation Format,
         *         little-endian byte&nbsp;order</td></tr>
         * <tr><th scope="row" style="vertical-align:top">{@code UTF-16}</th>
         *     <td>Sixteen-bit UCS Transformation Format,
         *         byte&nbsp;order identified by an optional byte-order mark</td></tr>
      +  * <tr><th scope="row" style="vertical-align:top">{@code UTF-32BE}</th>
      +  *     <td>Thirty-two-bit UCS Transformation Format,
      +  *         big-endian byte&nbsp;order</td></tr>
      +  * <tr><th scope="row" style="vertical-align:top">{@code UTF-32LE}</th>
      +  *     <td>Thirty-two-bit UCS Transformation Format,
      +  *         little-endian byte&nbsp;order</td></tr>
      +  * <tr><th scope="row" style="vertical-align:top">{@code UTF-32}</th>
      +  *     <td>Thirty-two-bit UCS Transformation Format,
      +  *         byte&nbsp;order identified by an optional byte-order mark</td></tr>
         * </tbody>
         * </table></blockquote>
         *
         * <p> The {@code UTF-8} charset is specified by <a
         * href="http://www.ietf.org/rfc/rfc2279.txt"><i>RFC&nbsp;2279</i></a>; the
         * transformation format upon which it is based is specified in
      -  * Amendment&nbsp;2 of ISO&nbsp;10646-1 and is also described in the <a
      +  * ISO&nbsp;10646-1 and is also described in the <a
         * href="http://www.unicode.org/standard/standard.html"><i>Unicode
         * Standard</i></a>.
         *
         * <p> The {@code UTF-16} charsets are specified by <a
         * href="http://www.ietf.org/rfc/rfc2781.txt"><i>RFC&nbsp;2781</i></a>; the
         * transformation formats upon which they are based are specified in
      -  * Amendment&nbsp;1 of ISO&nbsp;10646-1 and are also described in the <a
      +  * ISO&nbsp;10646-1 and are also described in the <a
      +  * href="http://www.unicode.org/standard/standard.html"><i>Unicode
      +  * Standard</i></a>.
      +  *
      +  * <p> The {@code UTF-32} charsets are based upon transformation formats
      +  * which are specified in
      +  * ISO&nbsp;10646-1 and are also described in the <a
         * href="http://www.unicode.org/standard/standard.html"><i>Unicode
         * Standard</i></a>.
         *
      -  * <p> The {@code UTF-16} charsets use sixteen-bit quantities and are
      +  * <p> The {@code UTF-16} and {@code UTF-32} charsets use sixteen-bit and thirty-two-bit
      +  * quantities respectively, and are
         * therefore sensitive to byte order.  In these encodings the byte order of a
         * stream may be indicated by an initial <i>byte-order mark</i> represented by
      -  * the Unicode character <code>'&#92;uFEFF'</code>.  Byte-order marks are handled
      +  * the Unicode character {@code U+FEFF}.  Byte-order marks are handled
         * as follows:
         *
         * <ul>
         *
      -  *   <li><p> When decoding, the {@code UTF-16BE} and {@code UTF-16LE}
      +  *   <li><p> When decoding, the {@code UTF-16BE}, {@code UTF-16LE},
      +  *   {@code UTF-32BE}, and {@code UTF-32LE}
         *   charsets interpret the initial byte-order marks as a <small>ZERO-WIDTH
         *   NON-BREAKING SPACE</small>; when encoding, they do not write
         *   byte-order marks. </p></li>
         *
      -  *   <li><p> When decoding, the {@code UTF-16} charset interprets the
      +  *   <li><p> When decoding, the {@code UTF-16} and {@code UTF-32} charsets interpret the
         *   byte-order mark at the beginning of the input stream to indicate the
         *   byte-order of the stream but defaults to big-endian if there is no
         *   byte-order mark; when encoding, it uses big-endian byte order and writes
         *   a big-endian byte-order mark. </p></li>
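
The decoding rule for the UTF-32 charset described in the last item can be sketched as follows. This is an illustration only: it covers decoding with UTF-32 and nothing else, it looks the charset up by name so that it runs on releases before 22, and the class and method names (Utf32BomDemo, decode) are hypothetical.

```java
import java.nio.charset.Charset;

final class Utf32BomDemo {
    // Assumption: stand-in for the proposed StandardCharsets.UTF_32.
    static final Charset UTF_32 = Charset.forName("UTF-32");

    // Builds a byte array from unsigned byte values and decodes it as UTF-32.
    static String decode(int... unsignedBytes) {
        byte[] bytes = new byte[unsignedBytes.length];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) unsignedBytes[i];
        }
        return new String(bytes, UTF_32);
    }

    public static void main(String[] args) {
        // Little-endian byte-order mark (FF FE 00 00), then 'A' in little-endian order.
        System.out.println(decode(0xFF, 0xFE, 0x00, 0x00, 0x41, 0x00, 0x00, 0x00));

        // Big-endian byte-order mark (00 00 FE FF), then 'A' in big-endian order.
        System.out.println(decode(0x00, 0x00, 0xFE, 0xFF, 0x00, 0x00, 0x00, 0x41));

        // No byte-order mark: the input is read as big-endian.
        System.out.println(decode(0x00, 0x00, 0x00, 0x41));
    }
}
```

In all three calls the byte-order mark, when present, only selects the byte order and does not appear in the decoded string.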

            naoto Naoto Sato
            Alan Bateman