JDK-8301558

Allow additional characters for GB18030-2022 support

    • behavioral
    • minimal
    • The risk is minimal, as this CSR simply *allows* those code points while keeping the existing code points intact. The change also prohibits the new code points from being the start or part of Java identifiers, so binary compatibility is preserved, as was done for the Japanese Era character addition.
    • Java API
    • SE

      Summary

      Allow additional code points from beyond Unicode 10.0, on which Java SE 11 is based, in order to support GB18030-2022.

      Problem

      The Chinese national standards body (CESI) has recently published GB18030-2022, an updated version of the GB18030 standard that brings GB18030 in sync with Unicode 11.0. Because Java SE 11 supports only the characters defined in Unicode 10.0, some characters defined in the new GB18030 standard cannot be represented.

      Solution

      Allow the code points required by the Implementation Level 1 definition in the GB18030-2022 standard. The additionally required code points are U+9FEB through U+9FEF, five code points in total.
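
      To make the affected range concrete, the following is a minimal sketch that enumerates the five code points and prints what the running implementation reports for each. The class name Gb18030CodePoints and the helper additionalCodePoints() are hypothetical names used only for illustration. No expected output is given, because the result of Character.isDefined for these code points may differ between implementations, depending on whether they take up the allowance described here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of the JDK or of this CSR.
public class Gb18030CodePoints {
    // Range boundaries taken from the Solution text: U+9FEB to U+9FEF.
    static final int FIRST = 0x9FEB;
    static final int LAST = 0x9FEF;

    /** Returns the additionally allowed code points in ascending order. */
    static List<Integer> additionalCodePoints() {
        List<Integer> result = new ArrayList<>();
        for (int cp = FIRST; cp <= LAST; cp++) {
            result.add(cp);
        }
        return result;
    }

    public static void main(String[] args) {
        for (int cp : additionalCodePoints()) {
            // The reported values depend on the Unicode data that the
            // running implementation of class Character uses.
            System.out.printf("U+%04X defined=%b type=%d%n",
                    cp, Character.isDefined(cp), Character.getType(cp));
        }
    }
}
```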

      Specification

      Modify the second paragraph of the Unicode Conformance section in the class description of java.lang.Character as follows:

      diff a/src/java.base/share/classes/java/lang/Character.java b/src/java.base/share/classes/java/lang/Character.java
      --- a/src/java.base/share/classes/java/lang/Character.java
      +++ b/src/java.base/share/classes/java/lang/Character.java
      @@ -52,14 +52,18 @@
        * assigned Unicode code point or character range. The file is available
        * from the Unicode Consortium at
        * <a href="http://www.unicode.org">http://www.unicode.org</a>.
        * <p>
        * The Java SE 11 Platform uses character information from version 10.0
      - * of the Unicode Standard, with an extension. The Java SE 11 Platform allows
      - * an implementation of class {@code Character} to use the Japanese Era
      - * code point, {@code U+32FF}, from the first version of the Unicode Standard
      - * after 10.0 that assigns the code point. Consequently, the behavior of
      + * of the Unicode Standard, with two extensions. First, the Java SE 11 Platform
      + * allows an implementation of class {@code Character} to use the code points
      + * in the range of {@code U+9FEB} to {@code U+9FEF} from the Unicode Standard
      + * version 11.0, in order for the class to allow the "Implementation Level 1"
      + * of the Chinese GB18030-2022 standard. Second, the Java SE 11 Platform
      + * allows an implementation of class {@code Character} to use the Japanese Era
      + * code point, {@code U+32FF}, from the Unicode Standard version 12.1.
      + * Consequently, the behavior of
        * fields and methods of class {@code Character} may vary across
        * implementations of the Java SE 11 Platform when processing the
        * aforementioned code point ( outside of version 10.0 ), except for
        * the following methods that define Java identifiers:
        * {@link #isJavaIdentifierStart(int)}, {@link #isJavaIdentifierStart(char)},
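
      The identifier-related exception in the text above can be observed with a short probe such as the sketch below. The class name IdentifierProbe and the method describe() are hypothetical. According to the risk description, the new code points are not meant to become the start or part of Java identifiers on the Java SE 11 Platform; releases based on a later Unicode version may report different values, so the sketch only prints what the running implementation returns.

```java
// Hypothetical illustration class; not part of the JDK or of this CSR.
public class IdentifierProbe {
    /** Formats the identifier-related properties of a single code point. */
    static String describe(int cp) {
        return String.format("U+%04X start=%b part=%b", cp,
                Character.isJavaIdentifierStart(cp),
                Character.isJavaIdentifierPart(cp));
    }

    public static void main(String[] args) {
        // The five code points allowed for GB18030-2022 Implementation Level 1.
        for (int cp = 0x9FEB; cp <= 0x9FEF; cp++) {
            System.out.println(describe(cp));
        }
        // The Japanese Era code point covered by the earlier extension.
        System.out.println(describe(0x32FF));
    }
}
```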

            naoto Naoto Sato
            Alan Bateman, Lance Andersen