JDK / JDK-8305686

Allow additional characters for GB18030-2022 (Level 2) support


Details

    • behavioral
    • minimal
      The risk is minimal, as this CSR simply *allows* those code points and keeps the existing code points intact. The change also prohibits the new code points from being the start or part of Java identifiers, so that binary compatibility is preserved, as was done when the Japanese Era and new currency symbol characters were added.
    • Java API
    • SE
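
The identifier restriction described in the risk assessment can be observed on a given runtime. The following is a minimal sketch, not part of the proposal: the class `IdentifierProbe` and its helper are hypothetical, and U+2B820 (the first code point of Extension E in Unicode 8.0) is an arbitrary sample. According to the risk description, a Java SE 8 implementation with this change should report `false` for both checks; later feature releases that support Unicode 8.0 or newer may report `true`, so no output is shown.

```java
// Hypothetical probe: how does the running JDK treat an Extension E
// code point with respect to Java identifiers?
final class IdentifierProbe {
    // U+2B820: first code point of CJK Unified Ideographs Extension E (Unicode 8.0).
    static final int EXT_E_FIRST = 0x2B820;

    private IdentifierProbe() {}

    /**
     * Checks a string against the identifier-start / identifier-part rules.
     * Reserved keywords such as "class" are deliberately not rejected.
     */
    static boolean isJavaIdentifier(String s) {
        if (s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        // Walk code points (not chars) so a surrogate pair is checked as one character.
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        String ideograph = new String(Character.toChars(EXT_E_FIRST));
        // Results depend on the JDK release; expected false on Java SE 8 with this change.
        System.out.println("start: " + Character.isJavaIdentifierStart(EXT_E_FIRST));
        System.out.println("part:  " + Character.isJavaIdentifierPart(EXT_E_FIRST));
        System.out.println("identifier: " + isJavaIdentifier("x" + ideograph));
    }
}
```

Walking code points rather than `char` values matters here because every Extension E character is a supplementary character encoded as a surrogate pair.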

    Description

      Summary

      Support "Implementation Level 2" of the GB18030-2022 standard by including the CJK Unified Ideographs Extension E Unicode block.

      Problem

      For middleware applications to support "Implementation Level 2" of the GB18030-2022 standard, Java SE itself must support that level.

      Solution

      "Implementation Level 2" requires support for the characters listed in 通用规范汉字表 (Tōngyòng Guīfàn Hànzìbiǎo, "Table of General Standard Chinese Characters"; aka TGH 2013), which contains 8,105 ideographs. Of those characters (code points), 108 additional characters from the CJK Unified Ideographs Extension E block of Unicode 8.0 need to be allowed in Java SE 8 (in addition to the fix for JDK-8301400). Although only 108 characters are required, it would be beneficial to include the entire Extension E block, both to simplify the implementation and to future-proof it, since more characters from this block may be required later.

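The scope argument above can be made concrete. The following is a minimal sketch, not part of the proposed specification: the class `ExtensionE` and its methods are hypothetical, and the block bounds U+2B820 to U+2CEAF are taken from the Unicode 8.0 block list rather than from this CSR. It shows that the whole block spans 5,776 code points (not all of them assigned), far more than the 108 strictly required, and that every character in it lies outside the Basic Multilingual Plane, so it occupies a surrogate pair (two `char` values) in a Java `String`.

```java
// Hypothetical helper describing the CJK Unified Ideographs Extension E block.
// Bounds are from the Unicode 8.0 block list, not from the CSR itself.
final class ExtensionE {
    static final int FIRST = 0x2B820;
    static final int LAST = 0x2CEAF;

    private ExtensionE() {}

    /** True if the code point lies inside the block's range. */
    static boolean contains(int codePoint) {
        return codePoint >= FIRST && codePoint <= LAST;
    }

    /** Number of code points in the block's range, assigned or not. */
    static int size() {
        return LAST - FIRST + 1;
    }

    /** Encodes one Extension E code point as UTF-16 (always a surrogate pair). */
    static String toUtf16(int codePoint) {
        if (!contains(codePoint)) {
            throw new IllegalArgumentException(
                    String.format("U+%04X is not in Extension E", codePoint));
        }
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String s = toUtf16(FIRST);
        System.out.println(s.length());                      // 2 (a surrogate pair)
        System.out.println(s.codePointCount(0, s.length())); // 1
        System.out.println(size());                          // 5776
    }
}
```
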
      Specification

      Apply the following diff to the class description of the java.lang.Character class:

      @@ -55,9 +55,10 @@
        * implementation of class {@code Character} to use the Currency Symbols
        * block from version 10.0 of the Unicode Standard. Second, the Java SE 8 Platform
        * allows an implementation of class {@code Character} to use the code points
        * in the range of {@code U+9FCD} to {@code U+9FEF} from version 11.0 of the
      - * Unicode Standard, in order for the class to allow the "Implementation
      - * Level 1" of the Chinese GB18030-2022 standard. Third, the Java SE 8 Platform
      + * Unicode Standard and in the {@code CJK Unified Ideographs Extension E} block
      + * from version 8.0 of the Unicode Standard, in order for the class to allow the
      + * "Implementation Level 2" of the Chinese GB18030-2022 standard.
      + * Third, the Java SE 8 Platform
        * allows an implementation of class {@code Character} to use the Japanese Era
        * code point, {@code U+32FF}, from the Unicode Standard version 12.1.
        * Consequently, the
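
Because the amended wording *allows* rather than requires this behavior, portable code may prefer to ask the running implementation how it classifies an Extension E code point instead of assuming. A minimal sketch, using U+2B820 (the first code point of the block in Unicode 8.0) as a sample; the class `ExtensionEProbe` and its method `supportsIdeograph` are hypothetical, while `Character.isDefined`, `Character.getType` and `Character.isIdeographic` are existing `java.lang.Character` methods. The printed values depend on the JDK release and are therefore not shown.

```java
// Hypothetical probe: does the running JDK treat a code point as an
// assigned ideographic letter?
final class ExtensionEProbe {
    // First code point of CJK Unified Ideographs Extension E (Unicode 8.0).
    static final int FIRST = 0x2B820;

    private ExtensionEProbe() {}

    /** True if the code point is defined, of category Lo, and ideographic. */
    static boolean supportsIdeograph(int codePoint) {
        return Character.isDefined(codePoint)
                && Character.getType(codePoint) == Character.OTHER_LETTER
                && Character.isIdeographic(codePoint);
    }

    public static void main(String[] args) {
        // Output varies by JDK release and by whether this change is present.
        System.out.printf("U+%X defined=%b type=%d supported=%b%n",
                FIRST, Character.isDefined(FIRST), Character.getType(FIRST),
                supportsIdeograph(FIRST));
    }
}
```
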

      Add the following new field to the java.lang.Character.UnicodeBlock class:

      @@ -2573,6 +2574,17 @@ 
                                    "ARABIC MATHEMATICAL ALPHABETIC SYMBOLS",
                                    "ARABICMATHEMATICALALPHABETICSYMBOLS");
      
      +        /**
      +         * Constant for the "CJK Unified Ideographs Extension E" Unicode
      +         * character block.
      +         * @apiNote This field is defined in Java SE 8 Maintenance Release 5.
      +         * @since 1.8
      +         */
      +        public static final UnicodeBlock CJK_UNIFIED_IDEOGRAPHS_EXTENSION_E
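
With the constant in place, callers can test block membership through the API without hard-coding the block's code point range. A minimal sketch, assuming a runtime that defines the field (Java SE 8 Maintenance Release 5 per the `@apiNote` above, or a later release); the class `BlockLookup` is hypothetical, while `UnicodeBlock.of(int)` and `UnicodeBlock.forName(String)` are existing methods.

```java
import java.lang.Character.UnicodeBlock;

// Hypothetical helper: block membership via the UnicodeBlock API.
final class BlockLookup {
    private BlockLookup() {}

    /** True if the code point falls in the CJK Unified Ideographs Extension E block. */
    static boolean inExtensionE(int codePoint) {
        return UnicodeBlock.of(codePoint) == UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_E;
    }

    public static void main(String[] args) {
        // forName accepts the canonical block name, the name without spaces,
        // or the constant's identifier.
        UnicodeBlock byName = UnicodeBlock.forName("CJK Unified Ideographs Extension E");
        System.out.println(byName == UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_E); // true
        System.out.println(inExtensionE(0x2B820)); // true: first code point of the block
        System.out.println(inExtensionE(0x4E00));  // false: basic CJK Unified Ideographs block
    }
}
```
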

            People

              Assignee: Naoto Sato
              Reporter: Naoto Sato
              Reviewed By: Iris Clark, Lance Andersen