Type: CSR
Resolution: Approved
Priority: P4
Fix Version/s: 8u43
Component/s: core-libs
Labels: jsr337-mr5
Subcomponent: java.lang
Compatibility Kind: behavioral
Compatibility Risk: minimal
Compatibility Risk Description: The risk is minimal because this CSR only *allows* the additional code points and leaves the existing code points unchanged. The change also prohibits the new code points from being the start or part of Java identifiers, which preserves binary compatibility, as was done when the Japanese Era and new currency symbol characters were added.
Interface Kind: Java API
Scope: SE

Summary

Support "Implementation Level 2" of the GB18030-2022 standard by including the CJK Unified Ideographs Extension E Unicode block

Problem

Middleware applications can support "Implementation Level 2" of the GB18030-2022 standard only if Java SE itself supports that level.

Solution

"Implementation Level 2" requires support for the characters in 通用规范汉字表 (Tōngyòng Guīfàn Hànzìbiǎo, also known as TGH 2013), which lists 8,105 ideographs. Of those characters (code points), an additional 108 from the CJK Unified Ideographs Extension E block of Unicode 8.0 need to be allowed in Java SE 8 (together with the fix for JDK-8301400). Although only 108 characters are required, including the entire Extension E block is preferable both for the implementation and for future-proofing, since more characters from this block could be required later.
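To make the scope of the block concrete, here is a minimal sketch that checks whether a code point falls inside Extension E. The boundaries U+2B820 to U+2CEAF are taken from the Unicode 8.0 block definition rather than from this CSR, and the class and method names are hypothetical. The check uses plain integer comparison, so it does not depend on which Unicode version the running `Character` class implements.

```java
class ExtensionERange {
    // Block boundaries for "CJK Unified Ideographs Extension E"
    // (assumption: taken from the Unicode 8.0 Blocks.txt, not from this CSR).
    static final int EXT_E_START = 0x2B820;
    static final int EXT_E_END = 0x2CEAF;

    // True if the code point lies inside the Extension E block.
    static boolean isInExtensionE(int codePoint) {
        return codePoint >= EXT_E_START && codePoint <= EXT_E_END;
    }

    // Counts the Extension E code points in a string. Iterating by code
    // point is required because these characters are supplementary and
    // occupy two UTF-16 chars each.
    static int countExtensionE(String text) {
        return (int) text.codePoints()
                         .filter(ExtensionERange::isInExtensionE)
                         .count();
    }

    public static void main(String[] args) {
        String sample = new StringBuilder("A")
                .appendCodePoint(0x2B820)   // first code point of Extension E
                .append('\u4E2D')           // a BMP ideograph, outside Extension E
                .toString();
        System.out.println(countExtensionE(sample)); // prints 1
        System.out.println(sample.length());         // prints 4 (surrogate pair counts as 2)
    }
}
```

Because every Extension E character is outside the Basic Multilingual Plane, code that walks strings by `char` rather than by code point will see surrogate pairs instead of the ideographs themselves.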

Specification

Apply the following diff to the class description of the java.lang.Character class:

@@ -55,9 +55,10 @@
  * implementation of class {@code Character} to use the Currency Symbols
  * block from version 10.0 of the Unicode Standard. Second, the Java SE 8 Platform
  * allows an implementation of class {@code Character} to use the code points
  * in the range of {@code U+9FCD} to {@code U+9FEF} from version 11.0 of the
- * Unicode Standard, in order for the class to allow the "Implementation
- * Level 1" of the Chinese GB18030-2022 standard. Third, the Java SE 8 Platform
+ * Unicode Standard and in the {@code CJK Unified Ideographs Extension E} block
+ * from version 8.0 of the Unicode Standard, in order for the class to allow the
+ * "Implementation Level 2" of the Chinese GB18030-2022 standard.
+ * Third, the Java SE 8 Platform
  * allows an implementation of class {@code Character} to use the Japanese Era
  * code point, {@code U+32FF}, from the Unicode Standard version 12.1.
  * Consequently, the

Add the following new field to the java.lang.Character.UnicodeBlock class:

@@ -2573,6 +2574,17 @@ 
                              "ARABIC MATHEMATICAL ALPHABETIC SYMBOLS",
                              "ARABICMATHEMATICALALPHABETICSYMBOLS");

+        /**
+         * Constant for the "CJK Unified Ideographs Extension E" Unicode
+         * character block.
+         * @apiNote This field is defined in Java SE 8 Maintenance Release 5.
+         * @since 1.8
+         */
+        public static final UnicodeBlock CJK_UNIFIED_IDEOGRAPHS_EXTENSION_E
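Since the new field only exists on implementations that include this change, an application may want to detect support at run time instead of referencing the constant directly. The following is a minimal sketch under that assumption; the class and method names are hypothetical. It relies on `Character.UnicodeBlock.forName`, which throws `IllegalArgumentException` for a block name the runtime does not know, and on `Character.UnicodeBlock.of`, which returns `null` for a code point that belongs to no defined block.

```java
class BlockProbe {
    // True if the running Java implementation defines a Unicode block
    // with the given name.
    static boolean supportsBlock(String blockName) {
        try {
            Character.UnicodeBlock.forName(blockName);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    // Name of the block containing the code point, or a placeholder when
    // the runtime assigns it to no block.
    static String blockNameOf(int codePoint) {
        Character.UnicodeBlock block = Character.UnicodeBlock.of(codePoint);
        return block == null ? "(no block)" : block.toString();
    }

    public static void main(String[] args) {
        // Both results depend on the runtime: an implementation without
        // this change is expected to report false and "(no block)".
        System.out.println(supportsBlock("CJK Unified Ideographs Extension E"));
        System.out.println(blockNameOf(0x2B820));

        // Per the compatibility note in this CSR, Java SE 8 keeps the new
        // code points out of Java identifiers; other releases may differ.
        System.out.println(Character.isJavaIdentifierPart(0x2B820));
    }
}
```

Probing by name keeps the code compilable against class libraries that predate the field, at the cost of a string lookup.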

csr of: JDK-8305681 Allow additional characters for GB18030-2022 (Level 2) support (Resolved)

Assignee: Naoto Sato
Reporter: Naoto Sato
Reviewed By: Iris Clark, Lance Andersen
Votes: 0
Watchers: 2
Created: 2023-04-05 15:47
Updated: 2023-04-07 09:34
Resolved: 2023-04-07 09:34
