CSR
Resolution: Approved
P4
behavioral
minimal
Java API
SE
Summary
Support "Implementation Level 2" of the GB18030-2022 standard by including the CJK Unified Ideographs Extension E Unicode block.
Problem
For middleware applications to support "Implementation Level 2" of the GB18030-2022 standard, Java SE itself has to support that level.
Solution
"Implementation Level 2" requires support for the characters in 通用规范汉字表 (Tōngyòng Guīfàn Hànzìbiǎo, also known as TGH 2013), which lists 8,105 ideographs. Of those characters (code points), an additional 108 from the CJK Unified Ideographs Extension E block of Unicode 8.0 need to be allowed in Java SE 8 (along with the fix for JDK-8301400).
Although only 108 characters are required, including the entire Extension E block is beneficial both from an implementation point of view and for future-proofing, since more characters from this block could be required later.
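As a rough illustration of what "allowing" the block means for callers, the sketch below checks whether a code point falls inside Extension E and counts how many code points in that range the running implementation treats as defined. The class name `ExtensionECheck`, its method names, and the block boundaries U+2B820 to U+2CEAF are assumptions for illustration; the boundaries come from the Unicode block list, not from this CSR.

```java
public class ExtensionECheck {
    // Block boundaries of CJK Unified Ideographs Extension E
    // (assumption: taken from the Unicode block list, not from this CSR).
    static final int EXT_E_START = 0x2B820;
    static final int EXT_E_END = 0x2CEAF;

    // Pure range test; does not depend on the Unicode version of the runtime.
    static boolean isInExtensionE(int codePoint) {
        return codePoint >= EXT_E_START && codePoint <= EXT_E_END;
    }

    // Counts the code points in the block that this runtime reports as defined.
    // The result depends on the Unicode data the runtime was built with.
    static int countDefined() {
        int defined = 0;
        for (int cp = EXT_E_START; cp <= EXT_E_END; cp++) {
            if (Character.isDefined(cp)) {
                defined++;
            }
        }
        return defined;
    }

    public static void main(String[] args) {
        System.out.println("Defined Extension E code points: " + countDefined());
    }
}
```

On an implementation that does not allow the block, `countDefined()` would be expected to return 0; on one that does, the count reflects whichever Unicode version backs `Character`.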
Specification
Apply the following diff to the class description of the java.lang.Character class:
@@ -55,9 +55,10 @@
  * implementation of class {@code Character} to use the Currency Symbols
  * block from version 10.0 of the Unicode Standard. Second, the Java SE 8 Platform
  * allows an implementation of class {@code Character} to use the code points
  * in the range of {@code U+9FCD} to {@code U+9FEF} from version 11.0 of the
- * Unicode Standard, in order for the class to allow the "Implementation
- * Level 1" of the Chinese GB18030-2022 standard. Third, the Java SE 8 Platform
+ * Unicode Standard and in the {@code CJK Unified Ideographs Extension E} block
+ * from version 8.0 of the Unicode Standard, in order for the class to allow the
+ * "Implementation Level 2" of the Chinese GB18030-2022 standard.
+ * Third, the Java SE 8 Platform
  * allows an implementation of class {@code Character} to use the Japanese Era
  * code point, {@code U+32FF}, from the Unicode Standard version 12.1.
  * Consequently, the
Add the following new field to the java.lang.Character.UnicodeBlock class:
@@ -2573,6 +2574,17 @@
"ARABIC MATHEMATICAL ALPHABETIC SYMBOLS",
"ARABICMATHEMATICALALPHABETICSYMBOLS");
+ /**
+ * Constant for the "CJK Unified Ideographs Extension E" Unicode
+ * character block.
+ * @apiNote This field is defined in Java SE 8 Maintenance Release 5.
+ * @since 1.8
+ */
+ public static final UnicodeBlock CJK_UNIFIED_IDEOGRAPHS_EXTENSION_E
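Because the new field may be absent on implementations that have not adopted this change, code that must run on both can probe for the block by name rather than referencing the constant directly. The following is a minimal sketch under that assumption; the class `ExtensionEBlockLookup` and its methods are hypothetical, and only `Character.UnicodeBlock.of(int)` and `Character.UnicodeBlock.forName(String)` are existing API.

```java
public class ExtensionEBlockLookup {
    // Returns the name of the block the runtime reports for a code point,
    // or "(unassigned)" when Character.UnicodeBlock.of returns null.
    static String blockNameOf(int codePoint) {
        Character.UnicodeBlock block = Character.UnicodeBlock.of(codePoint);
        return block == null ? "(unassigned)" : block.toString();
    }

    // Probes for the block by its identifier name. forName throws
    // IllegalArgumentException when the runtime does not know the block.
    static boolean supportsExtensionE() {
        try {
            Character.UnicodeBlock.forName("CJK_UNIFIED_IDEOGRAPHS_EXTENSION_E");
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // 0x2B820 is assumed to be the first code point of Extension E.
        System.out.println("Extension E known: " + supportsExtensionE());
        System.out.println("Block of U+2B820: " + blockNameOf(0x2B820));
    }
}
```

Looking the block up by name avoids a compile-time dependency on `CJK_UNIFIED_IDEOGRAPHS_EXTENSION_E`, so the same class file loads whether or not the field exists.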
csr of: JDK-8305681 Allow additional characters for GB18030-2022 (Level 2) support (Resolved)