Summary
Support the GB18030-2022 standard in the JDK.
Problem
The Chinese government mandates upgrading to the new GB18030 mappings, but some of them are incompatible with the existing GB18030 mapping in the JDK, which is based on the 2000 standard. This incompatibility could cause character conversion failures, such as corrupted file content. A simple replacement of the GB18030 charset should therefore be avoided.
Solution
Replace the GB18030 charset implementation with one based on the new 2022 mapping. The new mapping swaps some code points previously assigned in the Private Use Area with the equivalent code points that Unicode later assigned. The full list of swapped code points follows:
GB18030 byte sequence | 2000 mapping | 2022 mapping |
---|---|---|
A6D9 | U+E78D | U+FE10 |
A6DA | U+E78E | U+FE12 |
A6DB | U+E78F | U+FE11 |
A6DC | U+E790 | U+FE13 |
A6DD | U+E791 | U+FE14 |
A6DE | U+E792 | U+FE15 |
A6DF | U+E793 | U+FE16 |
A6EC | U+E794 | U+FE17 |
A6ED | U+E795 | U+FE18 |
A6F3 | U+E796 | U+FE19 |
A8BC | U+E7C7 | U+1E3F |
FE59 | U+E81E | U+9FB4 |
FE61 | U+E826 | U+9FB5 |
FE66 | U+E82B | U+9FB6 |
FE67 | U+E82C | U+9FB7 |
FE6D | U+E832 | U+9FB8 |
FE7E | U+E843 | U+9FB9 |
FE90 | U+E854 | U+9FBA |
FEA0 | U+E864 | U+9FBB |
8135F437 | U+1E3F | U+E7C7 |
82359037 | U+9FB4 | U+E81E |
82359038 | U+9FB5 | U+E826 |
82359039 | U+9FB6 | U+E82B |
82359130 | U+9FB7 | U+E82C |
82359131 | U+9FB8 | U+E832 |
82359132 | U+9FB9 | U+E843 |
82359133 | U+9FBA | U+E854 |
82359134 | U+9FBB | U+E864 |
84318236 | U+FE10 | U+E78D |
84318237 | U+FE11 | U+E78F |
84318238 | U+FE12 | U+E78E |
84318239 | U+FE13 | U+E790 |
84318330 | U+FE14 | U+E791 |
84318331 | U+FE15 | U+E792 |
84318332 | U+FE16 | U+E793 |
84318333 | U+FE17 | U+E794 |
84318334 | U+FE18 | U+E795 |
84318335 | U+FE19 | U+E796 |
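
To check which mapping a given runtime applies, one option is to decode a swapped byte sequence and print the resulting code point. The following is a minimal sketch, not part of the change itself; the class name `Gb18030MappingProbe` and its helper methods are hypothetical, and it assumes the `GB18030` charset is available in the running JDK.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical probe class, for illustration only.
class Gb18030MappingProbe {
    private static final Charset GB18030 = Charset.forName("GB18030");

    // Decodes a GB18030 byte sequence and returns its first code point.
    static int firstCodePoint(byte[] bytes) {
        return new String(bytes, GB18030).codePointAt(0);
    }

    // Returns true if decoding and then re-encoding yields the same bytes.
    static boolean roundTrips(byte[] bytes) {
        return Arrays.equals(bytes, new String(bytes, GB18030).getBytes(GB18030));
    }

    public static void main(String[] args) {
        // First row of the table: two-byte sequence A6D9.
        byte[] twoByte = {(byte) 0xA6, (byte) 0xD9};
        // Its counterpart: four-byte sequence 84318236.
        byte[] fourByte = {(byte) 0x84, 0x31, (byte) 0x82, 0x36};

        System.out.printf("A6D9     -> U+%04X%n", firstCodePoint(twoByte));
        System.out.printf("84318236 -> U+%04X%n", firstCodePoint(fourByte));
        System.out.println("A6D9 round-trips: " + roundTrips(twoByte));
    }
}
```

Per the table above, a runtime using the 2022 mapping should report U+FE10 for A6D9 and U+E78D for 84318236, while a runtime using the 2000 mapping should report the reverse.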
To provide compatible behavior, a new system property, jdk.charset.GB18030, is introduced. If this property is set to "2000" on the java command line, the GB18030 charset uses the old 2000 mapping. Any other value is ignored, and the charset defaults to the new 2022 mapping.
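
As an illustration of the selection rule described above, the sketch below reports which mapping a JDK that includes this change would be expected to use for a given property value. The class `Gb18030PropertyCheck` and the method `expectedMapping` are hypothetical names; the code restates the rule from this section and does not inspect the JDK's internal implementation.

```java
// Hypothetical helper, for illustration only.
class Gb18030PropertyCheck {
    // Mirrors the rule in this section: only the exact value "2000"
    // selects the old mapping; an unset or any other value means 2022.
    static String expectedMapping(String propertyValue) {
        return "2000".equals(propertyValue) ? "2000" : "2022";
    }

    public static void main(String[] args) {
        // The property is expected to be supplied on the java command line.
        String value = System.getProperty("jdk.charset.GB18030");
        System.out.println("jdk.charset.GB18030 = " + value);
        System.out.println("Expected mapping: " + expectedMapping(value));
    }
}
```

Assuming the class is saved as `Gb18030PropertyCheck.java`, it could be launched with `java -Djdk.charset.GB18030=2000 Gb18030PropertyCheck.java` to request the old mapping, or without the `-D` option to get the default.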
Specification
N/A. This is an implementation change in the JDK and does not affect the Java SE specification.
- csr of: JDK-8301119 Support for GB18030-2022 (Resolved)