JDK-8301559

Support for GB18030-2022

    • behavioral
    • medium
    • GB18030 charset will change to include some incompatible mappings by default.
    • System or security property
    • JDK

      Summary

      Support the new GB18030 standard in the JDK

      Problem

      The Chinese government mandates support for the updated GB18030 mappings, but some of them are incompatible with the existing GB18030 mapping in the JDK, which is based on the 2000 standard. This incompatibility could cause character conversion failures, such as corrupted file content. A simple replacement of the GB18030 Charset, with no way to restore the old behavior, should therefore be avoided.

      Solution

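      Replace the GB18030 charset implementation with one based on the new 2022 mapping. The new mapping swaps some code points previously assigned in the Private Use Area with the equivalent code points that Unicode assigned later: each affected two-byte sequence now maps to the Unicode-assigned code point, and the corresponding four-byte sequence now maps to the Private Use Area code point. The complete list of swapped code points follows: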

      GB18030 byte sequence   2000 mapping   2022 mapping
      A6D9                    U+E78D         U+FE10
      A6DA                    U+E78E         U+FE12
      A6DB                    U+E78F         U+FE11
      A6DC                    U+E790         U+FE13
      A6DD                    U+E791         U+FE14
      A6DE                    U+E792         U+FE15
      A6DF                    U+E793         U+FE16
      A6EC                    U+E794         U+FE17
      A6ED                    U+E795         U+FE18
      A6F3                    U+E796         U+FE19
      A8BC                    U+E7C7         U+1E3F
      FE59                    U+E81E         U+9FB4
      FE61                    U+E826         U+9FB5
      FE66                    U+E82B         U+9FB6
      FE67                    U+E82C         U+9FB7
      FE6D                    U+E832         U+9FB8
      FE7E                    U+E843         U+9FB9
      FE90                    U+E854         U+9FBA
      FEA0                    U+E864         U+9FBB
      8135F437                U+1E3F         U+E7C7
      82359037                U+9FB4         U+E81E
      82359038                U+9FB5         U+E826
      82359039                U+9FB6         U+E82B
      82359130                U+9FB7         U+E82C
      82359131                U+9FB8         U+E832
      82359132                U+9FB9         U+E843
      82359133                U+9FBA         U+E854
      82359134                U+9FBB         U+E864
      84318236                U+FE10         U+E78D
      84318237                U+FE11         U+E78F
      84318238                U+FE12         U+E78E
      84318239                U+FE13         U+E790
      84318330                U+FE14         U+E791
      84318331                U+FE15         U+E792
      84318332                U+FE16         U+E793
      84318333                U+FE17         U+E794
      84318334                U+FE18         U+E795
      84318335                U+FE19         U+E796
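
      One way to see which column of the table a given runtime follows is to decode a swapped byte sequence and print the resulting code point. The sketch below is illustrative only: the class name Gb18030MappingProbe and the helper decode are hypothetical, and it assumes the runtime provides the GB18030 charset (it may be missing from a trimmed-down runtime image).

```java
import java.nio.charset.Charset;

final class Gb18030MappingProbe {

    /** Decodes a GB18030 byte sequence and returns its first Unicode code point. */
    static int decode(byte[] gb18030Bytes) {
        // Throws UnsupportedCharsetException if the runtime lacks GB18030.
        Charset gb18030 = Charset.forName("GB18030");
        return new String(gb18030Bytes, gb18030).codePointAt(0);
    }

    public static void main(String[] args) {
        // First two-byte row and its four-byte counterpart from the table.
        byte[] twoByte = {(byte) 0xA6, (byte) 0xD9};
        byte[] fourByte = {(byte) 0x84, 0x31, (byte) 0x82, 0x36};

        System.out.printf("A6D9     -> U+%04X%n", decode(twoByte));
        System.out.printf("84318236 -> U+%04X%n", decode(fourByte));
    }
}
```

      Going by the table, a runtime using the 2000 mapping should report U+E78D for A6D9 and U+FE10 for 84318236, and a runtime using the 2022 mapping should report the reverse.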

      To preserve compatibility, a new system property, jdk.charset.GB18030, is introduced. If it is set to "2000" on the java command line, the GB18030 charset uses the old 2000 mapping. Any other value is ignored, and the charset defaults to the new 2022 mapping.
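
      As a rough illustration of the property's effect, the sketch below reports the property value and infers the active mapping from how U+FE10 is encoded: per the table, the 2022 mapping encodes it as the two-byte sequence A6D9, while the 2000 mapping uses a four-byte sequence. The class name Gb18030ActiveMapping and the method detect are hypothetical. The sketch assumes a JDK that includes this change and that the property is supplied at launch, for example java -Djdk.charset.GB18030=2000 Gb18030ActiveMapping.java; setting the property after startup is not assumed to have any effect.

```java
import java.nio.charset.Charset;

final class Gb18030ActiveMapping {

    /** Returns "2022" or "2000" based on how U+FE10 is encoded, else "unknown". */
    static String detect() {
        byte[] encoded = "\uFE10".getBytes(Charset.forName("GB18030"));
        if (encoded.length == 2
                && (encoded[0] & 0xFF) == 0xA6
                && (encoded[1] & 0xFF) == 0xD9) {
            return "2022"; // two-byte form A6D9, as in the 2022 column
        }
        if (encoded.length == 4) {
            return "2000"; // four-byte form, as in the 2000 column
        }
        return "unknown";
    }

    public static void main(String[] args) {
        // The property is null unless it was passed with -D on the command line.
        System.out.println("jdk.charset.GB18030 = " + System.getProperty("jdk.charset.GB18030"));
        System.out.println("active mapping      = " + detect());
    }
}
```

      Probing the encoder rather than trusting the property value alone is a deliberate choice here, since values other than "2000" are ignored and older JDKs do not recognize the property at all.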

      Specification

      N/A. This is an implementation change in the JDK and does not affect the Java SE specification.

            naoto Naoto Sato
            naoto Naoto Sato
            Alan Bateman, Lance Andersen