  1. JDK
  2. JDK-4896454

GB18030 encoding of surrogates not currently available with sun.nio.cs.ext.GB180


    • Type: Bug
    • Resolution: Fixed
    • Priority: P4
    • 5.0
    • 5.0
    • core-libs
    • None
    • tiger
    • generic
    • solaris_9

      The GB18030 charset defines a 4-byte sequence range that encodes the
      supplementary characters of Unicode. In J2SE 1.4.1/1.4.2, the GB18030
      encoder fails to encode valid surrogate pairs and instead outputs the
      default replacement byte (0x3f).

      For example:
           byte[] outputBytes = new String("\ud800\udc00").getBytes("GB18030");

      returns a byte array of length 1 whose only value is (byte)0x3f.

      Correct encoding of this surrogate pair (U+10000) should produce a byte
      array of length 4 with the following values:
      0x90, 0x30, 0x81, 0x30

      Decoding the above values does work.
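
      The behaviour described above can be checked with a small program. This is
      only a sketch: the class name Gb18030SurrogateDemo and its helper methods
      are hypothetical names chosen for illustration, and it assumes the running
      JDK ships the GB18030 charset and has String.getBytes(Charset) (Java 6 or
      later).

```java
import java.nio.charset.Charset;

// Hypothetical demo class; not part of the JDK or of the original report.
class Gb18030SurrogateDemo {

    // Encode a string with GB18030 (assumes the charset is available).
    static byte[] encode(String s) {
        return s.getBytes(Charset.forName("GB18030"));
    }

    // Decode GB18030 bytes back into a string.
    static String decode(byte[] bytes) {
        return new String(bytes, Charset.forName("GB18030"));
    }

    // Render bytes as space-separated hex values, e.g. "0x90 0x30".
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("0x%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String pair = "\ud800\udc00"; // surrogate pair for U+10000
        byte[] out = encode(pair);
        System.out.println(out.length + " byte(s): " + toHex(out));
        System.out.println("round trip ok: " + pair.equals(decode(out)));
    }
}
```

      On a release containing the fix, the first line of output should report the
      four bytes listed above; on an affected 1.4.x release (with getBytes(String)
      substituted for the Charset overload) it would report the single
      replacement byte 0x3f.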

      This needs to be addressed for Tiger as part of the charset
      requirements for supplementary character support.
      ###@###.### 2003-07-25

            busersunw Btplusnull User (Inactive)
            ilittlesunw Ian Little (Inactive)