-
Bug
-
Resolution: Fixed
-
P4
-
5.0
-
None
-
tiger
-
generic
-
solaris_9
The GB18030 charset defines a range of 4-byte sequences that represent
the supplementary characters of Unicode. In J2SE 1.4.1/1.4.2, the
GB18030 encoder fails to encode valid surrogate pairs and instead
outputs the default replacement byte (0x3f).
For example:
byte[] outputBytes = new String("\ud800\udc00").getBytes("GB18030");
returns a byte array of length 1 whose only value is (byte)0x3f.
The correct encoding of this surrogate pair is a byte array of length 4
with the following values:
0x90, 0x30, 0x81, 0x30
Decoding these four bytes back to the surrogate pair does work.
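The expected behavior can be checked with a small test program. This is only a sketch: the class name Gb18030SurrogateCheck and the helper names encode and expectedFourBytes are made up for illustration, and it uses String.getBytes(Charset), which is not available in 1.4.x. The helper assumes the usual linear mapping of U+10000..U+10FFFF onto the 4-byte range that starts at 0x90 0x30 0x81 0x30.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class Gb18030SurrogateCheck {

    // Encode a string with the runtime's GB18030 encoder.
    static byte[] encode(String s) {
        return s.getBytes(Charset.forName("GB18030"));
    }

    // Compute the 4-byte GB18030 form of a supplementary code point,
    // assuming the linear mapping that starts at 0x90 0x30 0x81 0x30.
    static byte[] expectedFourBytes(int codePoint) {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        int index = codePoint - 0x10000;
        return new byte[] {
            (byte) (0x90 + index / 12600),       // 10 * 126 * 10 sequences per lead byte
            (byte) (0x30 + (index / 1260) % 10), // second byte: 0x30..0x39
            (byte) (0x81 + (index / 10) % 126),  // third byte: 0x81..0xFE
            (byte) (0x30 + index % 10)           // fourth byte: 0x30..0x39
        };
    }

    public static void main(String[] args) {
        String pair = "\ud800\udc00"; // U+10000 as a surrogate pair
        byte[] actual = encode(pair);
        byte[] expected = expectedFourBytes(pair.codePointAt(0));

        if (Arrays.equals(actual, expected)) {
            System.out.println("surrogate pair encoded as 4 bytes");
        } else {
            System.out.println("unexpected bytes: " + Arrays.toString(actual));
        }

        // The decoding direction is reported to work already.
        String decoded = new String(expected, Charset.forName("GB18030"));
        System.out.println("decode round trip ok: " + decoded.equals(pair));
    }
}
```

On a runtime with the defect described here, the first check would report a single byte with value 63 (0x3f) instead of the four expected bytes.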
This needs to be addressed for Tiger as part of the charset
requirements for supplementary character support.
###@###.### 2003-07-25