Bug
Resolution: Fixed
P3
1.1.6
None
1.2beta4
generic
solaris_2.5.1
Not verified
The attached Java program demonstrates a basic bug in many of the CharToByte
converters for the Asian locales. Many are broken in JDK 1.1.3, and fewer in
1.1.6. I did not try them all, but someone should.
It appears that the convert() method advances the charOff field before
it has determined that the byte buffer has room. This causes
OutputStreamWriter to drop characters, because convert() has left the
charOff index pointing past a character that was never written. The end
result is that one character is lost every 8192 bytes into the file.
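The failure mode described above can be modeled in a few lines. The sketch below is a hypothetical toy, not the sun.io converter code: `ToyConverter`, `BufferFullException`, and the 1-byte/2-byte toy encoding are all assumptions made for illustration. The buggy variant increments `charOff` before checking whether the output buffer has room; the fixed variant checks first. A writer loop that flushes its buffer and resumes at `charOff` then skips exactly the character that did not fit.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/**
 * Hypothetical model of the reported charOff bug; NOT the real sun.io code.
 * Toy encoding (an assumption): ASCII chars take 1 byte, all other chars take 2 bytes.
 */
public class DroppedCharDemo {
    /** Signals "output buffer full" in this toy model. */
    static class BufferFullException extends Exception {}

    static class ToyConverter {
        final boolean buggy;
        int charOff; // index of the next input char to convert
        int byteOff; // index of the next free output byte

        ToyConverter(boolean buggy) { this.buggy = buggy; }

        void convert(char[] in, int inStart, int inEnd,
                     byte[] out, int outStart, int outEnd) throws BufferFullException {
            charOff = inStart;
            byteOff = outStart;
            while (charOff < inEnd) {
                char c = in[charOff];
                int need = (c < 0x80) ? 1 : 2;
                if (buggy) {
                    charOff++; // BUG: index advanced before checking for room
                    if (byteOff + need > outEnd) throw new BufferFullException();
                } else {
                    if (byteOff + need > outEnd) throw new BufferFullException();
                    charOff++; // advance only after the char is known to fit
                }
                if (need == 1) {
                    out[byteOff++] = (byte) c;
                } else {
                    out[byteOff++] = (byte) (0x80 | (c >> 8));
                    out[byteOff++] = (byte) c;
                }
            }
        }
    }

    /** Mimics a writer that flushes its byte buffer and resumes at charOff when the converter is full. */
    static byte[] write(String s, ToyConverter conv, int bufSize) {
        char[] in = s.toCharArray();
        byte[] buf = new byte[bufSize];
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        int ci = 0;
        while (true) {
            try {
                conv.convert(in, ci, in.length, buf, 0, bufSize);
                sink.write(buf, 0, conv.byteOff);
                return sink.toByteArray();
            } catch (BufferFullException e) {
                sink.write(buf, 0, conv.byteOff); // flush the bytes that did fit
                ci = conv.charOff;                // resume where the converter says it stopped
            }
        }
    }

    public static void main(String[] args) {
        char[] a = new char[20];
        Arrays.fill(a, 'A');
        String text = new String(a);
        // An 8-byte buffer stands in for the 8192-byte buffer in the report.
        System.out.println("buggy: " + write(text, new ToyConverter(true), 8).length + " bytes");
        System.out.println("fixed: " + write(text, new ToyConverter(false), 8).length + " bytes");
    }
}
```

With an 8-byte buffer and 20 'A' characters, the buggy variant produces 18 bytes (the characters at indices 8 and 17 are lost at the two buffer boundaries) and the fixed variant produces 20. With an 8192-byte buffer the same loss would recur every 8192 bytes, which is the pattern the report describes.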
The attached program demonstrates the bug, which shows up in at least the
following encodings: Big5, CNS11643, GB2312, and KSC5601.
On JDK 1.1.6, CNS11643 appears to still be broken.
This is a serious problem: any Java application using these standard
classes may drop characters from a user's file.
-kto
kelly.ohair@Eng 1998-03-25
[xueming.shen@Japan 1998-03-26]
I ran the attached test program against all the converters under
sun/io and got the following error messages:
ERROR with encoding EUC_TW: Data wrong at position 8192 in: 65 out: 0
ERROR with encoding ISO2022KR: Data wrong at position 8192 in: 65 out: 0
TestingMS932==>Exception thrown on write:java.io.UnsupportedEncodingException
ERROR with encoding MacDingbat: Data wrong at position 0 in: 65 out: 10015
ERROR with encoding MacSymbol: Data wrong at position 0 in: 65 out: 63
Exceptions:
(1) ISO2022CN was not tested.
(2) The test data was replaced with the constant character 'A'.
Converters tested:
tryEncoding("ASCII");
tryEncoding("Big5");
tryEncoding("Cp037");
tryEncoding("Cp1006");
tryEncoding("Cp1025");
tryEncoding("Cp1026");
tryEncoding("Cp1046");
tryEncoding("Cp1097");
tryEncoding("Cp1098");
tryEncoding("Cp1112");
tryEncoding("Cp1122");
tryEncoding("Cp1123");
tryEncoding("Cp1124");
tryEncoding("Cp1250");
tryEncoding("Cp1251");
tryEncoding("Cp1252");
tryEncoding("Cp1253");
tryEncoding("Cp1254");
tryEncoding("Cp1255");
tryEncoding("Cp1256");
tryEncoding("Cp1257");
tryEncoding("Cp1258");
tryEncoding("Cp1381");
tryEncoding("Cp1383");
tryEncoding("Cp273");
tryEncoding("Cp277");
tryEncoding("Cp278");
tryEncoding("Cp280");
tryEncoding("Cp284");
tryEncoding("Cp285");
tryEncoding("Cp297");
tryEncoding("Cp33722");
tryEncoding("Cp420");
tryEncoding("Cp424");
tryEncoding("Cp437");
tryEncoding("Cp500");
tryEncoding("Cp737");
tryEncoding("Cp775");
tryEncoding("Cp838");
tryEncoding("Cp850");
tryEncoding("Cp852");
tryEncoding("Cp855");
tryEncoding("Cp856");
tryEncoding("Cp857");
tryEncoding("Cp860");
tryEncoding("Cp861");
tryEncoding("Cp862");
tryEncoding("Cp863");
tryEncoding("Cp864");
tryEncoding("Cp865");
tryEncoding("Cp866");
tryEncoding("Cp868");
tryEncoding("Cp869");
tryEncoding("Cp870");
tryEncoding("Cp871");
tryEncoding("Cp874");
tryEncoding("Cp875");
tryEncoding("Cp918");
tryEncoding("Cp921");
tryEncoding("Cp922");
tryEncoding("Cp930");
tryEncoding("Cp933");
tryEncoding("Cp935");
tryEncoding("Cp937");
tryEncoding("Cp939");
tryEncoding("Cp942");
tryEncoding("Cp942C");
tryEncoding("Cp943");
tryEncoding("Cp943C");
tryEncoding("Cp948");
tryEncoding("Cp949");
tryEncoding("Cp949C");
tryEncoding("Cp950");
tryEncoding("Cp964");
tryEncoding("Cp970");
tryEncoding("EUC_CN");
tryEncoding("EUC_JP");
tryEncoding("EUC_KR");
tryEncoding("EUC_TW");
tryEncoding("GBK");
tryEncoding("ISO2022CN");
tryEncoding("ISO2022JP");
tryEncoding("ISO2022KR");
tryEncoding("ISO8859_1");
tryEncoding("ISO8859_2");
tryEncoding("ISO8859_3");
tryEncoding("ISO8859_4");
tryEncoding("ISO8859_5");
tryEncoding("ISO8859_6");
tryEncoding("ISO8859_7");
tryEncoding("ISO8859_8");
tryEncoding("ISO8859_9");
tryEncoding("Johab");
tryEncoding("KOI8_R");
tryEncoding("MS874");
tryEncoding("MS932");
tryEncoding("MS936");
tryEncoding("MS950");
tryEncoding("MacArabic");
tryEncoding("MacCentralEurope");
tryEncoding("MacCroatian");
tryEncoding("MacCyrillic");
tryEncoding("MacDingbat");
tryEncoding("MacGreek");
tryEncoding("MacHebrew");
tryEncoding("MacIceland");
tryEncoding("MacRoman");
tryEncoding("MacRomania");
tryEncoding("MacSymbol");
tryEncoding("MacThai");
tryEncoding("MacTurkish");
tryEncoding("MacUkraine");
tryEncoding("SJIS");
tryEncoding("TIS620");
tryEncoding("UTF8");
tryEncoding("Unicode");
tryEncoding("UnicodeBig");
tryEncoding("UnicodeLittle");
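The attached test program is not reproduced in this report. As a rough, hypothetical reconstruction, a `tryEncoding()` harness along these lines would print messages in the same format as the log above: it writes a run of 'A' characters longer than 8192 through `OutputStreamWriter`, reads them back with `InputStreamReader`, and reports the first position where the round trip differs. The method body, the `n` parameter, and the deliberately bogus name `"NoSuchEncoding"` are assumptions. It uses only standard `java.io` classes, so on a current JDK (where this bug is fixed) the real encodings should report OK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.util.Arrays;

public class TryEncodingSketch {
    /**
     * Hypothetical harness: write n copies of 'A' through OutputStreamWriter,
     * read them back with InputStreamReader, and report the first mismatch.
     * Returns null on success, otherwise a message in the style of the log.
     */
    static String tryEncoding(String enc, int n) {
        char[] data = new char[n];
        Arrays.fill(data, 'A');
        byte[] bytes;
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            Writer w = new OutputStreamWriter(bos, enc); // throws UnsupportedEncodingException for unknown names
            w.write(data);
            w.close(); // flushes any bytes still buffered in the writer
            bytes = bos.toByteArray();
        } catch (IOException e) {
            return "Testing" + enc + "==>Exception thrown on write:" + e;
        }
        char[] back = new char[n];
        int len = 0;
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(bytes), enc)) {
            int k;
            while (len < n && (k = r.read(back, len, n - len)) > 0) {
                len += k;
            }
        } catch (IOException e) {
            return "Testing" + enc + "==>Exception thrown on read:" + e;
        }
        for (int i = 0; i < n; i++) {
            int out = (i < len) ? back[i] : 0; // 0 marks a character that never came back
            if (out != data[i]) {
                return "ERROR with encoding " + enc + ": Data wrong at position " + i
                        + " in: " + (int) data[i] + " out: " + out;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Old-style names that current JDKs still accept as aliases, plus one
        // bogus name (an assumption) to exercise the exception path.
        String[] encodings = {"ASCII", "ISO8859_1", "UTF8", "NoSuchEncoding"};
        for (String enc : encodings) {
            String err = tryEncoding(enc, 8192 + 1);
            System.out.println(err == null ? enc + ": OK" : err);
        }
    }
}
```

In this sketch, with `n = 8193`, a single character dropped at the 8192-byte boundary would surface as a mismatch at position 8192 with `out: 0`. That is the same shape as the EUC_TW and ISO2022KR lines in the log, though the original program may have worked differently.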
[xueming.shen@Japan 1998-03-27]
After applying the suggested fix for EUC_TW and ISO2022, and testing without
the constant character 'A', I still get the following errors, which need
further investigation (I suspect they are caused by incorrect input in the
test case, but if you have time, please do a sanity check):
ERROR with encoding Cp1097: Data wrong at position 93 in: 94 out: 26
ERROR with encoding Cp420: Data wrong at position 90 in: 91 out: 26
ERROR with encoding Cp864: Data wrong at position 36 in: 37 out: 63
ERROR with encoding ISO8859_6: Data wrong at position 47 in: 48 out: 63
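A round-trip test can report a mismatch without any converter bug when the input contains a character that the target encoding cannot represent. In that case the encoder substitutes a replacement byte. For example, in current JDKs `String.getBytes(Charset)` replaces unmappable characters with the charset's default replacement, which for ISO-8859-1 is '?' (value 63). The report does not establish whether this explains the specific entries above. The `out: 26` values (U+001A, SUB) are plausibly the analogous substitution in the EBCDIC code pages, but that is an assumption. The minimal sketch below only demonstrates the general effect. `ReplacementDemo` and `roundTrip` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

public class ReplacementDemo {
    /** Round-trips one char through ISO-8859-1 and returns the char value that comes back. */
    static int roundTrip(char c) {
        // getBytes(Charset) replaces unmappable chars with the charset's replacement bytes
        byte[] b = String.valueOf(c).getBytes(StandardCharsets.ISO_8859_1);
        return new String(b, StandardCharsets.ISO_8859_1).charAt(0);
    }

    public static void main(String[] args) {
        System.out.println("in: " + (int) 'A' + " out: " + roundTrip('A'));     // mappable: comes back unchanged
        System.out.println("in: " + 0x0100 + " out: " + roundTrip('\u0100'));  // not in ISO-8859-1: comes back as '?'
    }
}
```

This prints `in: 65 out: 65` and then `in: 256 out: 63`. The second line has the same form as the ISO8859_6 and Cp864 entries, even though nothing is wrong with the converter.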
I have checked the following converters and found that the behavior that
put them on the "error" list is correct, not a bug:
(1) The behavior of Cp930, Cp933, Cp935, Cp937, and Cp939 is correct when dealing
with SO 0x
(2) MacDingbat and MacSymbol are OK.
Relates to: JDK-4277317 Regression test: sun/io/Converter/TestConverterDroppedCharaters.java failing (Closed)