JDK-4122961

CharToByte converters are dropping characters on buffer boundaries

    • Type: Bug
    • Resolution: Fixed
    • Priority: P3
    • Fix Version/s: 1.2.0
    • Affects Version/s: 1.1.6
    • Component/s: core-libs
    • Labels: None
    • Resolved In Build: 1.2beta4
    • CPU: generic
    • OS: solaris_2.5.1
    • Verification: Not verified


      The attached Java program demonstrates a basic bug in many of the
      CharToByte converters for the Asian locales. Many are broken in
      JDK 1.1.3, and fewer in 1.1.6. I did not try them all, but someone should.
      It appears that the convert() method advances the charOff field before
      it has determined that the byte buffer has room. Because convert() has
      left the charOff index pointing past a character it never wrote,
      OutputStreamWriter drops that character. The end result is that one
      character is lost every 8192 bytes into the file.
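      The failure mode described above can be modeled in a few lines. The
      sketch below is a hypothetical, simplified double-byte converter, not
      the actual sun.io code: the names BoundaryDemo, Converter, BufferFull
      and charsWritten are invented for illustration, and every character is
      assumed to encode to exactly two bytes. The caller mimics a writer that,
      when the output buffer fills, flushes the bytes written so far and
      resumes conversion at charOff.

```java
// Hypothetical model of the charOff bug; NOT the real sun.io.CharToByteConverter API.
class BoundaryDemo {

    /** Signals that the output buffer is full (stand-in for a buffer-full exception). */
    static class BufferFull extends Exception {}

    /** Toy converter (assumption): every char becomes two bytes, high byte then low byte. */
    static class Converter {
        final boolean buggy;
        int charOff; // next input char to convert; the caller resumes here
        int byteOff; // next free slot in the output buffer

        Converter(boolean buggy) { this.buggy = buggy; }

        void convert(char[] in, byte[] out) throws BufferFull {
            byteOff = 0;
            while (charOff < in.length) {
                // Buggy variant bumps charOff before knowing the bytes will fit.
                char c = buggy ? in[charOff++] : in[charOff];
                if (byteOff + 2 > out.length) {
                    throw new BufferFull(); // buggy variant has already skipped c
                }
                out[byteOff++] = (byte) (c >> 8);
                out[byteOff++] = (byte) c;
                if (!buggy) {
                    charOff++; // fixed variant: advance only after c is fully written
                }
            }
        }
    }

    /** Encodes all of in through a bufSize-byte buffer; returns the number of chars written. */
    static int charsWritten(char[] in, int bufSize, boolean buggy) {
        if (bufSize < 2) throw new IllegalArgumentException("buffer must hold one char");
        Converter conv = new Converter(buggy);
        byte[] out = new byte[bufSize];
        int written = 0;
        while (true) {
            try {
                conv.convert(in, out);
                return written + conv.byteOff / 2; // last, partially filled buffer
            } catch (BufferFull e) {
                written += conv.byteOff / 2; // "flush" the full buffer, resume at charOff
            }
        }
    }

    public static void main(String[] args) {
        char[] text = new char[10000];
        java.util.Arrays.fill(text, 'A');
        System.out.println("fixed: " + charsWritten(text, 8192, false));
        System.out.println("buggy: " + charsWritten(text, 8192, true));
    }
}
```

      With 10,000 characters and an 8192-byte buffer, the fixed variant writes
      all 10,000 characters while the buggy one writes 9,998: one character is
      lost at each of the two buffer boundaries, which matches the
      once-every-8192-bytes symptom.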

      The attached program demonstrates the bug, which shows up in at least
      the following encodings: Big5, CNS11643, GB2312, and KSC5601.
      On JDK 1.1.6, CNS11643 still appears to be broken.

      This is a serious problem: any Java application using these standard
      classes can drop characters from a user's file.

      -kto

      kelly.ohair@Eng 1998-03-25



      [xueming.shen@Japan 1998-03-26]
      I ran the attached test program against all the converters under
      sun/io and got the following error messages:

      ERROR with encoding EUC_TW: Data wrong at position 8192 in: 65 out: 0
      ERROR with encoding ISO2022KR: Data wrong at position 8192 in: 65 out: 0
      TestingMS932==>Exception thrown on write:java.io.UnsupportedEncodingException
      ERROR with encoding MacDingbat: Data wrong at position 0 in: 65 out: 10015
      ERROR with encoding MacSymbol: Data wrong at position 0 in: 65 out: 63

      Notes:
      (1) ISO2022CN was not tested.
      (2) The test data was replaced with the constant character 'A'.

      Converters tested:
      tryEncoding("ASCII");
      tryEncoding("Big5");
      tryEncoding("Cp037");
      tryEncoding("Cp1006");
      tryEncoding("Cp1025");
      tryEncoding("Cp1026");
      tryEncoding("Cp1046");
      tryEncoding("Cp1097");
      tryEncoding("Cp1098");
      tryEncoding("Cp1112");
      tryEncoding("Cp1122");
      tryEncoding("Cp1123");
      tryEncoding("Cp1124");
      tryEncoding("Cp1250");
      tryEncoding("Cp1251");
      tryEncoding("Cp1252");
      tryEncoding("Cp1253");
      tryEncoding("Cp1254");
      tryEncoding("Cp1255");
      tryEncoding("Cp1256");
      tryEncoding("Cp1257");
      tryEncoding("Cp1258");
      tryEncoding("Cp1381");
      tryEncoding("Cp1383");
      tryEncoding("Cp273");
      tryEncoding("Cp277");
      tryEncoding("Cp278");
      tryEncoding("Cp280");
      tryEncoding("Cp284");
      tryEncoding("Cp285");
      tryEncoding("Cp297");
      tryEncoding("Cp33722");
      tryEncoding("Cp420");
      tryEncoding("Cp424");
      tryEncoding("Cp437");
      tryEncoding("Cp500");
      tryEncoding("Cp737");
      tryEncoding("Cp775");
      tryEncoding("Cp838");
      tryEncoding("Cp850");
      tryEncoding("Cp852");
      tryEncoding("Cp855");
      tryEncoding("Cp856");
      tryEncoding("Cp857");
      tryEncoding("Cp860");
      tryEncoding("Cp861");
      tryEncoding("Cp862");
      tryEncoding("Cp863");
      tryEncoding("Cp864");
      tryEncoding("Cp865");
      tryEncoding("Cp866");
      tryEncoding("Cp868");
      tryEncoding("Cp869");
      tryEncoding("Cp870");
      tryEncoding("Cp871");
      tryEncoding("Cp874");
      tryEncoding("Cp875");
      tryEncoding("Cp918");
      tryEncoding("Cp921");
      tryEncoding("Cp922");
      tryEncoding("Cp930");
      tryEncoding("Cp933");
      tryEncoding("Cp935");
      tryEncoding("Cp937");
      tryEncoding("Cp939");
      tryEncoding("Cp942");
      tryEncoding("Cp942C");
      tryEncoding("Cp943");
      tryEncoding("Cp943C");
      tryEncoding("Cp948");
      tryEncoding("Cp949");
      tryEncoding("Cp949C");
      tryEncoding("Cp950");
      tryEncoding("Cp964");
      tryEncoding("Cp970");
      tryEncoding("EUC_CN");
      tryEncoding("EUC_JP");
      tryEncoding("EUC_KR");
      tryEncoding("EUC_TW");
      tryEncoding("GBK");
      tryEncoding("ISO2022CN");
      tryEncoding("ISO2022JP");
      tryEncoding("ISO2022KR");
      tryEncoding("ISO8859_1");
      tryEncoding("ISO8859_2");
      tryEncoding("ISO8859_3");
      tryEncoding("ISO8859_4");
      tryEncoding("ISO8859_5");
      tryEncoding("ISO8859_6");
      tryEncoding("ISO8859_7");
      tryEncoding("ISO8859_8");
      tryEncoding("ISO8859_9");
      tryEncoding("Johab");
      tryEncoding("KOI8_R");
      tryEncoding("MS874");
      tryEncoding("MS932");
      tryEncoding("MS936");
      tryEncoding("MS950");
      tryEncoding("MacArabic");
      tryEncoding("MacCentralEurope");
      tryEncoding("MacCroatian");
      tryEncoding("MacCyrillic");
      tryEncoding("MacDingbat");
      tryEncoding("MacGreek");
      tryEncoding("MacHebrew");
      tryEncoding("MacIceland");
      tryEncoding("MacRoman");
      tryEncoding("MacRomania");
      tryEncoding("MacSymbol");
      tryEncoding("MacThai");
      tryEncoding("MacTurkish");
      tryEncoding("MacUkraine");
      tryEncoding("SJIS");
      tryEncoding("TIS620");
      tryEncoding("UTF8");
      tryEncoding("Unicode");
      tryEncoding("UnicodeBig");
      tryEncoding("UnicodeLittle");
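      The attached test program is not reproduced in this report. As a rough
      idea of what a tryEncoding check of this kind might do, the sketch below
      writes a run of characters through an OutputStreamWriter, reads them
      back with an InputStreamReader, and reports the first position where the
      round trip differs, in the same "in:"/"out:" style as the log lines
      above. It is a reconstruction under assumptions (data length, message
      wording, the -1 convention for "nothing came back", the helper name
      firstMismatch) using current java.io classes, not the original 1998
      harness. Many of the historical encoding names in the list, such as
      UTF8 and ISO8859_1, are still accepted as aliases by current JDKs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UnsupportedEncodingException;
import java.io.Writer;
import java.util.Arrays;

// Hypothetical reconstruction of a round-trip check; NOT the original attached program.
class RoundTrip {

    /** Returns -1 if every char survives encode + decode, else the first bad position. */
    static int firstMismatch(String enc, char[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(bytes, enc)) {
            w.write(data); // encoded through the writer's internal byte buffer
        }
        StringBuilder back = new StringBuilder();
        try (Reader r = new InputStreamReader(
                new ByteArrayInputStream(bytes.toByteArray()), enc)) {
            int c;
            while ((c = r.read()) != -1) {
                back.append((char) c);
            }
        }
        for (int i = 0; i < data.length; i++) {
            int out = i < back.length() ? back.charAt(i) : -1; // -1: nothing came back
            if (out != data[i]) {
                System.out.println("ERROR with encoding " + enc + ": Data wrong at position "
                        + i + " in: " + (int) data[i] + " out: " + out);
                return i;
            }
        }
        return back.length() == data.length ? -1 : data.length; // extra trailing chars
    }

    static void tryEncoding(String enc) {
        char[] data = new char[20000];
        Arrays.fill(data, 'A'); // constant test data, as in note (2) above
        try {
            firstMismatch(enc, data);
        } catch (UnsupportedEncodingException e) {
            System.out.println("Testing " + enc + " ==> Exception thrown on write: " + e);
        } catch (IOException e) {
            System.out.println("Testing " + enc + " ==> " + e);
        }
    }

    public static void main(String[] args) {
        tryEncoding("UTF8");
        tryEncoding("ISO8859_1");
        tryEncoding("Big5");
    }
}
```

      Feeding 20,000 characters through the writer pushes the encoder across
      several internal buffer boundaries, which is where the original bug
      lost data; an encoding the runtime does not know surfaces as an
      UnsupportedEncodingException, as with MS932 above.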


      [xueming.shen@Japan 1998-03-27]

      After applying the suggested fix for EUC_TW and ISO2022 and rerunning
      the test without the constant character 'A', I still get the following
      errors, which need further investigation. I suspect they are caused by
      incorrect input in the test case, but if you have time, please do a
      sanity check.

      ERROR with encoding Cp1097: Data wrong at position 93 in: 94 out: 26
      ERROR with encoding Cp420: Data wrong at position 90 in: 91 out: 26
      ERROR with encoding Cp864: Data wrong at position 36 in: 37 out: 63
      ERROR with encoding ISO8859_6: Data wrong at position 47 in: 48 out: 63
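      Each "out" value in these four lines is 26 (U+001A) or 63 ('?'). These
      look more like substitution characters emitted for input the code page
      cannot represent than like dropped characters, which supports, but does
      not prove, the guess that the test input is at fault. One way to rule
      this out, assuming a modern JDK with java.nio.charset (an API that
      postdates these converters), is to build test data only from characters
      the encoder reports it can represent. The helper name encodableChars and
      the range bounds below are illustrative choices, not part of the
      original test.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper: keep only characters the target charset can encode, so that
// substitution characters are not mistaken for lost data in a round-trip test.
class EncodableInput {

    /** Returns the chars in [from, to) that the named charset can encode. */
    static char[] encodableChars(String charsetName, int from, int to) {
        CharsetEncoder enc = Charset.forName(charsetName).newEncoder();
        StringBuilder sb = new StringBuilder();
        for (int c = from; c < to; c++) {
            // canEncode returns false for unmappable chars (and for lone surrogates)
            if (enc.canEncode((char) c)) {
                sb.append((char) c);
            }
        }
        return sb.toString().toCharArray();
    }

    public static void main(String[] args) {
        char[] ascii = encodableChars("US-ASCII", 1, 0x100);
        System.out.println(ascii.length + " of 255 chars are encodable in US-ASCII");
    }
}
```

      For US-ASCII over U+0001..U+00FF this keeps 127 characters; running the
      round trip only over such a filtered range separates "cannot be
      represented" from "was lost at a buffer boundary".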

      I checked the following converters and found that they are on the
      "error" list because of correct behavior, not because of a bug:

      (1) The behavior of Cp930, Cp933, Cp935, Cp937, and Cp939 is correct
          when dealing with SO 0x
      (2) MacDingbat and MacSymbol are OK.


            Assignee: Brian Beck (bcbeck) (Inactive)
            Reporter: Kelly Ohair (ohair) (Inactive)