JDK-6372100

CharsetDecoder.decode fails for single-byte input for many CJK encodings


    • Type: Bug
    • Resolution: Duplicate
    • Priority: P4
    • None
    • 5.0
    • core-libs

      FULL PRODUCT VERSION :
      java version "1.5.0_06"
      Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_06-b05)
      Java HotSpot(TM) Client VM (build 1.5.0_06-b05, mixed mode, sharing)


      ADDITIONAL OS VERSION INFORMATION :
      Linux honolulu.ilog.fr 2.4.21-0.13mdk #1 Fri Mar 14 15:08:06 EST 2003 i686 unknown


      A DESCRIPTION OF THE PROBLEM :
      For many CJK encodings, decoding a single-byte input buffer yields
      an empty (0-character) output. It should instead yield a 1-character
      output for bytes in the ASCII range (0x00 to 0x7f) and throw a
      MalformedInputException for bytes from 0x80 to 0xff.
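
      The expected contract can be observed with the lower-level
      decode(ByteBuffer, CharBuffer, boolean) call, which returns a
      CoderResult instead of throwing. The sketch below is illustrative
      only: the class SingleByteProbe and its probe method are hypothetical
      names, not part of this report, and UTF-8 is used merely as a
      reference charset that is present on every runtime. Passing one of
      the charset names listed in this report would presumably exercise the
      reported behavior on an affected runtime.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;

public class SingleByteProbe {
    /** Decodes a single byte and describes the outcome (hypothetical helper). */
    static String probe(Charset charset, int b) {
        CharsetDecoder decoder = charset.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(new byte[] { (byte) b });
        CharBuffer out = CharBuffer.allocate(4);

        // endOfInput = true: any byte left undecoded is treated as malformed.
        CoderResult result = decoder.decode(in, out, true);
        if (result.isUnderflow()) {
            // All input consumed; flush any state the decoder still holds.
            result = decoder.flush(out);
        }
        out.flip();

        if (result.isError()) {
            return result.isMalformed() ? "malformed" : "unmappable";
        }
        return "decoded " + out.length() + " char(s)";
    }

    public static void main(String[] args) {
        Charset utf8 = Charset.forName("UTF-8");
        System.out.println(probe(utf8, 0x41)); // ASCII byte
        System.out.println(probe(utf8, 0xC3)); // lone lead byte of a 2-byte sequence
        System.out.println(probe(utf8, 0x80)); // stray continuation byte
    }
}
```

      A result of "decoded 0 char(s)" from such a probe would correspond to
      the error counted by the test case in this report.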


      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      javac niobug1.java
      java niobug1


      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      No output.

      ACTUAL -
      Charset Big5_HKSCS: 256 errors
      Charset Big5_Solaris: 256 errors
      Charset Big5: 256 errors
      Charset Cp1381: 256 errors
      Charset Cp1383: 256 errors
      Charset Cp930: 256 errors
      Charset Cp933: 256 errors
      Charset Cp935: 256 errors
      Charset Cp937: 256 errors
      Charset Cp939: 256 errors
      Charset Cp942: 256 errors
      Charset Cp942C: 256 errors
      Charset Cp943: 256 errors
      Charset Cp943C: 256 errors
      Charset Cp948: 256 errors
      Charset Cp949: 256 errors
      Charset Cp949C: 256 errors
      Charset Cp950: 256 errors
      Charset Cp970: 256 errors
      Charset EUC_CN: 256 errors
      Charset EUC_JP_Solaris: 256 errors
      Charset EUC_JP: 256 errors
      Charset EUC_KR: 256 errors
      Charset GBK: 256 errors
      Charset JIS0208: 256 errors
      Charset JIS0212: 256 errors
      Charset Johab: 256 errors
      Charset MS932: 256 errors
      Charset MS936: 256 errors
      Charset MS949: 256 errors
      Charset MS950_HKSCS: 256 errors
      Charset MS950: 256 errors
      Charset PCK: 256 errors
      Charset SJIS: 256 errors


      REPRODUCIBILITY :
      This bug is always reproducible.

      ---------- BEGIN SOURCE ----------
      import java.io.*;
      import java.nio.*;
      import java.nio.charset.*;

      public class niobug1 {
        public static void main (String[] args) throws CharacterCodingException {
          String[] encodings = {
            "Big5_HKSCS",
            "Big5_Solaris",
            "Big5",
            "Cp1381",
            "Cp1383",
            "Cp930",
            "Cp933",
            "Cp935",
            "Cp937",
            "Cp939",
            "Cp942",
            "Cp942C",
            "Cp943",
            "Cp943C",
            "Cp948",
            "Cp949",
            "Cp949C",
            "Cp950",
            "Cp970",
            "EUC_CN",
            "EUC_JP_Solaris",
            "EUC_JP",
            "EUC_KR",
            "GBK",
            "JIS0208",
            "JIS0212",
            "Johab",
            "MS932",
            "MS936",
            "MS949",
            "MS950_HKSCS",
            "MS950",
            "PCK",
            "SJIS",
          };
          for (int n = 0; n < encodings.length; n++) {
            String encoding = encodings[n];
            Charset charset = Charset.forName(encoding);
            CharsetDecoder converter = charset.newDecoder();
            converter = converter.onMalformedInput(CodingErrorAction.REPORT);
            converter = converter.onUnmappableCharacter(CodingErrorAction.REPORT);
            int errors = 0;
            for (int b = 0; b < 0x100; b++) {
              ByteBuffer in = ByteBuffer.wrap(new byte[] { (byte)b });
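              try {
                CharBuffer out = converter.decode(in);
                // An empty result for a 1-byte input is the bug being counted.
                if (out.length() == 0)
                  errors++;
              } catch (MalformedInputException e) {
                // Acceptable outcome: the byte is not valid input on its own.
              }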
            }
            if (errors > 0)
              System.err.println("Charset "+encoding+": "+errors+" errors");
          }
        }
      }

      ---------- END SOURCE ----------
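
      Note that Charset.forName throws UnsupportedCharsetException when a
      name is not available, and not every runtime ships every charset in
      the list above. A minimal sketch of filtering the list first, assuming
      a hypothetical helper class SupportedCharsets that is not part of the
      original test case:

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

public class SupportedCharsets {
    /** Returns the subset of the given names that this runtime can resolve. */
    static List<String> supported(String... names) {
        List<String> result = new ArrayList<String>();
        for (String name : names) {
            // isSupported returns false for syntactically legal but unknown names.
            if (Charset.isSupported(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // A few names taken from the report, plus one that is always present.
        System.out.println(supported("Big5", "EUC_JP", "SJIS", "UTF-8"));
    }
}
```

      The returned names can then be passed to Charset.forName without
      risking an exception that would abort the run before any charset is
      tested.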

            sherman Xueming Shen
            rmandalasunw Ranjith Mandala (Inactive)