JDK-4646959

REGRESSION: UTF-8 decoder silently accepts illegal UTF-8 byte sequences


    • Type: Bug
    • Resolution: Not an Issue
    • Priority: P3
    • Fix Version/s: None
    • Affects Version/s: 1.4.0
    • Component/s: core-libs



      Name: gm110360 Date: 03/04/2002


      FULL PRODUCT VERSION :
      java version "1.4.0"
      Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.0-b92)
      Java HotSpot(TM) Client VM (build 1.4.0-b92, mixed mode)

      FULL OPERATING SYSTEM VERSION :
      RedHat Linux 7.1, kernel Linux version 2.4.9-13
      (###@###.###) (gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-98))
      #1 Tue Oct 30 20:05:14 EST 2001

      ADDITIONAL OPERATING SYSTEMS :
      None


      A DESCRIPTION OF THE PROBLEM :
      In JDK 1.3, reading ISO 8859-1 text as UTF-8 through an
      InputStreamReader threw a CharConversionException to signal
      the problem. In JDK 1.4 the text is silently accepted, even
      when its byte sequences are not valid UTF-8. As a result, the
      program ends up with junk input and no indication of what
      went wrong.
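      To make the "junk input" concrete: encoding "dølt" as ISO 8859-1
      turns 'ø' into the single byte 0xF8, which RFC 3629 never allows in
      UTF-8. The sketch below prints those bytes and then decodes them the
      lenient way with new String(byte[], Charset), which substitutes the
      replacement character U+FFFD for malformed input instead of throwing.
      This is presumably what the reporter's InputStreamReader returns under
      1.4. The class name Utf8ReplacementDemo is hypothetical, and the
      sketch uses java.nio.charset.StandardCharsets (Java 7+) for brevity.

```java
import java.nio.charset.StandardCharsets;

public class Utf8ReplacementDemo {

    // Lenient decoding: String's constructor substitutes the charset's
    // replacement string (U+FFFD for UTF-8) for each malformed sequence.
    static String lenientDecode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "dølt" in ISO 8859-1: 'ø' becomes the single byte 0xF8.
        byte[] input = "d\u00f8lt".getBytes(StandardCharsets.ISO_8859_1);
        StringBuilder hex = new StringBuilder();
        for (byte b : input) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(hex.toString().trim()); // prints "64 F8 6C 74"

        // 0xF8 is not a legal byte anywhere in UTF-8, so it is replaced
        // silently rather than reported.
        String decoded = lenientDecode(input);
        System.out.println(decoded.equals("d\uFFFDlt")); // prints "true"
    }
}
```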

      REGRESSION. Last worked in version 1.3.1

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      Compile and run the attached code sample.

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      Expected a CharConversionException; with JDK 1.4 no exception
      is thrown at all.

      ERROR MESSAGES/STACK TRACES THAT OCCUR :
      Stack trace produced by JDK 1.3:

      [larsga@pc36 java]$ java UTF8Test
      Exception in thread "main" sun.io.MalformedInputException
              at sun.io.ByteToCharUTF8.convert(ByteToCharUTF8.java:152)
              at java.io.InputStreamReader.convertInto(InputStreamReader.java:137)
              at java.io.InputStreamReader.fill(InputStreamReader.java:186)
              at java.io.InputStreamReader.read(InputStreamReader.java:249)
              at java.io.InputStreamReader.read(InputStreamReader.java:222)
              at UTF8Test.main(UTF8Test.java:10)


      This bug can always be reproduced.

      ---------- BEGIN SOURCE ----------
      import java.io.*;

      public class UTF8Test {

        public static void main(String[] args) throws IOException {
          byte[] input = "d\u00f8lt".getBytes("iso-8859-1");
          InputStream stream = new ByteArrayInputStream(input);
          Reader reader = new InputStreamReader(stream, "utf-8");
          reader.read();
        }
        
      }

      ---------- END SOURCE ----------

      CUSTOMER WORKAROUND :
      Write your own UTF-8 decoder...
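      Rather than writing a decoder from scratch, the java.nio.charset API
      added in 1.4 lets a caller request strict decoding:
      InputStreamReader(InputStream, CharsetDecoder) accepts a decoder
      configured with CodingErrorAction.REPORT. Below is a minimal sketch,
      assuming Java 7+ for StandardCharsets and try-with-resources; the
      names StrictUtf8Reader, strictUtf8Reader and readAll are hypothetical.
      The exception raised is java.nio.charset.MalformedInputException, an
      IOException subclass but not a java.io.CharConversionException, so
      code that catches the 1.3 exception type by name would need adjusting.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;

public class StrictUtf8Reader {

    // Wraps the stream in a Reader whose UTF-8 decoder reports malformed
    // or unmappable input as an exception instead of substituting U+FFFD.
    static Reader strictUtf8Reader(InputStream in) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return new InputStreamReader(in, decoder);
    }

    // Reads the whole stream as strict UTF-8; throws MalformedInputException
    // (an IOException) on the first illegal byte sequence.
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = strictUtf8Reader(in)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Same input as the bug report: ISO 8859-1 bytes fed to a UTF-8 reader.
        byte[] input = "d\u00f8lt".getBytes(StandardCharsets.ISO_8859_1);
        try {
            System.out.println("decoded: " + readAll(new ByteArrayInputStream(input)));
        } catch (MalformedInputException e) {
            System.out.println("rejected malformed UTF-8, input length " + e.getInputLength());
        }
    }
}
```

      Valid UTF-8 passes through this reader unchanged, so it can replace
      the plain InputStreamReader(stream, "utf-8") call in the test case.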

      Release Regression From : 1.3.1_02
      The release above is the last one in which this behavior was
      known to work; it has regressed since then.

      (Review ID: 143681)
      ======================================================================

            Assignee: Ian Little (Inactive)
            Reporter: Girish Manwani (Inactive)