JDK-8292043

Incorrect decoding near EOF for stateful decoders like UTF-16

    • b12
    • generic
    • generic
    • Verified

      ADDITIONAL SYSTEM INFORMATION :
      Reproduced with 1.8.0_74, 11.0.14, 17.0.3 and 19-ea+34 on Ubuntu 18.04.

      A DESCRIPTION OF THE PROBLEM :
      StreamDecoder.implRead() resets the decoder at [1] after seeing EOF but before the final decoding round. Any stateful decoder therefore decodes the remaining input from its initial state, which can produce a wrong result.

      The reproducer demonstrates the bug with the UTF-16 decoder: the reset makes it forget the autodetected little-endian BOM before it decodes the final two bytes, so it decodes them as big-endian and returns U+00D8 instead of reporting the unpaired high surrogate.

      [1] https://github.com/openjdk/jdk/blob/ae52053757ca50c4b56989c9b0c6890e504e4088/src/java.base/share/classes/sun/nio/cs/StreamDecoder.java#L381
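
The effect of that ordering can be sketched with CharsetDecoder alone, without StreamDecoder. This is a minimal illustration, not the StreamDecoder code: the class name ResetBeforeFinalRound and the method decodeTail are hypothetical, and splitting the input into a BOM buffer and a tail buffer is an assumption chosen to keep the two decoding rounds visible.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: decodes a UTF-16 stream in two rounds and
// optionally resets the decoder between them.
final class ResetBeforeFinalRound {

    static String decodeTail(boolean resetBeforeFinalRound) {
        CharsetDecoder decoder = StandardCharsets.UTF_16.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT);
        CharBuffer out = CharBuffer.allocate(4);

        // Round 1: the decoder sees the little-endian BOM and remembers the byte order.
        ByteBuffer bom = ByteBuffer.wrap(new byte[] {(byte) 0xff, (byte) 0xfe});
        decoder.decode(bom, out, false);

        // Resetting here discards the detected byte order.
        if (resetBeforeFinalRound) {
            decoder.reset();
        }

        // Final round: a lone high surrogate (U+D800) in little-endian order.
        ByteBuffer tail = ByteBuffer.wrap(new byte[] {0, (byte) 0xd8});
        CoderResult result = decoder.decode(tail, out, true);

        if (result.isMalformed()) {
            return "malformed, length " + result.length();
        }
        out.flip();
        if (!out.hasRemaining()) {
            return "no output";
        }
        return String.format("decoded U+%04X", (int) out.get(0));
    }

    public static void main(String[] args) {
        System.out.println("with reset:    " + decodeTail(true));
        System.out.println("without reset: " + decodeTail(false));
    }
}
```

With the reset, the tail is decoded using the big-endian default and yields U+00D8, matching the 216 in the ACTUAL output. Without it, the decoder keeps the little-endian order and reports the unpaired surrogate as malformed input of length 2.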

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      Compile and run the code between the BEGIN SOURCE and END SOURCE markers below.

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      Exception in thread "main" java.nio.charset.MalformedInputException: Input length = 2
              at java.base/java.nio.charset.CoderResult.throwException(CoderResult.java:274)
              at java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339)
              at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
              at java.base/sun.nio.cs.StreamDecoder.read0(StreamDecoder.java:127)
              at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:112)
              at java.base/java.io.InputStreamReader.read(InputStreamReader.java:164)
              at Test.main(Test.java:16)

      ACTUAL -
      216
      -1
      65279
      Exception in thread "main" java.nio.charset.MalformedInputException: Input length = 2
              at java.base/java.nio.charset.CoderResult.throwException(CoderResult.java:274)
              at java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339)
              at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
              at java.base/sun.nio.cs.StreamDecoder.read0(StreamDecoder.java:127)
              at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:112)
              at java.base/java.io.InputStreamReader.read(InputStreamReader.java:164)
              at Test.main(Test.java:23)


      ---------- BEGIN SOURCE ----------
      import java.io.*;
      import java.nio.charset.Charset;
      import java.nio.charset.CodingErrorAction;

      public class Test {
          public static void main(String[] args) throws IOException {
              byte[] input = {
                  (byte) 0xff, (byte) 0xfe, // BOM (in UTF-16LE)
                  0, (byte) 0xd8, // High surrogate (in UTF-16LE)
              };
              InputStreamReader r;

              r = new InputStreamReader(
                      new ByteArrayInputStream(input),
                      Charset.forName("UTF-16").newDecoder().onMalformedInput(CodingErrorAction.REPORT));
              System.out.println(r.read()); // \u00d8 (wrong, uses UTF-16BE)
              System.out.println(r.read()); // EOF

              r = new InputStreamReader(
                      new ByteArrayInputStream(input),
                      Charset.forName("UTF-16LE").newDecoder().onMalformedInput(CodingErrorAction.REPORT));
              System.out.println(r.read()); // BOM
              System.out.println(r.read()); // MalformedInputException (correct)
          }
      }
      ---------- END SOURCE ----------
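
For comparison, decoding the same four bytes in a single call to CharsetDecoder.decode(ByteBuffer) does not go through StreamDecoder, so the detected byte order is still in effect when the last two bytes are decoded. The sketch below is only a cross-check of the EXPECTED behavior, not a fix for the bug; the class name WholeBufferDecode and the method malformedLength are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: decodes the whole input in one call and reports
// the length of the malformed sequence, if any.
final class WholeBufferDecode {

    // Returns the malformed input length, -1 if decoding succeeded,
    // or -2 for any other coding error.
    static int malformedLength(byte[] input) {
        try {
            StandardCharsets.UTF_16.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(input));
            return -1;
        } catch (MalformedInputException e) {
            return e.getInputLength();
        } catch (CharacterCodingException e) {
            return -2;
        }
    }

    public static void main(String[] args) {
        byte[] input = {
            (byte) 0xff, (byte) 0xfe, // BOM (UTF-16LE)
            0, (byte) 0xd8,           // lone high surrogate (UTF-16LE)
        };
        System.out.println(malformedLength(input));
    }
}
```

For the reproducer's input this reports a malformed sequence of length 2, the same condition as the MalformedInputException shown under EXPECTED.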

      FREQUENCY : always


            naoto Naoto Sato
            webbuggrp Webbug Group