Incorrect decoding near EOF for stateful decoders like UTF-16


    • b12
    • generic
    • generic
    • Verified

        ADDITIONAL SYSTEM INFORMATION :
        Reproduced with 1.8.0_74, 11.0.14, 17.0.3 and 19-ea+34 on Ubuntu 18.04.

        A DESCRIPTION OF THE PROBLEM :
        StreamDecoder.implRead() resets the decoder at [1] as soon as it sees EOF, before the final decoding round. Any stateful decoder therefore decodes the remaining input from its initial state, which can produce a wrong result.

        The reproducer demonstrates the bug with the UTF-16 decoder: the reset makes it forget the byte order it autodetected from the BOM before it decodes the final two bytes.

        [1] https://github.com/openjdk/jdk/blob/ae52053757ca50c4b56989c9b0c6890e504e4088/src/java.base/share/classes/sun/nio/cs/StreamDecoder.java#L381
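
        The effect of the early reset can be sketched at the CharsetDecoder level, without StreamDecoder. This is only an illustration under assumptions: the class ResetBeforeFinalRound and the method decodeInTwoRounds are hypothetical names, and the two decode() calls merely imitate a "BOM first, remaining bytes at EOF" split; the sketch does not reproduce the actual StreamDecoder code path.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of the JDK or of the reproducer.
final class ResetBeforeFinalRound {

    // Decodes the BOM in one round and the last two bytes in a final round,
    // optionally calling reset() in between (imitating the reported behavior).
    public static String decodeInTwoRounds(boolean resetBeforeFinalRound) {
        // newDecoder() reports malformed input by default.
        CharsetDecoder decoder = StandardCharsets.UTF_16.newDecoder();
        CharBuffer out = CharBuffer.allocate(4);

        // Round 1: the little-endian BOM, with more input still to come.
        ByteBuffer bom = ByteBuffer.wrap(new byte[] {(byte) 0xff, (byte) 0xfe});
        decoder.decode(bom, out, false);

        if (resetBeforeFinalRound) {
            decoder.reset(); // discards the detected byte order
        }

        // Final round: a lone high surrogate in UTF-16LE (0xD800).
        ByteBuffer tail = ByteBuffer.wrap(new byte[] {0, (byte) 0xd8});
        CoderResult result = decoder.decode(tail, out, true);

        StringBuilder sb = new StringBuilder();
        if (result.isMalformed()) {
            sb.append("malformed(").append(result.length()).append(")");
        } else if (result.isUnderflow()) {
            sb.append("underflow");
        } else {
            sb.append("other");
        }
        sb.append(" chars=");
        out.flip();
        while (out.hasRemaining()) {
            sb.append(String.format("%04x", (int) out.get()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("with reset:    " + decodeInTwoRounds(true));
        System.out.println("without reset: " + decodeInTwoRounds(false));
    }
}
```

        With the reset, the final round should decode the bytes as big-endian and yield U+00D8 ("underflow chars=00d8"); without it, the decoder should keep the little-endian order and report the unpaired surrogate as malformed input of length 2 ("malformed(2) chars="), matching the EXPECTED result below.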

        STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
        Compile and run the following code.

        EXPECTED VERSUS ACTUAL BEHAVIOR :
        EXPECTED -
        Exception in thread "main" java.nio.charset.MalformedInputException: Input length = 2
                at java.base/java.nio.charset.CoderResult.throwException(CoderResult.java:274)
                at java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339)
                at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
                at java.base/sun.nio.cs.StreamDecoder.read0(StreamDecoder.java:127)
                at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:112)
                at java.base/java.io.InputStreamReader.read(InputStreamReader.java:164)
                at Test.main(Test.java:16)

        ACTUAL -
        216
        -1
        65279
        Exception in thread "main" java.nio.charset.MalformedInputException: Input length = 2
                at java.base/java.nio.charset.CoderResult.throwException(CoderResult.java:274)
                at java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339)
                at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
                at java.base/sun.nio.cs.StreamDecoder.read0(StreamDecoder.java:127)
                at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:112)
                at java.base/java.io.InputStreamReader.read(InputStreamReader.java:164)
                at Test.main(Test.java:23)


        ---------- BEGIN SOURCE ----------
        import java.io.*;
        import java.nio.charset.Charset;
        import java.nio.charset.CodingErrorAction;

        public class Test {
            public static void main(String[] args) throws IOException {
                byte[] input = {
                    (byte) 0xff, (byte) 0xfe, // BOM (in UTF-16LE)
                    0, (byte) 0xd8, // High surrogate (in UTF-16LE)
                };
                InputStreamReader r;

                r = new InputStreamReader(
                        new ByteArrayInputStream(input),
                        Charset.forName("UTF-16").newDecoder().onMalformedInput(CodingErrorAction.REPORT));
                System.out.println(r.read()); // prints 216 (U+00D8): wrong, decoded as UTF-16BE
                System.out.println(r.read()); // prints -1 (EOF)

                r = new InputStreamReader(
                        new ByteArrayInputStream(input),
                        Charset.forName("UTF-16LE").newDecoder().onMalformedInput(CodingErrorAction.REPORT));
                System.out.println(r.read()); // prints 65279 (U+FEFF, the BOM)
                System.out.println(r.read()); // throws MalformedInputException (correct)
            }
        }
        ---------- END SOURCE ----------

        FREQUENCY : always


              Assignee: Naoto Sato
              Reporter: Webbug Group