Type: Bug
Resolution: Fixed
Priority: P4
Affects Versions: 8, 11, 17, 18, 19, 20
Resolved In Build: b12
CPU: generic
OS: generic
Verification: Verified
ADDITIONAL SYSTEM INFORMATION :
Reproduced with 1.8.0_74, 11.0.14, 17.0.3 and 19-ea+34 on Ubuntu 18.04.
A DESCRIPTION OF THE PROBLEM :
StreamDecoder.implRead() resets the decoder at [1] after seeing EOF but before doing the final decoding round. As a result, any stateful decoder decodes the remaining input from its initial state, which can produce a wrong result.
The reproducer demonstrates the bug with the UTF-16 decoder: the reset makes it forget the byte order it autodetected from the BOM before it decodes the final two bytes, so those bytes are decoded as big-endian (the default) instead of little-endian.
[1] https://github.com/openjdk/jdk/blob/ae52053757ca50c4b56989c9b0c6890e504e4088/src/java.base/share/classes/sun/nio/cs/StreamDecoder.java#L381
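To illustrate the mechanism in isolation, the following minimal sketch drives a UTF-16 CharsetDecoder directly, without StreamDecoder. It is not the StreamDecoder code; the class name ResetOrderSketch, the method decodeInTwoRounds and the tail bytes 0x41 0x00 are assumptions made for this illustration. It feeds the BOM in one round and the remaining bytes in a final round, optionally calling reset() in between, which mimics the ordering described above.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class ResetOrderSketch {

    // Decodes a little-endian BOM and then the bytes 0x41 0x00 in two rounds.
    // If resetBeforeFinalRound is true, the decoder is reset between the
    // rounds, which discards the byte order detected from the BOM.
    static String decodeInTwoRounds(boolean resetBeforeFinalRound) {
        CharsetDecoder decoder = StandardCharsets.UTF_16.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT);
        CharBuffer out = CharBuffer.allocate(4);

        // Round 1: the BOM only; more input is still expected.
        ByteBuffer bom = ByteBuffer.wrap(new byte[] {(byte) 0xff, (byte) 0xfe});
        decoder.decode(bom, out, false);

        if (resetBeforeFinalRound) {
            decoder.reset(); // the problematic ordering: reset, then decode the rest
        }

        // Final round: 'A' encoded as UTF-16LE.
        ByteBuffer tail = ByteBuffer.wrap(new byte[] {0x41, 0x00});
        decoder.decode(tail, out, true);
        decoder.flush(out);

        out.flip();
        return out.toString();
    }

    public static void main(String[] args) {
        for (boolean reset : new boolean[] {false, true}) {
            String s = decodeInTwoRounds(reset);
            System.out.printf("reset=%b -> U+%04X%n", reset, (int) s.charAt(0));
        }
    }
}
```

Without the reset, the tail is decoded as little-endian (U+0041). With the reset, the decoder falls back to its default big-endian order and yields U+4100, which is the same effect that turns the bytes 0x00 0xd8 into U+00D8 in the reproducer.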
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Compile and run the following code.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Exception in thread "main" java.nio.charset.MalformedInputException: Input length = 2
    at java.base/java.nio.charset.CoderResult.throwException(CoderResult.java:274)
    at java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339)
    at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
    at java.base/sun.nio.cs.StreamDecoder.read0(StreamDecoder.java:127)
    at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:112)
    at java.base/java.io.InputStreamReader.read(InputStreamReader.java:164)
    at Test.main(Test.java:16)
ACTUAL -
216
-1
65279
Exception in thread "main" java.nio.charset.MalformedInputException: Input length = 2
    at java.base/java.nio.charset.CoderResult.throwException(CoderResult.java:274)
    at java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339)
    at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
    at java.base/sun.nio.cs.StreamDecoder.read0(StreamDecoder.java:127)
    at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:112)
    at java.base/java.io.InputStreamReader.read(InputStreamReader.java:164)
    at Test.main(Test.java:23)
---------- BEGIN SOURCE ----------
import java.io.*;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

public class Test {
    public static void main(String[] args) throws IOException {
        byte[] input = {
            (byte) 0xff, (byte) 0xfe, // BOM (in UTF-16LE)
            0, (byte) 0xd8, // High surrogate (in UTF-16LE)
        };

        InputStreamReader r;
        r = new InputStreamReader(
            new ByteArrayInputStream(input),
            Charset.forName("UTF-16").newDecoder().onMalformedInput(CodingErrorAction.REPORT));
        System.out.println(r.read()); // \u00d8 (wrong, uses UTF-16BE)
        System.out.println(r.read()); // EOF

        r = new InputStreamReader(
            new ByteArrayInputStream(input),
            Charset.forName("UTF-16LE").newDecoder().onMalformedInput(CodingErrorAction.REPORT));
        System.out.println(r.read()); // BOM
        System.out.println(r.read()); // MalformedInputException (correct)
    }
}
---------- END SOURCE ----------
FREQUENCY : always
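As a cross-check on the expected behavior, the same four bytes can be decoded in a single call to CharsetDecoder.decode(ByteBuffer), which bypasses StreamDecoder entirely. This is a minimal sketch, not part of the original reproducer; the class name WholeBufferCheck and the method malformedLength are assumptions made for this illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class WholeBufferCheck {

    // Returns the length of the malformed input reported by the UTF-16
    // decoder, -1 if the input decodes without error, or -2 if a different
    // coding error is reported.
    static int malformedLength(byte[] input) {
        try {
            StandardCharsets.UTF_16.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(input));
            return -1;
        } catch (MalformedInputException e) {
            return e.getInputLength();
        } catch (CharacterCodingException e) {
            return -2;
        }
    }

    public static void main(String[] args) {
        byte[] input = {
            (byte) 0xff, (byte) 0xfe, // BOM (in UTF-16LE)
            0, (byte) 0xd8,           // High surrogate (in UTF-16LE)
        };
        System.out.println(malformedLength(input));
    }
}
```

Because the decoder keeps its state for the whole buffer here, the truncated surrogate is reported as malformed input of length 2, matching the exception message in the EXPECTED section.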