- Bug
- Resolution: Not an Issue
- P4
- None
- 8u66, 9
- generic
- generic
FULL PRODUCT VERSION :
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
A DESCRIPTION OF THE PROBLEM :
In reference to: http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8u40-b25/sun/nio/cs/UnicodeDecoder.java#94
The comment states "A reversed BOM cannot occur within middle of stream", which has not been true since Unicode 1.0, *before* the introduction of UTF-16. (http://www.unicode.org/faq/private_use.html#sentinel6)
To fix this bug, simply remove the corresponding test and its error return.
From Unicode 3.0, Chapter 3, p. 46:
To ensure that round-trip transcoding is possible, a UTF mapping must also map invalid Unicode scalar values to unique code value sequences. These invalid scalar values include FFFE, FFFF, and unpaired surrogates.
and clarified in Unicode 4.0:
To ensure that the mapping for a Unicode encoding form is one-to-one, all Unicode scalar values, including those corresponding to noncharacter code points and unassigned code points, must be mapped to unique code unit sequences. Note that this requirement does not extend to high-surrogate and low-surrogate code points, which are excluded by definition from the set of Unicode scalar values.
http://www.unicode.org/faq/utf_bom.html#utf16-7
http://www.unicode.org/faq/utf_bom.html#utf16-8
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
new String("\ufffe".getBytes("UTF-16"), "UTF-16").equals("\ufffe")
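The one-line expression above can be expanded into a small standalone program that shows each stage separately. This is only a sketch: the class name `FffeRoundTrip` and its helper methods are hypothetical and come from neither the JDK nor this report. On the build named in this report the round trip is expected to fail, because the decoder treats the U+FFFE code unit after the BOM as malformed input. Other JDK builds may behave differently, so the program prints what it observes instead of assuming a result.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only.
final class FffeRoundTrip {

    // Encodes with the "UTF-16" charset, which writes a big-endian BOM (FE FF) first.
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_16);
    }

    // Same check as the expression in the report: encode, decode, compare.
    // The String constructor replaces malformed input instead of throwing.
    static boolean roundTrips(String s) {
        return new String(encode(s), StandardCharsets.UTF_16).equals(s);
    }

    // Decodes with error reporting enabled, so malformed input surfaces as an exception.
    static boolean strictDecodeSucceeds(byte[] bytes) {
        try {
            StandardCharsets.UTF_16.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] bytes = encode("\ufffe");
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println("encoded bytes:          " + hex.toString().trim());
        System.out.println("round trip equal:       " + roundTrips("\ufffe"));
        System.out.println("strict decode succeeds: " + strictDecodeSucceeds(bytes));
    }
}
```

If `round trip equal` prints `false` while `strict decode succeeds` also prints `false`, the decoder is rejecting the code unit as described in this report rather than mapping it to U+FFFE.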
REPRODUCIBILITY :
This bug can be reproduced always.
- relates to
- JDK-8152841 sun.nio.cs.UnicodeDecoder incorrectly rejects U+FFFE (Closed)
- JDK-8216140 Correct UnicodeDecoder U+FFFE handling (Closed)