Type: Bug
Resolution: Duplicate
Priority: P4
Fix Version/s: None
Affects Version/s: 8, 9
Component/s: core-libs
Labels:

Subcomponent: java.nio
CPU: generic
OS: generic

FULL PRODUCT VERSION :

A DESCRIPTION OF THE PROBLEM :
sun.nio.cs.UnicodeDecoder incorrectly rejects U+FFFE.

The check at http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8u40-b25/sun/nio/cs/UnicodeDecoder.java#94 should be removed because, contrary to the comment on line 95, a reversed BOM *can* occur in the middle of a stream. The BOM and reversed BOM are special only at the start of a stream, where they distinguish UTF-16BE from UTF-16LE.
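
As a minimal sketch of the start-of-stream role described above, the following decodes the letter 'A' twice with the JDK "UTF-16" charset, once behind a BOM (FE FF) and once behind a reversed BOM (FF FE). The class name BomAtStreamStart and the helper decode are made up for this illustration and are not part of the JDK.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the JDK.
public class BomAtStreamStart {
    // Decodes bytes with the JDK "UTF-16" charset, which reads a leading
    // byte-order mark to choose between big-endian and little-endian.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        // FE FF at the start: big-endian, followed by 'A' as 00 41.
        byte[] bigEndian = {(byte) 0xFE, (byte) 0xFF, 0x00, 0x41};
        // FF FE at the start: little-endian, followed by 'A' as 41 00.
        byte[] littleEndian = {(byte) 0xFF, (byte) 0xFE, 0x41, 0x00};

        // In both cases the mark is consumed and only "A" is returned.
        System.out.println(decode(bigEndian).equals("A"));
        System.out.println(decode(littleEndian).equals("A"));
    }
}
```

Both calls yield "A": the mark only selects the byte order and is not part of the decoded text. Once the byte order has been chosen this way, the same two bytes appearing later in the stream are ordinary data.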

From the unicode.org FAQ (http://www.unicode.org/faq/private_use.html#sentinel6):

Q: I read somewhere that U+FFFE and U+FFFF were illegal in Unicode, and could be used as sentinels. Is that true?
A: Well, the short answer is no, that is not true—at least, not entirely true. U+FFFE and U+FFFF are noncharacters just like the other 64 noncharacters in the standard, and are valid in Unicode strings.

"Unicode 2.0 dropped the explicit prohibition against transmission or storage of U+FFFE and U+FFFF"

Unicode 3.0: "To ensure that round-trip transcoding is possible, a UTF mapping must also map invalid Unicode scalar values to unique code value sequences. These invalid scalar values include U+FFFE, U+FFFF, and unpaired surrogates."

Unicode 4.0: "To ensure that the mapping for a Unicode encoding form is one-to-one, all Unicode scalar values, including those corresponding to noncharacter code points and unassigned code points, must be mapped to unique code unit sequences."

Mapping multiple code points to '\uFFFD', as sun.nio.cs.UnicodeDecoder currently does, means the decoding is not one-to-one.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
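String a = "\uFFFE";
// Compare contents with equals(); == compares references and is always false here.
// Expected: true. Actual on the affected versions: false.
boolean roundTrips = new String(a.getBytes("UTF-16"), "UTF-16").equals(a);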
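
A self-contained variant of these steps, as a sketch only: it prints the encoded bytes and the decoded code point so the failure can be seen directly. The class name Utf16RoundTrip and the helpers encode, hex and roundTrips are made up for this illustration; the result of the round-trip check depends on the JDK version in use, so no output is asserted for it here.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for reproducing the report; not part of the JDK.
public class Utf16RoundTrip {
    // Encodes with the "UTF-16" charset (big-endian with a leading BOM).
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_16);
    }

    // Formats bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // True if encoding and then decoding returns an equal string.
    static boolean roundTrips(String s) {
        return new String(encode(s), StandardCharsets.UTF_16).equals(s);
    }

    public static void main(String[] args) {
        String a = "\uFFFE";
        byte[] bytes = encode(a);
        String decoded = new String(bytes, StandardCharsets.UTF_16);

        System.out.println("encoded bytes: " + hex(bytes));
        System.out.printf("decoded code point: U+%04X%n", decoded.codePointAt(0));
        System.out.println("round trip ok: " + roundTrips(a));
    }
}
```

The encoded form is FE FF FF FE: the BOM followed by U+FFFE in big-endian order. According to this report, the affected versions print U+FFFD as the decoded code point and report a failed round trip.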

REPRODUCIBILITY :
This bug can be reproduced always.


JI9029712.java
0.3 kB
2016-03-28 02:21

relates to

JDK-8150449 "A 'reversed byte-order mark' cannot occur within middle of stream" is not correct

Closed

Assignee: Pallavi Sonal (Inactive)
Reporter: Webbug Group
Votes: 0
Watchers: 3

Created: 2016-03-27 09:11
Updated: 2016-03-28 02:23
Resolved: 2016-03-28 02:23
