-
CSR
-
Resolution: Approved
-
P3
-
None
-
behavioral
-
low
-
Client code that *expects* U+FFFE in the middle of the input to be reported as "malformed" will no longer work after this change. Relying on that behavior is no longer recommended under the Unicode Consortium's corrigendum.
-
Java API
-
SE
Summary
Correct how UnicodeDecoder subclasses handle the U+FFFE code point in the middle of the input buffer.
Problem
Currently, UnicodeDecoder treats U+FFFE in the middle of a string as "malformed" because it is a noncharacter. This was correct prior to Unicode 7. However, Unicode 7 includes Corrigendum 9 (http://www.unicode.org/versions/corrigendum9.html), which changed the definition of noncharacters. UnicodeDecoder's behavior should be modified to conform to it.
Solution
Remove the code in UnicodeDecoder that detects U+FFFE in the middle of the input and returns a "malformed" CoderResult, so that the UTF-16 decoders (StandardCharsets.UTF_16[LE/BE]) pass the code point through.
Specification
As required by Corrigendum 9 in Unicode 7, U+FFFE in the middle of the input is passed through as a code point.
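The following is a minimal sketch, not part of the CSR itself, of how a client might observe the difference. It decodes the UTF-16BE bytes for "A", U+FFFE, "B" with malformed input set to be reported. The class name `FffeDecodeDemo` and the helper `decodeStrict` are hypothetical names chosen for illustration. Which branch runs depends on whether the JDK in use includes this change.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not taken from the JDK sources or the CSR.
final class FffeDecodeDemo {

    // Decodes UTF-16BE bytes and throws instead of substituting
    // a replacement character when the decoder reports malformed input.
    public static String decodeStrict(byte[] utf16be) throws CharacterCodingException {
        return StandardCharsets.UTF_16BE.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(utf16be))
                .toString();
    }

    public static void main(String[] args) {
        // 'A', U+FFFE, 'B' in big-endian UTF-16; U+FFFE is in the middle.
        byte[] input = {0x00, 0x41, (byte) 0xFF, (byte) 0xFE, 0x00, 0x42};
        try {
            String s = decodeStrict(input);
            // Reached when the decoder passes U+FFFE through.
            System.out.println("decoded " + s.length() + " chars; middle is U+"
                    + Integer.toHexString(s.charAt(1)).toUpperCase());
        } catch (CharacterCodingException e) {
            // Reached when the decoder still treats U+FFFE as malformed.
            System.out.println("reported as malformed: " + e);
        }
    }
}
```

Client code that relied on the exception path to reject U+FFFE would need its own explicit check on the decoded characters once the decoder passes the code point through.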
- csr of
-
JDK-8216140 Correct UnicodeDecoder U+FFFE handling
- Closed