JDK-8217097

Correct UnicodeDecoder U+FFFE handling

    • Type: CSR
    • Resolution: Approved
    • Priority: P3
    • Fix Version/s: 13
    • Component/s: core-libs
    • Labels: None
    • Compatibility Kind: behavioral
    • Compatibility Risk: low
    • Compatibility Risk Description: Client code that *expects* U+FFFE to be reported as "malformed" will not work with this change. That behavior is no longer recommended by the Unicode Consortium's corrigendum.
    • Interface Kind: Java API
    • Scope: SE
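Client code that depended on the old "malformed" report can restore that check explicitly. The following is a minimal sketch, assuming UTF-16BE input; `StrictFffeDecoding` and `decodeRejectingFffe` are hypothetical names, not JDK API. It decodes strictly and then rejects any U+FFFE that the decoder passed through, so it behaves the same whether or not the running JDK includes this change.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;

public class StrictFffeDecoding {

    /**
     * Hypothetical helper: decodes UTF-16BE strictly, then rejects any U+FFFE
     * that the decoder passed through, approximating the pre-change behavior.
     */
    static String decodeRejectingFffe(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_16BE.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        // On older JDKs this call itself throws for a mid-stream U+FFFE.
        String s = decoder.decode(ByteBuffer.wrap(bytes)).toString();
        if (s.indexOf('\uFFFE') >= 0) {
            // U+FFFE is one UTF-16 code unit (two bytes), so report two bytes.
            throw new MalformedInputException(2);
        }
        return s;
    }

    public static void main(String[] args) throws CharacterCodingException {
        byte[] ok = {0x00, 0x41, 0x00, 0x42};                                  // "AB"
        byte[] withFffe = {0x00, 0x41, (byte) 0xFF, (byte) 0xFE, 0x00, 0x42};  // "A", U+FFFE, "B"
        System.out.println(decodeRejectingFffe(ok));                           // prints AB
        try {
            decodeRejectingFffe(withFffe);
        } catch (MalformedInputException e) {
            System.out.println("rejected U+FFFE, input length " + e.getInputLength());
        }
    }
}
```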

      Summary

      Correct how UnicodeDecoder subclasses handle the U+FFFE code point when it appears in the middle of the input buffer.

      Problem

      Currently, UnicodeDecoder treats U+FFFE in the middle of the input as "malformed" because it is a noncharacter. This was correct before Unicode 7. However, Unicode 7 incorporates Corrigendum #9 (http://www.unicode.org/versions/corrigendum9.html), which changed the definition of noncharacters: they are not illegal in interchange, and their presence does not make text ill-formed. UnicodeDecoder's behavior should be modified to conform to it.

      Solution

      Remove the code in UnicodeDecoder that detects the code point in the middle of the input and returns a "malformed" CoderResult, so that the UTF-16 decoders (StandardCharsets.UTF_16, UTF_16LE, and UTF_16BE) pass the code point through.
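The change is visible in the CoderResult that a decoder returns. The following is a minimal sketch, assuming UTF-16BE input; `CoderResultCheck` and `decodeOnce` are hypothetical names. It calls the low-level `CharsetDecoder.decode(ByteBuffer, CharBuffer, boolean)` and inspects the result: with the removed check in place the result is malformed at U+FFFE; without it, the whole input is consumed and the result is an underflow (JDK 13 or later, per the fix version above).

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class CoderResultCheck {

    /** Runs one complete decode pass and returns the resulting CoderResult. */
    static CoderResult decodeOnce(byte[] bytes, CharBuffer out) {
        CharsetDecoder decoder = StandardCharsets.UTF_16BE.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        // endOfInput = true: these bytes are the complete input.
        CoderResult result = decoder.decode(ByteBuffer.wrap(bytes), out, true);
        if (result.isUnderflow()) {
            // All input was consumed; let the decoder write any remaining state.
            result = decoder.flush(out);
        }
        return result;
    }

    public static void main(String[] args) {
        // "A", U+FFFE, "B" in UTF-16BE: the FF FE pair sits mid-buffer.
        byte[] bytes = {0x00, 0x41, (byte) 0xFF, (byte) 0xFE, 0x00, 0x42};
        CharBuffer out = CharBuffer.allocate(8);
        CoderResult result = decodeOnce(bytes, out);
        out.flip();
        if (result.isMalformed()) {
            // Old behavior: decoding stops at U+FFFE.
            System.out.println("malformed, length " + result.length() + ", decoded so far: " + out);
        } else {
            // New behavior: U+FFFE is passed through with the other characters.
            System.out.println("underflow, decoded " + out.length() + " chars");
        }
    }
}
```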

      Specification

      As required by Corrigendum #9, incorporated in Unicode 7, U+FFFE in the middle of the input is passed through as a code point instead of being reported as malformed.
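The specified behavior can be checked across all three UTF-16 charsets. The following is a minimal sketch, assuming JDK 13 or later; `FffePassThrough` and `decodeStrict` are hypothetical names. Each decoder uses CodingErrorAction.REPORT, so a malformed U+FFFE would surface as a CharacterCodingException instead of being silently replaced, and each comparison should print `true`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class FffePassThrough {

    /** Decodes strictly: malformed or unmappable input throws instead of being replaced. */
    static String decodeStrict(Charset cs, byte[] bytes) throws CharacterCodingException {
        return cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        // "A", U+FFFE, "B" in each byte order.
        byte[] be = {0x00, 0x41, (byte) 0xFF, (byte) 0xFE, 0x00, 0x42};
        byte[] le = {0x41, 0x00, (byte) 0xFE, (byte) 0xFF, 0x42, 0x00};
        // UTF-16 input with a leading big-endian byte-order mark (FE FF).
        byte[] withBom = {(byte) 0xFE, (byte) 0xFF, 0x00, 0x41, (byte) 0xFF, (byte) 0xFE, 0x00, 0x42};

        String expected = "A\uFFFEB";
        System.out.println(decodeStrict(StandardCharsets.UTF_16BE, be).equals(expected));
        System.out.println(decodeStrict(StandardCharsets.UTF_16LE, le).equals(expected));
        System.out.println(decodeStrict(StandardCharsets.UTF_16, withBom).equals(expected));
    }
}
```

For StandardCharsets.UTF_16, the leading FE FF is consumed as a byte-order mark; only the FF FE pair that follows it in the middle of the input is decoded as U+FFFE.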

            Assignee: Naoto Sato
            Reporter: Naoto Sato
            Reviewed By: Roger Riggs