JDK-6982052

UTF-8 decoder allows directly encoded trail surrogates


    • Type: Bug
    • Resolution: Not an Issue
    • Priority: P4
    • None
    • 6
    • core-libs

      FULL PRODUCT VERSION :
      java version "1.6.0_19"
      Java(TM) SE Runtime Environment (build 1.6.0_19-b04)
      Java HotSpot(TM) Client VM (build 16.2-b04, mixed mode, sharing)

      ADDITIONAL OS VERSION INFORMATION :
      Microsoft Windows [Version 6.0.6002]

      A DESCRIPTION OF THE PROBLEM :
      The UTF-8 decoder allows directly encoded trail surrogates.

      For example: the sequence 0xed 0xba 0xab is decoded to "\uDEAB"
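
To see why those three bytes correspond to a trail surrogate, the payload bits of the three-byte UTF-8 pattern (1110xxxx 10xxxxxx 10xxxxxx) can be combined by hand. The following is a minimal sketch of that arithmetic only, not the JDK decoder's code; the class and method names are hypothetical.

```java
// Sketch: combine the payload bits of a three-byte UTF-8 pattern by hand.
// ThreeByteUtf8 and its methods are hypothetical names used for illustration.
final class ThreeByteUtf8 {

    /** Combines the payload bits of a three-byte sequence without any validation. */
    static int rawCodePoint(int b1, int b2, int b3) {
        return ((b1 & 0x0F) << 12)   // low 4 bits of the lead byte
             | ((b2 & 0x3F) << 6)    // low 6 bits of the first continuation byte
             | (b3 & 0x3F);          // low 6 bits of the second continuation byte
    }

    /** True if the value lies in the surrogate range D800..DFFF. */
    static boolean isSurrogate(int codePoint) {
        return codePoint >= 0xD800 && codePoint <= 0xDFFF;
    }

    public static void main(String[] args) {
        int cp = rawCodePoint(0xED, 0xBA, 0xAB);
        // prints "U+DEAB surrogate=true"
        System.out.printf("U+%04X surrogate=%b%n", cp, isSurrogate(cp));
    }
}
```

A decoder that performs only this bit arithmetic, without a range check on the result, would produce the behavior described in this report.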


      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      I expect the behavior specified by CodingErrorAction (replace, ignore, report) to be triggered in this case. I do not expect a trail surrogate; it is not legal according to Unicode:

      When a process interprets a code unit sequence which purports to be in a Unicode character encoding form, it shall treat ill-formed code unit sequences as an error condition and shall not interpret such sequences as characters.

      Because surrogate code points are not Unicode scalar values, any UTF-8 byte sequence that would otherwise map to code points D800..DFFF is ill-formed.

      ACTUAL -
      The decoder instead accepts the invalid byte sequence (regardless of the CodingErrorAction setting) and converts it to a trail surrogate.
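
One way to observe the CodingErrorAction behavior directly is to configure a CharsetDecoder with REPORT and check whether decoding throws. This is a sketch under the assumption that a thrown CharacterCodingException is the "report" outcome the reporter expects; the class and method names are hypothetical, and the result for the surrogate bytes depends on the runtime in use.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

// Sketch: StrictDecodeCheck is a hypothetical helper, not part of the JDK.
final class StrictDecodeCheck {

    /** Returns true if a REPORT-configured UTF-8 decoder rejects the bytes. */
    static boolean isReportedAsMalformed(byte[] bytes) {
        CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return false; // decoded without complaint
        } catch (CharacterCodingException e) {
            return true;  // the decoder reported an error
        }
    }

    public static void main(String[] args) {
        byte[] invalid = { (byte) 0xED, (byte) 0xBA, (byte) 0xAB };
        System.out.println("reported as malformed: " + isReportedAsMalformed(invalid));
    }
}
```

On the 1.6.0_19 runtime named in this report, the description above implies the check would return false for these bytes; a decoder that follows the quoted Unicode rule would return true.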

      REPRODUCIBILITY :
      This bug can be reproduced always.

      ---------- BEGIN SOURCE ----------
      // Decoding this should not yield the trail surrogate itself.
      // Assumes a JUnit test class, which supplies assertFalse.

      public void test() throws Exception {
          byte[] invalid = new byte[] { (byte)0xed, (byte)0xba, (byte)0xab };
          assertFalse(new String(invalid, 0, invalid.length, "UTF-8").equals("\uDEAB"));
      }
      ---------- END SOURCE ----------

      CUSTOMER SUBMITTED WORKAROUND :
      Write your own decoder.
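
A lighter-weight alternative to a full custom decoder may be to scan the bytes before decoding. The sketch below is an assumption about how such a pre-check could look, not the reporter's workaround: in well-formed UTF-8 a lead byte of 0xED is only ever followed by 0x80..0x9F, so 0xED followed by 0xA0..0xBF always marks an encoded surrogate. The class and method names are hypothetical.

```java
// Sketch: SurrogateScan is a hypothetical pre-check, not a complete UTF-8 validator.
final class SurrogateScan {

    /**
     * Returns the index of the first 0xED byte that is followed by a byte in
     * 0xA0..0xBF (an encoded surrogate, D800..DFFF), or -1 if there is none.
     */
    static int indexOfEncodedSurrogate(byte[] bytes) {
        for (int i = 0; i + 1 < bytes.length; i++) {
            int lead = bytes[i] & 0xFF;      // mask to treat the byte as unsigned
            int next = bytes[i + 1] & 0xFF;
            if (lead == 0xED && next >= 0xA0 && next <= 0xBF) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] invalid = { (byte) 0xED, (byte) 0xBA, (byte) 0xAB };
        // prints "0"
        System.out.println(indexOfEncodedSurrogate(invalid));
    }
}
```

A caller could reject or substitute the input when the result is not -1, then hand the remaining cases to the built-in decoder. This catches only encoded surrogates; other ill-formed sequences are left to the decoder.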

            sherman Xueming Shen
            webbuggrp Webbug Group