JDK-4344266

Inconsistent CharToByteConverter behaviour for surrogate pairs


    • Type: Bug
    • Resolution: Won't Fix
    • Priority: P4
    • None
    • 1.3.0
    • core-libs

      Name: rlT66838 Date: 06/08/2000


      SCSL JDK 1.3 Beta source code (Sep 1999)


      This is from the September 1999 JDK 1.3 source release; there is a (faint)
      chance that this may have been found and fixed already...

      The handling of surrogate pairs is inconsistent across the different
      CharToByteConverter subclasses:

      CharToByteASCII throws UnknownCharacterException if a surrogate pair
      straddles invocations of convert(), whereas within a single invocation
      of convert() it will do optional substitution (good). It also rejects
      unaccompanied low surrogates (good).

      CharToByteISO8859_1 does everything right.

      CharToByteSingleByte is like CharToByteASCII, i.e. it rejects surrogates
      that straddle invocations, rather than doing optional substitution.
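
      To illustrate what consistent handling of a pair that straddles
      invocations could look like, here is a minimal sketch. It is not the
      sun.io code: the class name StatefulAsciiSketch, its method signatures,
      the '?' substitution byte and the use of IllegalArgumentException are
      all assumptions made for this example, and the Character.isHighSurrogate
      and Character.isLowSurrogate helpers were only added to Java in later
      releases. The idea is to remember a trailing high surrogate between
      calls, so that a split pair gets the same substitution as a pair seen
      in a single call.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch, not the sun.io.CharToByteASCII implementation.
class StatefulAsciiSketch {
    // True when the previous char (possibly from an earlier call) was a high surrogate.
    private boolean pendingHigh;
    private final byte substitution = (byte) '?';   // assumed substitution byte

    /** Converts in[off..end) to ASCII and appends the bytes to out. */
    void convert(char[] in, int off, int end, ByteArrayOutputStream out) {
        for (int i = off; i < end; i++) {
            char c = in[i];
            if (pendingHigh) {
                pendingHigh = false;
                if (!Character.isLowSurrogate(c)) {
                    throw new IllegalArgumentException("high surrogate not followed by a low surrogate");
                }
                out.write(substitution);            // one substitution per supplementary character
            } else if (Character.isHighSurrogate(c)) {
                pendingHigh = true;                 // may be completed in this call or the next one
            } else if (Character.isLowSurrogate(c)) {
                throw new IllegalArgumentException("unaccompanied low surrogate");
            } else if (c < 0x80) {
                out.write(c);                       // plain ASCII maps to itself
            } else {
                out.write(substitution);            // unmappable BMP character
            }
        }
    }

    /** Call once at end of input; a high surrogate still pending is malformed. */
    void finish() {
        if (pendingHigh) {
            pendingHigh = false;
            throw new IllegalArgumentException("dangling high surrogate at end of input");
        }
    }
}
```

      With this structure, feeding 'a' plus a high surrogate in one call and
      the low surrogate plus 'b' in the next produces the same bytes as
      passing all four chars at once.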

      CharToByteUTF8 tries to handle surrogates that straddle invocations,
      but gets it wrong; see the bug report with internal review ID 105886.

      CharToByteUTF8 is also the only one that doesn't check for and reject
      unaccompanied low surrogates; it just treats them like ordinary Unicode
      characters and generates an illegal UTF-8 encoding for them.
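
      As an illustration of the difference: running a lone low surrogate such
      as U+DC00 through the ordinary three-byte pattern yields ED B0 80, which
      is not well-formed UTF-8, whereas a proper pair is combined into one code
      point and encoded in four bytes. The sketch below shows the expected
      behaviour under stated assumptions. The class Utf8SurrogateSketch and its
      encode method are hypothetical names for this example, not the
      CharToByteUTF8 code, and the Character helper methods it uses come from
      later Java releases.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch of surrogate-aware UTF-8 encoding; not the sun.io code.
class Utf8SurrogateSketch {
    /** Encodes s as UTF-8, rejecting any unpaired surrogate. */
    static byte[] encode(CharSequence s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            int cp;
            if (Character.isHighSurrogate(c)) {
                if (i + 1 >= s.length() || !Character.isLowSurrogate(s.charAt(i + 1))) {
                    throw new IllegalArgumentException("unpaired high surrogate at index " + i);
                }
                cp = Character.toCodePoint(c, s.charAt(++i));   // combine the pair
            } else if (Character.isLowSurrogate(c)) {
                throw new IllegalArgumentException("unaccompanied low surrogate at index " + i);
            } else {
                cp = c;
            }
            if (cp < 0x80) {                        // 1 byte
                out.write(cp);
            } else if (cp < 0x800) {                // 2 bytes
                out.write(0xC0 | (cp >> 6));
                out.write(0x80 | (cp & 0x3F));
            } else if (cp < 0x10000) {              // 3 bytes (never a surrogate here)
                out.write(0xE0 | (cp >> 12));
                out.write(0x80 | ((cp >> 6) & 0x3F));
                out.write(0x80 | (cp & 0x3F));
            } else {                                // 4 bytes for supplementary characters
                out.write(0xF0 | (cp >> 18));
                out.write(0x80 | ((cp >> 12) & 0x3F));
                out.write(0x80 | ((cp >> 6) & 0x3F));
                out.write(0x80 | (cp & 0x3F));
            }
        }
        return out.toByteArray();
    }
}
```

      For example, the pair D800 DC00 (U+10000) encodes to F0 90 80 80, while
      a string consisting only of DC00 is rejected instead of being encoded.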

      CharToByteUnicode is fine; it doesn't need to worry about surrogates.
      (Well, ideally it should check that there are no unaccompanied low
      surrogates, and no dangling high surrogates at the end of input, but
      it's probably good enough.)
      (Review ID: 105889)
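
      The well-formedness check suggested for CharToByteUnicode could be as
      small as the following sketch. SurrogateCheck and firstUnpairedSurrogate
      are hypothetical names introduced here, and the Character helper methods
      are from later Java releases; nothing below is taken from the JDK 1.3
      sources.

```java
// Hypothetical helper; reports where surrogate pairing first breaks down.
class SurrogateCheck {
    /**
     * Returns the index of the first unpaired surrogate in s
     * (a low surrogate with no preceding high surrogate, or a high
     * surrogate not followed by a low one, including at end of input),
     * or -1 if every surrogate is part of a proper pair.
     */
    static int firstUnpairedSurrogate(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++;            // skip the low half of a valid pair
                } else {
                    return i;       // dangling high surrogate
                }
            } else if (Character.isLowSurrogate(c)) {
                return i;           // unaccompanied low surrogate
            }
        }
        return -1;
    }
}
```

      A converter could run such a check before emitting bytes and raise its
      usual malformed-input error when the result is not -1.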
      ======================================================================

            sherman Xueming Shen
            rlewis Roger Lewis (Inactive)