Bug
Resolution: Won't Fix
P4
None
1.3.0
generic
generic
Name: rlT66838 Date: 06/08/2000
SCSL JDK 1.3 Beta source code (Sep 1999)
This is from the September 1999 JDK 1.3 source release; there is a (faint)
chance that this has already been found and fixed...
Surrogate-pair handling is inconsistent across the CharToByteConverter
subclasses:
CharToByteASCII throws UnknownCharacterException if a surrogate pair
straddles two invocations of convert(), whereas within a single invocation
of convert() it performs the optional substitution (good). It also rejects
unaccompanied low surrogates (good).
CharToByteISO8859_1 does everything right.
CharToByteSingleByte behaves like CharToByteASCII, i.e. it rejects surrogate
pairs that straddle invocations rather than performing the optional
substitution.
CharToByteUTF8 tries to handle surrogate pairs that straddle invocations,
but gets it wrong (see the bug report with internal review ID 105886).
CharToByteUTF8 is also the only converter that doesn't check for and reject
unaccompanied low surrogates; it treats them like ordinary Unicode
characters and generates an illegal UTF-8 encoding for them.
CharToByteUnicode is fine; it doesn't need to worry about surrogates.
(Well, ideally it should check that there are no unaccompanied low
surrogates, and no dangling high surrogate at the end of input, but
it's probably good enough).
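To make the behaviour I'd expect concrete, here is a minimal sketch of an ASCII-style converter that treats a pair the same way whether or not it straddles two convert() calls. It is NOT the sun.io.CharToByteConverter API: the class name AsciiSketchConverter, the finish() method and the use of IllegalArgumentException are all hypothetical, and Character.isHighSurrogate/isLowSurrogate only exist in later JDKs (1.5+), so treat this as an illustration rather than a patch.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration only; not the sun.io converter classes.
final class AsciiSketchConverter {
    private final boolean substitute; // true: emit '?' for unmappable input
    private char pendingHigh;         // high surrogate held over from the previous call, 0 if none

    AsciiSketchConverter(boolean substitute) {
        this.substitute = substitute;
    }

    /** Converts one chunk. A trailing high surrogate is remembered for the next call. */
    byte[] convert(char[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (char c : in) {
            if (pendingHigh != 0) {
                // The previous char (possibly from the previous call) was a high surrogate.
                pendingHigh = 0;
                if (!Character.isLowSurrogate(c)) {
                    throw new IllegalArgumentException("high surrogate without a low surrogate");
                }
                unmappable(out); // a complete pair is never representable in ASCII
            } else if (Character.isHighSurrogate(c)) {
                pendingHigh = c; // wait for the low half, even across calls
            } else if (Character.isLowSurrogate(c)) {
                throw new IllegalArgumentException("unaccompanied low surrogate");
            } else if (c > 0x7F) {
                unmappable(out);
            } else {
                out.write(c);
            }
        }
        return out.toByteArray();
    }

    /** Call once after the last chunk to catch a dangling high surrogate. */
    void finish() {
        if (pendingHigh != 0) {
            pendingHigh = 0;
            throw new IllegalArgumentException("dangling high surrogate at end of input");
        }
    }

    private void unmappable(ByteArrayOutputStream out) {
        if (!substitute) {
            throw new IllegalArgumentException("character not representable in ASCII");
        }
        out.write('?'); // one substitution byte per character, not per char
    }
}
```

With substitution enabled, feeding { 'a', '\uD800' } and then { '\uDC00', 'b' } in two separate calls yields "a" followed by "?b", which is the same result a single call over all four chars would give.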
(Review ID: 105889)
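For the UTF-8 case, the following sketch shows what I would consider correct output: a surrogate pair is combined into one code point and written as a single four-byte sequence, and a lone surrogate is rejected instead of being written as a three-byte sequence (which is presumably what "treating it like an ordinary character" produces). Utf8Sketch is a hypothetical name, the exception type is my own choice, and the Character helper methods come from later JDKs, so this is an assumption-laden illustration and not the CharToByteUTF8 code.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration only; not the sun.io.CharToByteUTF8 implementation.
final class Utf8Sketch {
    /** Encodes UTF-16 chars as UTF-8, rejecting unpaired surrogates. */
    static byte[] encode(char[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < in.length; i++) {
            char c = in[i];
            int cp = c;
            if (Character.isHighSurrogate(c)) {
                if (i + 1 >= in.length || !Character.isLowSurrogate(in[i + 1])) {
                    throw new IllegalArgumentException(
                            "high surrogate without a low surrogate at index " + i);
                }
                // Combine the pair into one supplementary code point.
                cp = Character.toCodePoint(c, in[++i]);
            } else if (Character.isLowSurrogate(c)) {
                throw new IllegalArgumentException(
                        "unaccompanied low surrogate at index " + i);
            }
            if (cp < 0x80) {                       // 1 byte
                out.write(cp);
            } else if (cp < 0x800) {               // 2 bytes
                out.write(0xC0 | (cp >> 6));
                out.write(0x80 | (cp & 0x3F));
            } else if (cp < 0x10000) {             // 3 bytes (BMP, non-surrogate)
                out.write(0xE0 | (cp >> 12));
                out.write(0x80 | ((cp >> 6) & 0x3F));
                out.write(0x80 | (cp & 0x3F));
            } else {                               // 4 bytes (from a surrogate pair)
                out.write(0xF0 | (cp >> 18));
                out.write(0x80 | ((cp >> 12) & 0x3F));
                out.write(0x80 | ((cp >> 6) & 0x3F));
                out.write(0x80 | (cp & 0x3F));
            }
        }
        return out.toByteArray();
    }
}
```

For example, the pair D800 DC00 (U+10000) encodes to the four bytes F0 90 80 80, while an input consisting of DC00 alone throws. This version works on one complete buffer; a converter that is fed input in chunks would additionally need to hold a trailing high surrogate until the next call.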
======================================================================