Bug
Resolution: Fixed
P4
1.3.0
beta
generic
generic
Name: rlT66838 Date: 06/08/2000
SCSL JDK 1.3 Beta source code (Sep 1999)
This is from the September 1999 JDK 1.3 source release; there is a (faint)
chance that this may have been found and fixed already...
Surrogate pairs are handled correctly if both the high half and the low half
are in the same input[] buffer. However, if a surrogate pair straddles two
input buffers, then it hits two bugs:
First, there is code that does
    inputChar = highHalfZoneCode;
    highHalfZoneCode = 0;
    if (input[inOff] >= 0xdc00 && input[inOff] <= 0xdfff) {
        // This is legal UTF16 sequence.
        int ucs4 = (highHalfZoneCode - 0xd800) * 0x400
            + (input[inOff] - 0xdc00) + 0x10000;
The ucs4 calculation assumes that highHalfZoneCode still contains the first
half of the surrogate pair, but highHalfZoneCode has been zapped to 0.
Fix: the ucs4 calculation should use inputChar instead of highHalfZoneCode.
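A minimal sketch of why the zeroed variable matters. This is not the JDK
source; the class and method names (SurrogateCombineDemo, combine) are made up
for illustration, and only the arithmetic is taken from the snippet above.
With the high half zapped to 0 the formula yields a negative number instead of
a code point at or above 0x10000.

```java
// Hypothetical demo class, not part of the JDK converter.
class SurrogateCombineDemo {
    // Same arithmetic as the ucs4 calculation quoted in this report.
    static int combine(char high, char low) {
        return (high - 0xd800) * 0x400 + (low - 0xdc00) + 0x10000;
    }

    public static void main(String[] args) {
        char high = '\ud83d';   // first half of U+1F600
        char low = '\ude00';    // second half of U+1F600
        char zapped = 0;        // what highHalfZoneCode holds after being reset

        int good = combine(high, low);
        int bad = combine(zapped, low);

        // Cross-check against the standard library helper.
        System.out.println(good == Character.toCodePoint(high, low));
        System.out.println("with saved high half: 0x" + Integer.toHexString(good));
        System.out.println("with zeroed high half is negative: " + (bad < 0));
    }
}
```

Using inputChar, which still holds the saved high half, corresponds to the
first call.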
Next, it tries to output the ucs4 value:
            output[0] = (byte)(0xf0 | ((ucs4 >> 18)) & 0x07);
            output[1] = (byte)(0x80 | ((ucs4 >> 12) & 0x3f));
            output[2] = (byte)(0x80 | ((ucs4 >> 6) & 0x3f));
            output[3] = (byte)(0x80 | (ucs4 & 0x3f));
            charOff++;
This should *not* write to output[] directly: as written it stores the bytes
at indices 0..3 instead of at byteOff, and it skips the buffer-full check. It
should fill outputByte[], set outputSize = 4, and then execute the logic that
occurs further down:
    if (byteOff + outputSize > outEnd) {
        throw new ConversionBufferFullException();
    }
    for (int i = 0; i < outputSize; i++) {
        output[byteOff++] = outputByte[i];
    }
It might also be good for consistency if it set inputSize = 1 and then
did "charOff += inputSize", rather than the current "charOff++", but
that's probably a judgment call.
Also, highHalfZoneCode is redundantly set to 0 again. Not bad, but looks funny.
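For reference, a rough sketch of the suggested output path, written as a
standalone method so it can be run outside the converter. This is an
illustration, not the actual patch: the class and method names are invented,
the byteOff/outEnd state is passed as parameters, and IllegalStateException
stands in for ConversionBufferFullException, which lives in sun.io.

```java
// Hypothetical standalone sketch, not the JDK converter itself.
class Utf8FourByteDemo {
    // Encodes ucs4 (expected range 0x10000..0x10FFFF) as four UTF-8 bytes,
    // staging them in outputByte[] and copying them to output[] at byteOff
    // only after checking that they fit. Returns the updated byteOff.
    static int writeFourBytes(int ucs4, byte[] output, int byteOff, int outEnd) {
        byte[] outputByte = new byte[4];
        outputByte[0] = (byte)(0xf0 | ((ucs4 >> 18) & 0x07));
        outputByte[1] = (byte)(0x80 | ((ucs4 >> 12) & 0x3f));
        outputByte[2] = (byte)(0x80 | ((ucs4 >> 6) & 0x3f));
        outputByte[3] = (byte)(0x80 | (ucs4 & 0x3f));
        int outputSize = 4;

        // Stand-in for ConversionBufferFullException (assumption).
        if (byteOff + outputSize > outEnd) {
            throw new IllegalStateException("output buffer full");
        }
        for (int i = 0; i < outputSize; i++) {
            output[byteOff++] = outputByte[i];
        }
        return byteOff;
    }

    public static void main(String[] args) {
        byte[] out = new byte[6];
        int next = writeFourBytes(0x1F600, out, 2, out.length);
        System.out.println("next byteOff = " + next);
        for (int i = 2; i < next; i++) {
            System.out.printf("%02x ", out[i] & 0xff);
        }
        System.out.println();
    }
}
```

The bytes land at the caller's offset rather than at index 0, and a short
buffer is reported instead of being overrun.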
(Review ID: 105886)
======================================================================
Relates to: JDK-4391895 UTF8 Decoder Broken (Resolved)