Type: Bug
Resolution: Fixed
Priority: P4
Fix Version/s: 1.4.0
Affects Version/s: 1.3.0
Component/s: core-libs
Labels: webbug
Subcomponent: java.nio.charsets
Resolved In Build: beta
CPU: generic
OS: generic

Name: rlT66838 Date: 06/08/2000

SCSL JDK 1.3 Beta source code (Sep 1999)

This is from the September 1999 JDK 1.3 source release; there is a (faint)
chance that this may have been found and fixed already...

Surrogate pairs are handled correctly if both the high half and the low half
are in the same input[] buffer. However, if a surrogate pair straddles two
input buffers, then it hits two bugs:

First, there is code that does

            inputChar = highHalfZoneCode;
            highHalfZoneCode = 0;
            if (input[inOff] >= 0xdc00 && input[inOff] <= 0xdfff) {
                // This is legal UTF16 sequence.
                int ucs4 = (highHalfZoneCode - 0xd800) * 0x400
                    + (input[inOff] - 0xdc00) + 0x10000;

The ucs4 calculation assumes that highHalfZoneCode still contains the first
half of the surrogate pair, but highHalfZoneCode has been zapped to 0.

  Fix: the ucs4 calculation should use inputChar instead of highHalfZoneCode.
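The effect of that ordering can be reproduced in isolation. The following is a minimal sketch, not the JDK source: the class name SurrogateFixSketch and the methods remember, buggy and fixed are hypothetical, and only the field name highHalfZoneCode, the local inputChar and the ucs4 formula are taken from the report. It assumes the high half was saved by an earlier convert() call and the low half arrives first in the next buffer.

```java
// Hypothetical stand-in for the converter state described in the report.
final class SurrogateFixSketch {
    // High surrogate carried over from the previous buffer; 0 means none pending.
    private char highHalfZoneCode;

    // Simulates the previous call ending on a high surrogate.
    void remember(char high) {
        highHalfZoneCode = high;
    }

    // Mirrors the reported code: the field is cleared before it is used.
    int buggy(char low) {
        char inputChar = highHalfZoneCode;
        highHalfZoneCode = 0;
        return (highHalfZoneCode - 0xd800) * 0x400 + (low - 0xdc00) + 0x10000;
    }

    // Proposed fix: use the saved copy in inputChar instead of the cleared field.
    int fixed(char low) {
        char inputChar = highHalfZoneCode;
        highHalfZoneCode = 0;
        return (inputChar - 0xd800) * 0x400 + (low - 0xdc00) + 0x10000;
    }

    public static void main(String[] args) {
        SurrogateFixSketch s = new SurrogateFixSketch();
        s.remember((char) 0xd83d);              // high half of U+1F600
        System.out.println(Integer.toHexString(s.buggy((char) 0xde00)));
        s.remember((char) 0xd83d);
        System.out.println(Integer.toHexString(s.fixed((char) 0xde00)));
    }
}
```

With the pair D83D DE00, fixed() yields 0x1F600, which matches Character.toCodePoint, while buggy() yields a negative value because the high half has been replaced by 0.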

Next, it tries to output the ucs4 value:

                output[0] = (byte)(0xf0 | ((ucs4 >> 18)) & 0x07);
                output[1] = (byte)(0x80 | ((ucs4 >> 12) & 0x3f));
                output[2] = (byte)(0x80 | ((ucs4 >> 6) & 0x3f));
                output[3] = (byte)(0x80 | (ucs4 & 0x3f));
                charOff++;

This should *not* write to output[] directly: output[0..3] ignores byteOff
and bypasses the buffer-full check. It should fill outputByte[], then set
outputSize = 4, then execute the logic that occurs further down:

            if (byteOff + outputSize > outEnd) {
                throw new ConversionBufferFullException();
            }
            for (int i = 0; i < outputSize; i++) {
                output[byteOff++] = outputByte[i];
            }
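A sketch of the suggested shape follows, under stated assumptions: the class Utf8StagingSketch and the methods stage and flush are hypothetical names, and java.nio.BufferOverflowException stands in for ConversionBufferFullException, which lives in sun.io and is not assumed to be available. The byte layout and the bounds check follow the snippets quoted in this report.

```java
import java.nio.BufferOverflowException;

// Hypothetical helper showing "stage the bytes, then do the checked copy".
final class Utf8StagingSketch {
    // Encodes a supplementary code point (U+10000..U+10FFFF) into staging[0..3].
    // Returns the number of bytes staged, i.e. the report's outputSize.
    static int stage(int ucs4, byte[] staging) {
        staging[0] = (byte) (0xf0 | ((ucs4 >> 18) & 0x07));
        staging[1] = (byte) (0x80 | ((ucs4 >> 12) & 0x3f));
        staging[2] = (byte) (0x80 | ((ucs4 >> 6) & 0x3f));
        staging[3] = (byte) (0x80 | (ucs4 & 0x3f));
        return 4;
    }

    // Copies the staged bytes to output starting at byteOff, refusing to go
    // past outEnd. Returns the updated byteOff.
    static int flush(byte[] staging, int outputSize,
                     byte[] output, int byteOff, int outEnd) {
        if (byteOff + outputSize > outEnd) {
            throw new BufferOverflowException(); // stand-in for ConversionBufferFullException
        }
        for (int i = 0; i < outputSize; i++) {
            output[byteOff++] = staging[i];
        }
        return byteOff;
    }

    public static void main(String[] args) {
        byte[] staging = new byte[4];
        byte[] output = new byte[8];
        int size = stage(0x1f600, staging);
        int next = flush(staging, size, output, 2, output.length);
        System.out.println("next byteOff = " + next);
    }
}
```

Because the bytes go through the staging array, the write lands at the caller's byteOff rather than at index 0, and a destination that is too small is reported instead of being overrun.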

It might also be good for consistency if it set inputSize = 1 and then
did "charOff += inputSize", rather than the current "charOff++", but
that's probably a judgment call.

Also, highHalfZoneCode is redundantly set to 0 again. Not bad, but looks funny.
(Review ID: 105886)
======================================================================

Relates to: JDK-4391895 UTF8 Decoder Broken (Resolved)

Assignee: Ian Little (Inactive)
Reporter: Roger Lewis (Inactive)


Created: 2000-06-08 11:04
Updated: 2000-12-19 16:00
Resolved: 2000-12-19 16:00
Imported: 15/Sep/12 1:15 PM
Indexed: 17/Jul/12 10:48 AM
