Type: Bug
Resolution: Fixed
Priority: P4
Fix Version/s: 1.2.2_012
Affects Version/s: 1.2.2
Component/s: core-libs
Labels:
- licbug
- licdirect

Subcomponent: java.nio.charsets
Resolved In Build: 012
CPU: sparc
OS: solaris_8

Issue	Fix Version	Assignee	Priority	Status	Resolution	Resolved In Build
JDK-2048677	1.4.1	Ian Little	P4	Closed	Fixed	hopper
JDK-2048676	1.4.0_01	Ian Little	P4	Closed	Fixed	01
JDK-2048675	1.3.1_03	Ian Little	P4	Resolved	Fixed	03
JDK-2048674	1.2.2_12	Ian Little	P4	Resolved	Fixed	12

Name: md23716 Date: 11/02/2001

Problem exists and requires fixing on 1.2.2, 1.3.1 and 1.4.

Decoding a zeroed byte array into a String using the zh_TW locale's encoding (EUC_TW) results in an empty string. The same test with the default locale's encoding results in a non-empty string. The EUC_TW byte-to-char converter is skipping valid zero bytes.

Simple test case:

======================================================================
import java.io.*;

public class Exercise
{
    public static void main(String[] args)
    {
        test("cns11643");
        test("Cp1252");
    }

    public static void test(String encoding)
    {
        String result = null;
        byte[] data = new byte[16];
        int i;

        System.err.println(">>>> " + encoding + " with zero'd byte array");

        for (i = 0; i < 16; i++)
        {
            data[i] = 0;
        }

        try
        {
            result = new String(data, encoding);
            System.err.println("length of string = " + result.length());
        }
        catch (Exception ex)
        {
            ex.printStackTrace();
        }

        for (i=0; i < 16; i++)
        {
            data[i] = (byte)( 32 + i);
        }

        System.err.println(">>>> " + encoding + " with non-zero'd byte array");

        try
        {
            result = new String(data, encoding);
            System.err.println("length of string = " + result.length());
        }
        catch (Exception ex)
        {
            ex.printStackTrace();
        }
    }
}
======================================================================

Suggested Fix:

Looking at the EUC_TW converter code revealed that a valid character
(the NUL character, \u0000) was being used as the marker for filtering
out bad conversions, so genuinely decoded NUL characters were dropped too.
The test case passes when an invalid character (\uFFFF) is used instead.

Context diff for ByteToCharEUC_TW.java :

======================================================================
***************
*** 61,69 ****
         throws UnknownCharacterException, MalformedInputException,
                ConversionBufferFullException
      {
         int inputSize = 0;
! char outputChar = (char) 0;

         byteOff = inOff;
         charOff = outOff;

--- 61,69 ----
         throws UnknownCharacterException, MalformedInputException,
                ConversionBufferFullException
      {
         int inputSize = 0;
! char outputChar = '\uFFFF'; //ibm@37723

         byteOff = inOff;
         charOff = outOff;

***************
*** 150,158 ****
                break;
             }
             byteOff++;

! if (outputChar != (char) 0) {
                if (outputChar == REPLACE_CHAR) {
                    if (subMode) // substitution enabled
                       outputChar = subChars[0];
                    else {
--- 150,158 ----
                break;
             }
             byteOff++;

! if (outputChar != '\uFFFF') { //ibm@37723
                if (outputChar == REPLACE_CHAR) {
                    if (subMode) // substitution enabled
                       outputChar = subChars[0];
                    else {
***************
*** 160,168 ****
                       throw new UnknownCharacterException();
                    }
                }
                output[charOff++] = outputChar;
! outputChar = 0;
             }
         }

         return charOff - outOff;
--- 160,168 ----
                       throw new UnknownCharacterException();
                    }
                }
                output[charOff++] = outputChar;
! outputChar = '\uFFFF'; //ibm@37723
             }
         }

         return charOff - outOff;
======================================================================
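
The effect of the marker change in the diff above can be illustrated with a much-simplified decode loop. This is only a sketch and not the actual ByteToCharEUC_TW code: the class SentinelSketch, the method decode, the pendingLead flag and the '?' placeholder for the table lookup are all hypothetical names and simplifications introduced for illustration. It assumes that any byte with the high bit set starts a two-byte sequence and that every other byte maps to the char with the same value.

```java
public class SentinelSketch {

    // Simplified decode loop. 'sentinel' is the value that means
    // "no character has been produced for this byte yet".
    static String decode(byte[] input, char sentinel) {
        StringBuilder out = new StringBuilder();
        char outputChar = sentinel;
        boolean pendingLead = false;

        for (byte b : input) {
            int v = b & 0xFF;
            if (pendingLead) {
                // Second byte of a two-byte sequence: a real converter would
                // do a table lookup here; '?' is a hypothetical placeholder.
                outputChar = '?';
                pendingLead = false;
            } else if (v >= 0x80) {
                // First byte of a two-byte sequence: nothing to emit yet,
                // so outputChar keeps the sentinel value.
                pendingLead = true;
            } else {
                // Single-byte character, including NUL (0x00).
                outputChar = (char) v;
            }

            // Emit only if a character was actually produced. If the sentinel
            // is itself a valid character, that character is silently lost.
            if (outputChar != sentinel) {
                out.append(outputChar);
                outputChar = sentinel;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] zeros = new byte[16];
        // Sentinel (char) 0: every decoded NUL looks like "nothing produced".
        System.out.println(decode(zeros, (char) 0).length());  // prints 0
        // Sentinel '\uFFFF': decoded NULs are kept.
        System.out.println(decode(zeros, '\uFFFF').length());  // prints 16
    }
}
```

With (char) 0 as the marker, a decoded NUL cannot be told apart from "no output yet", which matches the empty string seen in the test case. \uFFFF is a Unicode noncharacter, so it should not collide with real converter output.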

backported by

JDK-2048674 Encoding zero'd byte array using zh_TW locale results in empty string (Resolved)
JDK-2048675 Encoding zero'd byte array using zh_TW locale results in empty string (Resolved)
JDK-2048676 Encoding zero'd byte array using zh_TW locale results in empty string (Closed)
JDK-2048677 Encoding zero'd byte array using zh_TW locale results in empty string (Closed)

Assignee: Ian Little (Inactive)
Reporter: Michelle Devereux (Inactive)
Votes: 0
Watchers: 1
Created: 2001-11-02 03:03
Updated: 2002-06-13 10:37
Resolved: 2002-01-15 10:31
Imported: 15/Sep/12 1:16 PM
Indexed: 17/Jul/12 10:49 AM
