JDK-4522270

Encoding zero'd byte array using zh_TW locale results in empty string




        Name: md23716 Date: 11/02/2001

        The problem exists in, and needs fixing on, 1.2.2, 1.3.1 and 1.4.

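        Decoding a zeroed byte array (new String(bytes, encoding)) with the EUC_TW converter used for the zh_TW locale (encoding name "cns11643" in the test case) results in an empty string. The same test with Cp1252 results in a 16-character string. The EUC_TW byte-to-char converter is skipping valid zero bytes.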

        Simple test case:

        ======================================================================
        import java.io.*;

        public class Exercise
        {
            public static void main(String[] args)
            {
                test("cns11643");
                test("Cp1252");
            }

            public static void test(String encoding)
            {
                String result = null;
                byte[] data = new byte[16];
                int i;

                System.err.println(">>>> " + encoding + " with zero'd byte array");

                for (i = 0; i < 16; i++)
                {
                    data[i] = 0;
                }

                try
                {
                    result = new String(data, encoding);
                    System.err.println("length of string = " + result.length());
                }
                catch (Exception ex)
                {
                    ex.printStackTrace();
                }

                for (i=0; i < 16; i++)
                {
                    data[i] = (byte)( 32 + i);
                }

                System.err.println(">>>> " + encoding + " with non-zero'd byte array");

                try
                {
                    result = new String(data, encoding);
                    System.err.println("length of string = " + result.length());
                }
                catch (Exception ex)
                {
                    ex.printStackTrace();
                }
            }
        }
        ======================================================================

        Suggested Fix:

        Inspection of the EUC_TW converter code showed that a valid character,
        NUL (U+0000), was being used as a sentinel to filter out bad conversions,
        so NUL characters legitimately decoded from 0x00 bytes were discarded.
        The test case passes when an invalid character ('\uFFFF') is used instead.
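The effect of the sentinel choice can be reproduced with a much-simplified model of the conversion loop. This is a sketch only: the class `SentinelDemo` and its `decode` method are hypothetical, only the single-byte 0x00-0x7F range is modeled, and every other byte is assumed to produce no character. The `outputChar != noOutput` check mirrors the comparison changed in the diff below.

```java
// Hypothetical, simplified model of the sentinel logic in ByteToCharEUC_TW.
// Assumption for illustration: only bytes 0x00-0x7F decode (one-to-one, as ASCII);
// any other byte is treated as "no character produced".
public class SentinelDemo {

    static String decode(byte[] input, char noOutput) {
        StringBuilder out = new StringBuilder();
        for (byte b : input) {
            char outputChar = noOutput;      // reset to the "nothing decoded" marker
            if (b >= 0) {                    // 0x00..0x7F maps directly to a char
                outputChar = (char) b;
            }
            if (outputChar != noOutput) {    // sentinel check, as in the converter loop
                out.append(outputChar);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] zeros = new byte[16];
        // With U+0000 as the sentinel, every decoded NUL is mistaken for "no output".
        System.out.println(decode(zeros, '\u0000').length());  // 0
        // With the noncharacter U+FFFF as the sentinel, all 16 NULs are kept.
        System.out.println(decode(zeros, '\uFFFF').length());  // 16
    }
}
```

The key design point is that the sentinel must be a value the converter can never legitimately produce; U+FFFF is a Unicode noncharacter, whereas U+0000 is the ordinary decoding of byte 0x00.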

        Context diff for ByteToCharEUC_TW.java:

        ======================================================================
        ***************
        *** 61,69 ****
                 throws UnknownCharacterException, MalformedInputException,
                        ConversionBufferFullException
              {
                 int inputSize = 0;
        ! char outputChar = (char) 0;
          
                 byteOff = inOff;
                 charOff = outOff;
          
        --- 61,69 ----
                 throws UnknownCharacterException, MalformedInputException,
                        ConversionBufferFullException
              {
                 int inputSize = 0;
        ! char outputChar = '\uFFFF'; //ibm@37723
          
                 byteOff = inOff;
                 charOff = outOff;
          
        ***************
        *** 150,158 ****
                        break;
                     }
                     byteOff++;
          
        ! if (outputChar != (char) 0) {
                        if (outputChar == REPLACE_CHAR) {
                            if (subMode) // substitution enabled
                               outputChar = subChars[0];
                            else {
        --- 150,158 ----
                        break;
                     }
                     byteOff++;
          
        ! if (outputChar != '\uFFFF') { //ibm@37723
                        if (outputChar == REPLACE_CHAR) {
                            if (subMode) // substitution enabled
                               outputChar = subChars[0];
                            else {
        ***************
        *** 160,168 ****
                               throw new UnknownCharacterException();
                            }
                        }
                        output[charOff++] = outputChar;
        ! outputChar = 0;
                     }
                 }
          
                 return charOff - outOff;
        --- 160,168 ----
                               throw new UnknownCharacterException();
                            }
                        }
                        output[charOff++] = outputChar;
        ! outputChar = '\uFFFF'; //ibm@37723
                     }
                 }
          
                 return charOff - outOff;
        ======================================================================
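A regression check for this bug might look like the following sketch, written against a current JDK rather than 1.2.2-1.4. It assumes the `cns11643` name still resolves to the EUC_TW charset (in modern JDKs it is expected to be an alias of `x-EUC-TW`, shipped in the `jdk.charsets` module) and skips any name the running JDK does not support. The class name `ZeroBytesRegression` is hypothetical.

```java
import java.nio.charset.Charset;

// Hypothetical regression-test sketch: each supported charset must decode
// 16 zero bytes into 16 NUL characters. Unsupported names are skipped.
public class ZeroBytesRegression {

    // Decodes n zero bytes with the named charset and returns the string length.
    static int decodedLength(String charsetName, int n) {
        return new String(new byte[n], Charset.forName(charsetName)).length();
    }

    public static void main(String[] args) {
        String[] names = { "cns11643", "Cp1252", "ISO-8859-1" };
        for (String name : names) {
            if (!Charset.isSupported(name)) {
                System.out.println(name + ": not supported, skipped");
                continue;
            }
            int len = decodedLength(name, 16);
            System.out.println(name + ": length = " + len);
            if (len != 16) {
                throw new AssertionError(name + " dropped zero bytes: length " + len);
            }
        }
    }
}
```

Each listed charset maps byte 0x00 to U+0000, so any length other than 16 means zero bytes were dropped, which is the failure reported here.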


              ilittlesunw Ian Little (Inactive)
              mdevereuorcl Michelle Devereux (Inactive)