JDK-6431650

ISCII91 Charset encoder doesn't agree with CharToByteISCII91; bytes don't decode

        FULL PRODUCT VERSION :
        Using Java SDK 1.4.2

        ADDITIONAL OS VERSION INFORMATION :
        Windows 2000 Version 5.00.2195

        A DESCRIPTION OF THE PROBLEM :
        The ISCII91 charset implementation appears to encode characters incorrectly. Characters in the \u0900 block (legal for ISCII91) are encoded with a trailing 0xff. In addition, some characters are translated into a sequence the decoder does not recognize, so text encoded with the Charset does not decode back to the original.

        STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
        See the test case below for an example. In short, take a sequence of non-ASCII characters that are legal in ISCII91 and encode them. The result looks suspicious, and decoding it fails to reproduce the original string.


        /*
         * Created on Feb 9, 2004
         */
        package com.pombi.lib.cvsclient.util;

        import java.nio.CharBuffer;
        import java.nio.charset.CharacterCodingException;
        import java.nio.charset.Charset;
        import java.nio.charset.CharsetEncoder;


        import junit.framework.TestCase;
        import sun.io.CharToByteConverter;
        import sun.io.CharToByteISCII91;
        import sun.io.MalformedInputException;

        /**
         * @author cebert
         */
        public class ISCII91Test extends TestCase {

                public ISCII91Test(String name) {
                        super(name);
                }

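                public final void testConvert() throws CharacterCodingException, MalformedInputException {
                        CharToByteConverter oldConverter = new CharToByteISCII91();
                        Charset iscii91Charset = Charset.forName("ISCII91");
                        String charsToEncode = getCharsForEncoding("ISCII91");
                        byte [] oldBytes = oldConverter.convertAll(charsToEncode.toCharArray());
                        // Copy only the encoded bytes; array() would also expose unused buffer capacity.
                        java.nio.ByteBuffer encoded = iscii91Charset.encode(charsToEncode);
                        byte [] newBytes = new byte[encoded.remaining()];
                        encoded.get(newBytes);
                        // Compare the common prefix byte by byte, then require equal lengths.
                        for (int i = 0; i < Math.min(oldBytes.length, newBytes.length); ++i) {
                                assertEquals("At " + i, oldBytes[i], newBytes[i]);
                        }
                        assertEquals("Length", oldBytes.length, newBytes.length);
                }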


                static final String getCharsForEncoding(String encodingName) throws CharacterCodingException{
                        final Charset set = Charset.forName(encodingName);
                        final CharBuffer chars = CharBuffer.allocate(300);
                        final CharsetEncoder encoder = set.newEncoder();
                        for (int c = 0; chars.remaining() > 0 && c < Character.MAX_VALUE; ++c) {
                                if (Character.isDefined((char) c) && !Character.isISOControl((char) c) && encoder.canEncode((char) c)) {
                                        chars.put((char) c);
                                }
                        }
                        chars.limit(chars.position());
                        chars.rewind();
                        return chars.toString();
                }
        }
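
        When inspecting what an encoder produces, it can help to print exactly the bytes between the buffer's position and limit, since a ByteBuffer's backing array may be larger than the encoded output. The following is a minimal sketch using only java.nio; the class name EncodedBytesDump and its helper methods are hypothetical and are not part of the original report. It falls back to reporting that ISCII91 is unavailable if the runtime does not ship that charset.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original report.
public class EncodedBytesDump {

    /** Returns only the bytes between position and limit, ignoring spare capacity. */
    public static byte[] encodedBytes(Charset charset, String input) {
        ByteBuffer buffer = charset.encode(input);
        byte[] out = new byte[buffer.remaining()];
        buffer.get(out);
        return out;
    }

    /** Formats bytes as space-separated two-digit lowercase hex. */
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String sample = "\u0905\u0906\u0915"; // Devanagari letters A, AA, KA
        if (Charset.isSupported("ISCII91")) {
            System.out.println(toHex(encodedBytes(Charset.forName("ISCII91"), sample)));
        } else {
            System.out.println("ISCII91 is not available in this runtime");
        }
        // UTF-8 is always available and serves as a sanity check of the helper.
        System.out.println(toHex(encodedBytes(StandardCharsets.UTF_8, sample)));
    }
}
```

        A trailing 0xff or any other unexpected byte in the dump can then be attributed to the encoder itself rather than to unused buffer capacity.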



        EXPECTED VERSUS ACTUAL BEHAVIOR :
        EXPECTED -
        CharToByteISCII91 and the ISCII91 Charset should agree. A string encoded to bytes and then decoded to chars by the ISCII91 Charset should be unchanged.
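
        The round-trip expectation can be checked without the sun.io classes. The sketch below is an illustration under stated assumptions, not the reporter's code: the class RoundTripCheck and its methods are hypothetical names, and the encoder and decoder are configured to report errors rather than substitute replacement bytes, so an unrecognized sequence shows up as a failure.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original report.
public class RoundTripCheck {

    /** Encodes and then decodes the input, throwing on any malformed or unmappable data. */
    public static String roundTrip(Charset charset, String input) throws CharacterCodingException {
        ByteBuffer bytes = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(CharBuffer.wrap(input));
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(bytes)
                .toString();
    }

    /** Returns true only if the decoded text equals the original input. */
    public static boolean roundTrips(Charset charset, String input) {
        try {
            return input.equals(roundTrip(charset, input));
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String sample = "\u0905\u0906\u0915"; // Devanagari letters A, AA, KA
        if (Charset.isSupported("ISCII91")) {
            System.out.println("ISCII91 round trip ok: "
                    + roundTrips(Charset.forName("ISCII91"), sample));
        } else {
            System.out.println("ISCII91 is not available in this runtime");
        }
        System.out.println("UTF-8 round trip ok: "
                + roundTrips(StandardCharsets.UTF_8, sample));
    }
}
```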


        REPRODUCIBILITY :
        This bug can be reproduced always.

        CUSTOMER SUBMITTED WORKAROUND :
        Use the old sun.io converter (CharToByteISCII91) instead of the ISCII91 Charset; this means working around the new java.nio.charset code.

              sherman Xueming Shen
              ndcosta Nelson Dcosta (Inactive)