- Bug
- Resolution: Fixed
- P3
- 1.4.2
- b89
- x86
- windows_2000
Issue | Fix Version | Assignee | Priority | Status | Resolution | Resolved In Build |
---|---|---|---|---|---|---|
JDK-2158461 | 7 | Unassigned | P4 | Closed | Cannot Reproduce | |
JDK-2158187 | 5.0-pool | Xueming Shen | P4 | Closed | Won't Fix | |
JDK-2160318 | 1.4.2_18 | Unassigned | P4 | Closed | Won't Fix | |
FULL PRODUCT VERSION :
Using Java SDK 1.4.2
ADDITIONAL OS VERSION INFORMATION :
Windows 2000 Version 5.00.2195
A DESCRIPTION OF THE PROBLEM :
The ISCII91 charset implementation appears to encode characters incorrectly. Characters in the \u0900 page (legal for ISCII91) are encoded with a trailing 0xff byte. In addition, some characters are translated into a byte sequence the decoder does not recognize, so text encoded with the Charset does not decode back to the original.
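One way to inspect the reported trailing 0xff is to print the encoded bytes in hex. The following is a minimal sketch, not part of the original report: the class name `EncodedBytesDump`, the helper `toHex`, and the sample character are assumptions chosen for illustration. It reads only the bytes between the buffer's position and limit, because `ByteBuffer.array()` returns the whole backing array, which may be larger than the encoded content.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

// Hypothetical helper for illustration; not part of the original test case.
final class EncodedBytesDump {

    // Encodes text with the given charset and returns the bytes as
    // space-separated lowercase hex, e.g. "41" for "A" in an ASCII-compatible charset.
    static String toHex(Charset cs, String text) {
        ByteBuffer buf = cs.encode(text);
        StringBuilder sb = new StringBuilder();
        // Read only position..limit; array() could expose unused capacity.
        while (buf.hasRemaining()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", buf.get() & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String name = "ISCII91";
        if (!Charset.isSupported(name)) {
            System.out.println(name + " is not available in this runtime");
            return;
        }
        // U+0915 DEVANAGARI LETTER KA, an arbitrary sample from the \u0900 page.
        System.out.println(toHex(Charset.forName(name), "\u0915"));
    }
}
```

On an affected release, an extra `ff` at the end of the output for a single character would match the symptom described above.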
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
See the test case below. In short, take a sequence of non-ASCII characters that are legal in ISCII91 and encode them. The result looks suspicious, and decoding it fails to reproduce the original string.
/*
 * Created on Feb 9, 2004
 */
package com.pombi.lib.cvsclient.util;

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

import junit.framework.TestCase;

import sun.io.CharToByteConverter;
import sun.io.CharToByteISCII91;
import sun.io.MalformedInputException;

/**
 * Compares the legacy sun.io ISCII91 converter with the java.nio ISCII91 Charset.
 *
 * @author cebert
 */
public class ISCII91Test extends TestCase {

    public ISCII91Test(String name) {
        super(name);
    }

    public final void testConvert() throws CharacterCodingException, MalformedInputException {
        CharToByteConverter oldConverter = new CharToByteISCII91();
        Charset iscii91Charset = Charset.forName("ISCII91");
        String charsToEncode = getCharsForEncoding("ISCII91");
        byte[] oldBytes = oldConverter.convertAll(charsToEncode.toCharArray());
        // Copy only the encoded bytes; ByteBuffer.array() may include unused capacity.
        ByteBuffer encoded = iscii91Charset.encode(charsToEncode);
        byte[] newBytes = new byte[encoded.remaining()];
        encoded.get(newBytes);
        // Compare byte by byte first so a failure reports the first differing index.
        for (int i = 0; i < Math.min(oldBytes.length, newBytes.length); ++i) {
            assertEquals("At " + i, oldBytes[i], newBytes[i]);
        }
        assertEquals("Encoded length", oldBytes.length, newBytes.length);
    }

    /** Collects up to 300 defined, non-control characters that the named charset can encode. */
    static final String getCharsForEncoding(String encodingName) throws CharacterCodingException {
        final Charset set = Charset.forName(encodingName);
        final CharBuffer chars = CharBuffer.allocate(300);
        final CharsetEncoder encoder = set.newEncoder();
        for (int c = 0; chars.remaining() > 0 && c < Character.MAX_VALUE; ++c) {
            if (Character.isDefined((char) c) && !Character.isISOControl((char) c) && encoder.canEncode((char) c)) {
                chars.put((char) c);
            }
        }
        chars.limit(chars.position());
        chars.rewind();
        return chars.toString();
    }
}
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
CharToByteISCII91 and the ISCII91 Charset should agree. A string encoded to bytes and then decoded to chars by the ISCII91 Charset should be unchanged.
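The round-trip expectation can be checked without the sun.io classes. The sketch below is an illustration under stated assumptions, not the reporter's code: the class name `RoundTripCheck`, the method `roundTrips`, and the sample string are hypothetical. It encodes and decodes with error reporting enabled, so a byte sequence the decoder rejects is treated as a failed round trip instead of being silently replaced.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper for illustration; not part of the original test case.
final class RoundTripCheck {

    // Returns true only if text survives encode followed by decode unchanged.
    static boolean roundTrips(Charset cs, String text) {
        try {
            ByteBuffer bytes = cs.newEncoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .encode(CharBuffer.wrap(text));
            CharBuffer decoded = cs.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(bytes);
            return text.equals(decoded.toString());
        } catch (CharacterCodingException e) {
            // Unmappable input or bytes the decoder does not recognize.
            return false;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "ISCII91";
        if (!Charset.isSupported(name)) {
            System.out.println(name + " is not available in this runtime");
            return;
        }
        // Three Devanagari letters (KA, KHA, GA) as an arbitrary sample.
        String sample = "\u0915\u0916\u0917";
        System.out.println(name + " round trip ok: " + roundTrips(Charset.forName(name), sample));
    }
}
```

Passing a charset name as the first argument allows the same check to be run against other encodings for comparison.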
REPRODUCIBILITY :
This bug can be reproduced always.
CUSTOMER SUBMITTED WORKAROUND :
Use the old converter (CharToByteISCII91), although this means working around the new charset code.
- backported by
  - JDK-2158187 ISCII91 Charset encoder doesn't agree with CharToByteISCII91; bytes don't decode (Closed)
  - JDK-2158461 ISCII91 Charset encoder doesn't agree with CharToByteISCII91; bytes don't decode (Closed)
  - JDK-2160318 ISCII91 Charset encoder doesn't agree with CharToByteISCII91; bytes don't decode (Closed)