- Bug
- Resolution: Fixed
- P4
- 1.3.1_09
- 12
- sparc
- solaris_8
- Verified
Issue | Fix Version | Assignee | Priority | Status | Resolution | Resolved In Build |
---|---|---|---|---|---|---|
JDK-2076635 | 5.0 | Ian Little | P4 | Resolved | Fixed | b38 |
JDK-2076634 | 1.4.2_05 | Ian Little | P4 | Closed | Fixed | 05 |
Name: dk106046 Date: 10/31/2003
Operating System(s) :
Sun Solaris 2.8
Full JDK version(s) (from java -version) :
java version "1.3.1_09"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.1_09-b03)
Java HotSpot(TM) Client VM (build 1.3.1_09-b03, mixed mode)
Detailed description of the problem:
Lines of EBCDIC text are converted to Kanji, but some characters, for example the hyphen, do not convert correctly. The problem does not occur with Java 1.2.2_17.
- Exact steps to reproduce:
1 Detach the Java files and FTP them to Solaris in binary mode.
2 Compile with the appropriate JDK.
3 Run: java CallConverter > jdk131-09.html
4 FTP jdk131-09.html back to Windows in binary mode.
5 Open jdk131-09.html in IE 5.50 or above.
6 Go to View->Encoding and select Japanese (Shift-JIS) to view the correct character set.
The output contains a circle, which is the unwanted character. It is circled in red in the Word document (picjdk131-04.doc, available on request). The expected output is shown in the HTML document (outputFromJDK1.1.8.html, available on request).
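The steps above depend on a browser to interpret the output bytes. As an alternative, the bytes in the generated file can be printed in hex so that the wrongly converted character can be located directly. The following is only a sketch: the class `HexDump` and its `toHex` method are hypothetical helpers and are not part of the original report.

```java
final class HexDump {
    /** Formats each byte as two uppercase hex digits, separated by single spaces. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 3);
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            int v = bytes[i] & 0xFF; // treat the byte as unsigned
            sb.append(Character.toUpperCase(Character.forDigit(v >> 4, 16)));
            sb.append(Character.toUpperCase(Character.forDigit(v & 0x0F, 16)));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws java.io.IOException {
        // args[0] is assumed to be the file written in step 3, e.g. jdk131-09.html
        byte[] data = java.nio.file.Files.readAllBytes(java.nio.file.Paths.get(args[0]));
        System.out.println(toHex(data));
    }
}
```

Running it against the output of an affected JDK and of an unaffected one, then comparing the two dumps, shows which byte pair differs.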
- Source code that demonstrates the problem:
=============== CallConverter.java ==========================================
public class CallConverter {
    public static void main(String[] args) {
        // This is what came back from the mainframe
        String input = "0E43CE438A43A8404044C445BC45B6459A45864040426045804567455240404586458545530F";
        // Hexify the input
        String line = CharacterConverter.getInstance().hexifyString(input);
        // Convert from code page 930 to code page 943
        if (line != null && line.length() != 0) {
            System.out.println(CharacterConverter.getInstance().charCodeConvert(line, "Cp930", "Cp943"));
        }
    }
}
=============== CharacterConverter.java =======================================
import java.io.UnsupportedEncodingException;

public class CharacterConverter {
    private static String defaultCode = "ISO8859-1";
    private static CharacterConverter instance = new CharacterConverter();

    private CharacterConverter()
    {
        super();
    }

    public static CharacterConverter getInstance()
    {
        return instance;
    }

    public String hexifyString(String stringToHexify)
    {
        String tempHex = "";
        // Parse input string to strip out unnecessary 00's and FF's
        boolean shiftout = true;
        int hexIdx = 0;
        int len = stringToHexify.length();
        if ((len % 2) != 0)
        {
            System.out.println("len%2 s");
            return null;
        }
        while (hexIdx < len)
        {
            // Delete 00's and FF's
            if ((stringToHexify.charAt(hexIdx) == '0' && stringToHexify.charAt(hexIdx + 1) == '0')
                    || (stringToHexify.charAt(hexIdx) == 'F' && stringToHexify.charAt(hexIdx + 1) == 'F'))
            {
                hexIdx += 2;
            }
            else if (!(stringToHexify.charAt(hexIdx) == '0' && stringToHexify.charAt(hexIdx + 1) == 'E'))
            {
                // We have a valid single-byte pair of characters
                tempHex += stringToHexify.substring(hexIdx, hexIdx + 2);
                hexIdx += 2;
            }
            else
            {
                // We've found 0E, which opens a double-byte run; copy the "0E"
                tempHex += stringToHexify.substring(hexIdx, hexIdx + 2);
                hexIdx += 2;
                // Look for 00 and FF every fourth position until 0F closes the run
                shiftout = false;
                while (!shiftout && hexIdx < len)
                {
                    if (stringToHexify.charAt(hexIdx) == '0' && stringToHexify.charAt(hexIdx + 1) == 'F')
                    {
                        shiftout = true;
                        tempHex += stringToHexify.substring(hexIdx, hexIdx + 2);
                        hexIdx += 2;
                    }
                    else if ((stringToHexify.charAt(hexIdx) == '0' && stringToHexify.charAt(hexIdx + 1) == '0')
                            || (stringToHexify.charAt(hexIdx) == 'F' && stringToHexify.charAt(hexIdx + 1) == 'F'))
                    {
                        // Don't copy any four-character sequence beginning with 00 or FF
                        hexIdx += 4;
                    }
                    else
                    {
                        // Copy one double-byte character (four hex digits)
                        tempHex += stringToHexify.substring(hexIdx, hexIdx + 4);
                        hexIdx += 4;
                    }
                }
            }
        }
        String hexedString = tempHex;
        if (hexedString != null && !hexedString.equals(""))
        {
            // Turn each pair of hex digits into one char holding that byte value
            len = hexedString.length();
            if ((len % 2) != 0)
            {
                System.out.println("len%2");
                return null;
            }
            char hexStr[] = new char[len / 2];
            for (int i = 0; i < len; i += 2)
            {
                hexStr[i / 2] = (char) Integer.parseInt(hexedString.substring(i, i + 2), 16);
            }
            hexedString = new String(hexStr);
        }
        return hexedString;
    }

    public String charCodeConvert(String hexedString, String defaultEnCode, String fromCode, String toCode)
    {
        String convertedString = null;
        try
        {
            // Recover the raw bytes, then decode them with the source code page
            String conString = new String(hexedString.getBytes(defaultEnCode), fromCode);
            // Encode to the target code page; the result carries those bytes via the platform default encoding
            convertedString = new String(conString.getBytes(toCode));
        }
        catch (UnsupportedEncodingException ue)
        {
            // throw new JavaException("CharacterConverter", ExceptionTypesEnum.ERROR, ue.toString());
            // ue.printStackTrace();
            System.out.println("UnsupportedEncodingException");
        }
        return convertedString;
    }

    public String charCodeConvert(String hexedString, String fromCode, String toCode)
    {
        return charCodeConvert(hexedString, defaultCode, fromCode, toCode);
    }
}
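The reproduction above carries raw bytes through a String by way of ISO8859-1. For experiments it can be simpler to turn the hex text straight into a byte array. The sketch below shows one way to do that. `HexBytes` is a hypothetical helper, not part of the report, and unlike `hexifyString` it does not strip 00 or FF padding.

```java
final class HexBytes {
    /** Parses an even-length hex string such as "0E42600F" into the bytes it spells out. */
    static byte[] parse(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits: " + hex.length());
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            // Each pair of hex digits becomes one byte
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] b = parse("0E42600F");
        System.out.println(b.length + " bytes, first = 0x" + Integer.toHexString(b[0] & 0xFF));
    }
}
```

The resulting array can be passed to `new String(bytes, "Cp930")` on a runtime that provides that encoding.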
We suggested the following fix:
In the file ext\i18n\src\share\sun\io\ByteToCharCp930.java, change line 231 from:
"\uFF01\uFFE5\uFF0A\uFF09\uFF1B\uFFE2\uFF0D\uFF0F\uFFFD\uFFFD" + // 400 - 409
to:
"\uFF01\uFFE5\uFF0A\uFF09\uFF1B\uFFE2\u2212\uFF0F\uFFFD\uFFFD" + // 400 - 409
The change replaces U+FF0D (FULLWIDTH HYPHEN-MINUS) with U+2212 (MINUS SIGN). The FF0D character was identified as a problem in the following mailing-list post: http://oss.software.ibm.com/pipermail/icu4c-support/2002-October/000757.html
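The effect of the table entry can be checked without a browser by decoding the hyphen alone and asking the target encoding which of the two code points it can encode. The sketch below is illustrative and rests on assumptions. It assumes that 0x4260, the only pair in the sample input from the 0x42 row, is the hyphen the report refers to, and that the runtime provides the `Cp930` and `Cp943` encodings; if it does not, the program says so and stops. The class name `Cp930HyphenCheck` is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

final class Cp930HyphenCheck {
    // 0x0E opens the double-byte run, 0x42 0x60 is the assumed hyphen, 0x0F closes the run
    static final byte[] HYPHEN = {0x0E, 0x42, 0x60, 0x0F};

    /** Renders every char of s as U+XXXX, separated by spaces. */
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        if (!Charset.isSupported("Cp930") || !Charset.isSupported("Cp943")) {
            System.out.println("Cp930 and/or Cp943 not available in this runtime");
            return;
        }
        // Which code point does the Cp930 decoder produce for the hyphen?
        String decoded = new String(HYPHEN, Charset.forName("Cp930"));
        System.out.println("Cp930 0x4260 decodes to " + describe(decoded));

        // Which of the two candidate code points can the target encoding represent?
        CharsetEncoder encoder = Charset.forName("Cp943").newEncoder();
        System.out.println("Cp943 can encode U+FF0D: " + encoder.canEncode('\uFF0D'));
        System.out.println("Cp943 can encode U+2212: " + encoder.canEncode('\u2212'));
    }
}
```

No expected output is given here, because the result depends on the JDK release and on whether the fix is present in it.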
======================================================================
Backported by:
- JDK-2076635 Japanese characters not converting correctly from Codepage 930 to Codepage 943 (Resolved)
- JDK-2076634 Japanese characters not converting correctly from Codepage 930 to Codepage 943 (Closed)