-
Bug
-
Resolution: Not an Issue
-
P3
-
None
-
1.4.1
-
x86
-
windows_xp
Name: nt126004 Date: 03/26/2003
FULL PRODUCT VERSION :
java version "1.4.1"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.1-b21)
Java HotSpot(TM) Client VM (build 1.4.1-b21, mixed mode)
FULL OS VERSION :
Microsoft Windows XP [Version 5.1.2600]
A DESCRIPTION OF THE PROBLEM :
The methods String.getBytes(String charsetName) and new String(byte[] bytes, String charsetName) should be complementary (unless characters in the string are not defined in the specified charset). For any String, you should be able to create a byte array with getBytes and then create a new String from that byte array such that the new String is equal to the original String.
When the charsetName is "Shift_JIS" and the String contains a Yen character, the method String.getBytes returns a value of 0x5C. This is correct behavior. However, the method new String(bytes, charsetName) converts the byte back to a String containing the Reverse Solidus character instead of the Yen character. This is not correct behavior.
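The round-trip property described above can be stated as a small predicate. The following is only a sketch: the class name RoundTripCheck and its helper methods are hypothetical, and it assumes a JDK newer than the 1.4.1 release in this report (it uses the Charset overloads of getBytes and the String constructor, plus String.format). Whether "Shift_JIS" is available depends on the runtime, so the sketch checks Charset.isSupported first.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, not part of the original report.
class RoundTripCheck {

    // Returns true if encoding s with the named charset and decoding the
    // resulting bytes with the same charset yields a String equal to s.
    static boolean roundTrips(String s, String csName) {
        Charset cs = Charset.forName(csName);
        byte[] encoded = s.getBytes(cs);
        String decoded = new String(encoded, cs);
        return s.equals(decoded);
    }

    // Formats each UTF-16 code unit of s as U+XXXX, separated by spaces.
    static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String testString = "\u3072\u00a5"; // HIRAGANA LETTER HI, YEN SIGN
        System.out.println("Input: " + codeUnits(testString));
        for (String name : new String[] {"UTF-8", "US-ASCII", "Shift_JIS"}) {
            if (!Charset.isSupported(name)) {
                System.out.println(name + ": not supported on this runtime");
                continue;
            }
            System.out.println(name + " round-trips: " + roundTrips(testString, name));
        }
    }
}
```

UTF-8 can represent both characters, so it round-trips. US-ASCII cannot, which is the "not defined in the specified charset" exception mentioned above. The Shift_JIS case is the one this report is about.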
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
See the example program RoundTrip.java
EXPECTED VERSUS ACTUAL BEHAVIOR :
Expected: When the charset is Shift_JIS, a byte with value 0x5C should be converted to a character with Unicode value 0xA5.
Actual: The byte value 0x5C is converted to Unicode value 0x5C.
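To look at the decoding direction in isolation, a single byte can be decoded and its code unit printed in hexadecimal, matching the 0x5C / 0xA5 notation used here. This is a minimal sketch with a hypothetical class name (DecodeByteDemo); it assumes a JDK that provides the String(byte[], Charset) constructor and String.format, and it skips Shift_JIS if the runtime does not supply that charset.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, not part of the original report.
class DecodeByteDemo {

    // Decodes one byte with the named charset and formats the first
    // resulting UTF-16 code unit as U+XXXX.
    static String decodeToHex(int byteValue, String csName) {
        byte[] input = new byte[] {(byte) byteValue};
        String decoded = new String(input, Charset.forName(csName));
        return String.format("U+%04X", (int) decoded.charAt(0));
    }

    public static void main(String[] args) {
        String name = "Shift_JIS";
        if (Charset.isSupported(name)) {
            System.out.println("0x5C decoded as " + name + ": " + decodeToHex(0x5C, name));
        } else {
            System.out.println(name + " is not supported on this runtime");
        }
    }
}
```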
REPRODUCIBILITY :
This bug can be reproduced always.
---------- BEGIN SOURCE ----------
public class RoundTrip {
    public static void main(String[] args) {
        String csName = "Shift_JIS";
        String testString = "\u3072\u00a5"; // HIRAGANA LETTER HI, YEN SIGN
        roundTrip(csName, testString);
    }

    // do a round-trip conversion from String to byte[] and back to String
    private static void roundTrip(String csName, String testString) {
        try {
            /* Depending on your configuration, the Unicode characters may not
               display correctly. This is not relevant to the issue at hand
               though. */
            // display the arguments passed in
            System.out.println("encode and decode '" + testString + "' using " + csName);
            System.out.print("Unicode values: ");
            int len = testString.length();
            // display the numeric value of each character before encoding
            for (int n = 0; n < len; n++) {
                int val = testString.charAt(n);
                System.out.print(val);
                System.out.print(" ");
            }
            System.out.println();
            System.out.println();
            // encode to bytes using the specified charsetName
            byte[] b = testString.getBytes(csName);
            System.out.print("Encoded bytes: ");
            // display the encoded values (signed decimal byte values)
            for (int n = 0; n < b.length; n++) {
                System.out.print(b[n]);
                System.out.print(" ");
            }
            System.out.println();
            System.out.println();
            // convert the bytes back to a String
            String decode = new String(b, csName);
            System.out.println("Decoded String " + decode);
            System.out.print("Unicode values: ");
            len = decode.length();
            // display the numeric value of each character again
            for (int n = 0; n < len; n++) {
                int val = decode.charAt(n);
                System.out.print(val);
                System.out.print(" ");
            }
            System.out.println();
        } catch (Throwable t) {
            System.out.println(t);
        }
    }
}
---------- END SOURCE ----------
(Review ID: 183058)
======================================================================
Relates to: JDK-4486307 (spec) Need to document deviation from standards in Japanese charsets (Closed)