-
Bug
-
Resolution: Won't Fix
-
P4
-
None
-
1.4.0, 5.0
-
generic
-
generic
The SJIS, EUC_JP and JISX0201 converters in the Java platform deviate
from the published standards for a few special-cased code points.
The deviation is necessary to handle issues with the yen, overline
and tilde characters in the ASCII or JIS Roman range
and their logical interpretation within applications.
This needs to be clearly documented in the Java language spec.
The specification requirement was initially recorded in bug 4483377,
which records apparent JCK failures that occurred when the JCK
I/O API tests used the published SJIS mapping table
from Unicode.org to test standards conformance of
the Japanese character encodings in the Java platform.
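The effect on the single-byte range can be probed with the java.nio.charset API. The following is a minimal sketch and not part of the original report: the class name JisRomanProbe and its helpers are hypothetical, and it assumes the runtime registers the three charsets under the names Shift_JIS, EUC-JP and JIS_X0201. On a runtime that behaves as described here, bytes 0x5C and 0x7E would be expected to decode to their ASCII values rather than to U+00A5 and U+203E.

```java
import java.nio.charset.Charset;

public class JisRomanProbe {
    // Charset names assumed to be registered in the running JDK.
    static final String[] NAMES = {"Shift_JIS", "EUC-JP", "JIS_X0201"};

    /**
     * Decodes a single byte with the named charset and returns the code
     * point of the first decoded char, or -1 if the charset is unavailable.
     */
    static int decodeSingle(String charsetName, int b) {
        if (!Charset.isSupported(charsetName)) {
            return -1;
        }
        String s = new String(new byte[] {(byte) b}, Charset.forName(charsetName));
        return s.isEmpty() ? -1 : s.charAt(0);
    }

    /** Formats a code point as U+XXXX, or reports that nothing was decoded. */
    static String describe(int codePoint) {
        return codePoint < 0 ? "unavailable" : String.format("U+%04X", codePoint);
    }

    public static void main(String[] args) {
        // Print how each charset interprets the two disputed bytes.
        for (String name : NAMES) {
            System.out.println(name
                    + ": 0x5C -> " + describe(decodeSingle(name, 0x5C))
                    + ", 0x7E -> " + describe(decodeSingle(name, 0x7E)));
        }
    }
}
```

The probe only reports what the running JDK does. It does not decide which mapping is correct.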
---------carryover info from bug 4483377 --------------------------------------
From 4483377:
Name: ooR10006 Date: 07/24/2001
jdk1.4.0beta-b72's methods:
OutputStreamWriter(ByteArrayOutputStream, encodingName).write(char[]) and
InputStreamReader(ByteArrayInputStream, encodingName).read(char[], int, int)
incorrectly write/read chars/bytes for the encoding "SJIS".
Specifically, the problems are:
1. The character 0x005C is encoded as (byte)0x5C instead of the 2-byte
sequence (byte)0x81, (byte)0x5F.
2. The byte 0x5C is decoded as char 0x005C instead of 0x00A5.
3. The 2-byte sequence (byte)0x81, (byte)0x5F is decoded as char 0xFF3C
instead of char 0x005C.
4. The byte 0x7E is decoded as char 0x007E instead of 0x203E; however,
the char 0x203E is correctly encoded as byte 0x7E.
The following test shows this:
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.IOException;
import java.io.UnsupportedEncodingException;

public class test {
    public static void main(String[] args) {
        byteToChar();
        charToByte();
    }

    // Decodes SJIS bytes and prints the decoded chars next to the values
    // expected from the published mapping table.
    static void byteToChar() {
        char[] receivedChars = new char[4];
        byte[] inputBytes = {(byte)0x5c, (byte)0x81, (byte)0x5F, (byte)0x7e};
        char[] expectedChars = {(char)0xA5, (char)0x5C, (char)0x203E};
        try {
            ByteArrayInputStream bais = new ByteArrayInputStream(inputBytes);
            InputStreamReader reader = new InputStreamReader(bais, "SJIS");
            reader.read(receivedChars, 0, receivedChars.length);
            System.out.println("byte sequence for decoding: (byte)0x5C "
                + "(byte)0x81, (byte)0x5F, (byte)0x7e");
            System.out.println("decoded chars: "
                + "0x" + Integer.toHexString(receivedChars[0]) + ", "
                + "0x" + Integer.toHexString(receivedChars[1]) + ", "
                + "0x" + Integer.toHexString(receivedChars[2]));
            System.out.println("expected chars: "
                + "0x" + Integer.toHexString(expectedChars[0]) + ", "
                + "0x" + Integer.toHexString(expectedChars[1]) + ", "
                + "0x" + Integer.toHexString(expectedChars[2]));
        } catch (UnsupportedEncodingException e) {
            // SJIS is not available in this runtime; nothing to report.
            return;
        } catch (IOException ex) {
            return;
        }
    }

    // Encodes char 0x5C as SJIS and prints the resulting byte next to the
    // 2-byte sequence expected from the published mapping table.
    static void charToByte() {
        char[] inputChars = {(char)0x5C};
        byte[] expectedBytes = {(byte)0x81, (byte)0x5F};
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream(10);
            OutputStreamWriter writer = new OutputStreamWriter(baos, "SJIS");
            writer.write(inputChars);
            writer.flush();
            byte[] bytes = baos.toByteArray();
            System.out.println("chars for encoding: (char)0x5C ");
            System.out.println("encoding bytes: "
                + "0x" + Integer.toHexString(bytes[0] & 0xFF));
            System.out.println("expected bytes: "
                + "0x" + Integer.toHexString(expectedBytes[0] & 0xFF) + ", "
                + "0x" + Integer.toHexString(expectedBytes[1] & 0xFF));
        } catch (UnsupportedEncodingException e) {
            // SJIS is not available in this runtime; nothing to report.
            return;
        } catch (IOException ex) {
            return;
        }
    }
}
% jdk1.4.0beta-b72/solsparc/bin/java test
byte sequence for decoding: (byte)0x5C (byte)0x81, (byte)0x5F, (byte)0x7e
decoded chars: 0x5c, 0xff3c, 0x7e
expected chars: 0xa5, 0x5c, 0x203e
chars for encoding: (char)0x5C
encoding bytes: 0x5c
expected bytes: 0x81, 0x5f
%
As a result, the following JCK test fails:
api/java_io/mbCharEncoding/index.html#ShiftJIS
The test is correct and has been failing since 1.1.6.
======================================================================
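The four numbered discrepancies in the carried-over report can also be checked with the java.nio.charset API instead of the stream classes. This is a hedged sketch rather than the JCK test itself: the class SjisMappingCheck and its helper methods are hypothetical names, the "published" values are simply the expectations quoted in the report, and the code assumes the runtime provides a charset named Shift_JIS.

```java
import java.nio.charset.Charset;

public class SjisMappingCheck {
    // Assumes the runtime provides Shift_JIS; forName throws otherwise.
    static final Charset SJIS = Charset.forName("Shift_JIS");

    /** Decodes the given byte values and returns the first decoded char. */
    static int decodeFirst(int... bytes) {
        byte[] raw = new byte[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            raw[i] = (byte) bytes[i];
        }
        return new String(raw, SJIS).charAt(0);
    }

    /** Encodes one char and returns its bytes in hex, for example "81 5F". */
    static String encodeHex(char c) {
        StringBuilder sb = new StringBuilder();
        for (byte b : String.valueOf(c).getBytes(SJIS)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Prints one comparison line and flags a mismatch. */
    static void show(String what, String published, String actual) {
        System.out.println(what + ": published " + published + ", runtime " + actual
                + (published.equals(actual) ? "" : "  <-- differs"));
    }

    static String hex(int codePoint) {
        return String.format("U+%04X", codePoint);
    }

    public static void main(String[] args) {
        // "Published" values are the expectations quoted in the report.
        show("encode U+005C", "81 5F", encodeHex('\\'));
        show("decode 5C", "U+00A5", hex(decodeFirst(0x5C)));
        show("decode 81 5F", "U+005C", hex(decodeFirst(0x81, 0x5F)));
        show("decode 7E", "U+203E", hex(decodeFirst(0x7E)));
        show("encode U+203E", "7E", encodeHex('\u203E'));
    }
}
```

Lines flagged as differing correspond to the special-cased code points. Because this issue was resolved as Won't Fix, a mismatch there reflects the documented deviation rather than a regression.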
Relates to:
- JDK-4964355 Clarify (lack of) specification for optional charsets (Resolved)
- JDK-4483377 wrong CharToByte and ByteToChar conversions for SJIS encoding (Closed)
- JDK-4838072 Yen sign is not converted properly when using String.getBytes("Shift_JIS") (Closed)
- JDK-4251698 RFE: JIS X0208/Unicode mappings should follow the JIS standard (JIS X0208:1997) (Resolved)
- JDK-4556882 Follow the IANA definition for "shift_jis" charset name (Closed)