Type: Bug
Resolution: Unresolved
Priority: P4
Affected Version: 8, 26
OS: generic
CPU: generic
A DESCRIPTION OF THE PROBLEM :
IBM930 uses the wrong character when decoding the hex sequence 0x4260.
The correct character would be U+FF0D (FULLWIDTH HYPHEN-MINUS, the full-width hyphen).
The character being used currently is U+2212 (MINUS SIGN).
This is especially important when trying to convert from IBM930 to windows-31j, which has a full-width hyphen (0x817C), but no minus sign.
In particular the equivalence between IBM930's 0x4260 and Windows-31j's 0x817C is established in this document from IBM: https://public.dhe.ibm.com/software/globalization/gcoc/attachments/CP00300.pdf, page 411. I believe the corrections in 12.1.2 might not have been incorporated into this character set.
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Decode the byte sequence 0x4260 into a string using the x-IBM930 charset, and then encode it to bytes using Windows-31j charset.
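The reproduction step above can be sketched as a small Java program. This is a minimal sketch, not the reporter's own test case: the class name `Ibm930Repro` and its helper methods are hypothetical. It also assumes the double-byte sequence must be wrapped in the EBCDIC shift-out (0x0E) and shift-in (0x0F) bytes for x-IBM930 to treat it as a double-byte character, and that the runtime includes the extended charsets (the `jdk.charsets` module).

```java
import java.nio.charset.Charset;

class Ibm930Repro {
    // Decodes the input with one charset and re-encodes it with another.
    // String.getBytes(Charset) silently substitutes the target charset's
    // replacement bytes for characters it cannot map.
    static byte[] transcode(byte[] input, String from, String to) {
        String decoded = new String(input, Charset.forName(from));
        return decoded.getBytes(Charset.forName(to));
    }

    // Formats bytes as upper-case hex, two digits per byte.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        if (!Charset.isSupported("x-IBM930") || !Charset.isSupported("windows-31j")) {
            System.out.println("Extended charsets are not available in this runtime");
            return;
        }
        // Assumption: shift-out (0x0E) and shift-in (0x0F) bracket the
        // double-byte sequence 0x4260 from the report.
        byte[] ebcdic = {0x0E, 0x42, 0x60, 0x0F};

        String decoded = new String(ebcdic, Charset.forName("x-IBM930"));
        for (int i = 0; i < decoded.length(); i++) {
            System.out.println(String.format("decoded: U+%04X", (int) decoded.charAt(i)));
        }
        System.out.println("windows-31j bytes: "
                + hex(transcode(ebcdic, "x-IBM930", "windows-31j")));
    }
}
```

The program prints the decoded code point as well as the re-encoded bytes, so the output shows directly which Unicode character the decoder chose.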
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
The expected output bytes would be 0x817C
ACTUAL -
The decoded character is not in the windows-31j charset, so the encoder produces a replacement character, an error, or no output, depending on how the encoder's handling of unmappable characters is configured.
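The three possible outcomes correspond to the `CodingErrorAction` values of `java.nio.charset.CharsetEncoder`. The following is a minimal sketch of how each one behaves; the class `UnmappableDemo` and its method `encodeWith` are hypothetical names. U+2212 (MINUS SIGN) is used only as an example of a character that windows-31j is assumed not to map.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

class UnmappableDemo {
    // Encodes text with the named charset, applying the given action to
    // malformed input and unmappable characters. Returns the encoded bytes
    // as hex, or a short error description if the encoder reports a failure.
    static String encodeWith(String text, String charsetName, CodingErrorAction action) {
        try {
            ByteBuffer out = Charset.forName(charsetName).newEncoder()
                    .onMalformedInput(action)
                    .onUnmappableCharacter(action)
                    .encode(CharBuffer.wrap(text));
            StringBuilder sb = new StringBuilder();
            while (out.hasRemaining()) {
                sb.append(String.format("%02X", out.get() & 0xFF));
            }
            return sb.toString();
        } catch (CharacterCodingException e) {
            return "error: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        if (!Charset.isSupported("windows-31j")) {
            System.out.println("windows-31j is not available in this runtime");
            return;
        }
        String minusSign = "\u2212"; // example input, assumed unmappable
        System.out.println("REPLACE: " + encodeWith(minusSign, "windows-31j", CodingErrorAction.REPLACE));
        System.out.println("REPORT:  " + encodeWith(minusSign, "windows-31j", CodingErrorAction.REPORT));
        System.out.println("IGNORE:  " + encodeWith(minusSign, "windows-31j", CodingErrorAction.IGNORE));
    }
}
```

`String.getBytes(Charset)` always behaves like `REPLACE`, so an explicit `CharsetEncoder` with `REPORT` is the more reliable way to detect the failed conversion.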
Caused by: JDK-6843578 Re-implement IBM doublebyte charsets (Resolved)
Links to: Review(master) openjdk/jdk/27594