- Type: CSR
- Resolution: Approved
- Priority: P3
- Component/s: core-libs
- None
- behavioral
- low
- Java API
- Implementation
Summary
MS950 charset encoder behaves differently from what is defined in the Traditional Chinese Windows specification
Problem
Windows code page 950 has some n:1 byte-to-char mappings for certain code points, so several byte sequences decode to the same character. In JDK's MS950 charset, 4 char-to-byte mappings differ from those of Traditional Chinese Windows.
(The actual issue was reported in https://bugs.openjdk.java.net/browse/JDK-8232161)
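The n:1 byte-to-char relationship can be observed with a short program. This is only a sketch: it assumes the runtime provides the charset under the alias `MS950` (it ships in the `jdk.charsets` module), and the class and method names are made up for illustration. It decodes the two byte sequences that CP950.TXT lists for U+2550 and checks whether they yield the same character.

```java
import java.nio.charset.Charset;

public class Ms950DecodeDemo {
    // Assumption: "MS950" resolves to the JDK's Windows code page 950 charset.
    static final Charset MS950 = Charset.forName("MS950");

    /** Decodes the given unsigned byte values with the MS950 charset. */
    static String decode(int... byteValues) {
        byte[] bytes = new byte[byteValues.length];
        for (int i = 0; i < byteValues.length; i++) {
            bytes[i] = (byte) byteValues[i];
        }
        return new String(bytes, MS950);
    }

    public static void main(String[] args) {
        // Both sequences are listed for U+2550 in CP950.TXT.
        String viaA2A4 = decode(0xA2, 0xA4);
        String viaF9F9 = decode(0xF9, 0xF9);
        System.out.printf("A2A4 -> U+%04X%n", (int) viaA2A4.charAt(0));
        System.out.printf("F9F9 -> U+%04X%n", (int) viaF9F9.charAt(0));
        System.out.println("same character: " + viaA2A4.equals(viaF9F9));
    }
}
```

Because two byte sequences decode to one character, the encoder has to pick one of them on the way back, and that choice is what this CSR is about.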
Solution
I recommend changing the following 4 char-to-byte mappings.

Before:
<pre>
\u2550 -> \xA2\xA4
\u255E -> \xA2\xA5
\u2561 -> \xA2\xA7
\u256A -> \xA2\xA6
</pre>
After:
<pre>
\u2550 -> \xF9\xF9
\u255E -> \xF9\xE9
\u2561 -> \xF9\xEB
\u256A -> \xF9\xEA
</pre>

Definition:
Traditional Chinese Windows conversion table:
https://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP950.TXT
Newer MS950 definition:
https://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit950.txt

\u2550, \u255E, \u2561 and \u256A are in the Box Drawing Unicode block.
(See attached 4Chras.png for font glyphs)
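To see which mapping a given runtime uses, the four characters can be encoded and compared with the "After" table. The following is a minimal sketch under the same assumption that the alias `MS950` is available. The class name and helper methods are hypothetical, and the result depends on whether the JDK in use already contains this change.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

public class Ms950EncodeCheck {
    // Assumption: "MS950" resolves to the JDK's Windows code page 950 charset.
    static final Charset MS950 = Charset.forName("MS950");

    /** Char-to-byte mappings copied from the "After" table, as hex strings. */
    static Map<Character, String> proposed() {
        Map<Character, String> m = new LinkedHashMap<>();
        m.put('\u2550', "F9F9");
        m.put('\u255E', "F9E9");
        m.put('\u2561', "F9EB");
        m.put('\u256A', "F9EA");
        return m;
    }

    /** Encodes one character with MS950 and returns the bytes as upper-case hex. */
    static String encodeHex(char c) {
        StringBuilder sb = new StringBuilder();
        for (byte b : String.valueOf(c).getBytes(MS950)) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (Map.Entry<Character, String> e : proposed().entrySet()) {
            String actual = encodeHex(e.getKey());
            String verdict = actual.equals(e.getValue())
                    ? "matches Windows" : "differs from Windows";
            System.out.printf("U+%04X -> %s (proposed %s): %s%n",
                    (int) e.getKey(), actual, e.getValue(), verdict);
        }
    }
}
```

Running it on JDKs from before and after the change should show the four lines switching from "differs from Windows" to "matches Windows".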
Specification
N/A
- csr of
-
JDK-8259790 Align some one-way conversion in MS950 charset with Windows
-
- Resolved
-
- relates to
-
JDK-8248305 Align some one-way conversion in MS950 charset with Windows
-
- Closed
-