-
CSR
-
Resolution: Approved
-
P3
-
None
-
behavioral
-
low
-
-
Java API
-
Implementation
This is the same as JDK-8233385.
https://bugs.openjdk.java.net/browse/JDK-8233385
Summary
The MS950 charset encoder behaves differently from what the Traditional Chinese Windows specification defines.
Problem
Windows code page 950 has n:1 byte-to-char mappings for certain code points. In the JDK's MS950 charset, four char-to-byte mappings differ from those of Traditional Chinese Windows.
(The actual issue was reported in https://bugs.openjdk.java.net/browse/JDK-8232161)
Solution
I recommend changing the following four char-to-byte mappings.
Before:
\u2550 -> \xA2\xA4
\u255E -> \xA2\xA5
\u2561 -> \xA2\xA7
\u256A -> \xA2\xA6
After:
\u2550 -> \xF9\xF9
\u255E -> \xF9\xE9
\u2561 -> \xF9\xEB
\u256A -> \xF9\xEA
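The mappings above can be checked on a given JDK with a small program. This is a minimal sketch, not part of the proposed change; the class name `Ms950EncodeCheck` and the helper `encodeHex` are hypothetical, and it assumes the `MS950` charset is available in the runtime (it normally ships in the `jdk.charsets` module).

```java
import java.nio.charset.Charset;

public class Ms950EncodeCheck {
    // Encode one char with MS950 and format the resulting bytes as \xNN pairs.
    static String encodeHex(char c) {
        Charset ms950 = Charset.forName("MS950"); // assumes MS950 is installed
        byte[] bytes = String.valueOf(c).getBytes(ms950);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("\\x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The four box-drawing characters affected by this CSR.
        char[] chars = {'\u2550', '\u255E', '\u2561', '\u256A'};
        for (char c : chars) {
            System.out.printf("U+%04X -> %s%n", (int) c, encodeHex(c));
        }
    }
}
```

The output depends on the JDK in use: a runtime without the change should print the "Before" byte pairs, and one with the change should print the "After" pairs.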
Definition:
The Traditional Chinese Windows conversion table is:
https://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP950.TXT
The newer MS950 definition is:
https://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit950.txt
\u2550, \u255E, \u2561 and \u256A are in the Unicode Box Drawing block.
(See attached 4Chras.png for font glyphs)
Specification
N/A
Q&A Comments
Joe Darcy added a comment - 2020-02-13 18:07
Ichiroh Takiguchi, so the difference before and after is which set of box characters get mapped to?
Is there any JDK or Java SE specification that needs to be updated?
Joe Darcy added a comment - 2020-02-14 09:01
Marking the request as pended until the questions above are answered.
Ichiroh Takiguchi added a comment - 2020-02-16 05:54
Sorry for the late reply.
the difference before and after is which set of box characters get mapped to?
The Unicode code points are not changed, which means the font glyphs are not changed.
For example, the MS950-to-Unicode (decoding) mappings are not changed:
\xA2\xA4 -> \u2550
\xF9\xF9 -> \u2550
Before change:
\u2550 -> \xA2\xA4
After change:
\u2550 -> \xF9\xF9
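The n:1 decoding described in this comment can be observed directly. The following is an illustrative sketch only; the class name `Ms950DecodeCheck` and the helper `decode` are hypothetical, and it assumes the `MS950` charset is available in the runtime.

```java
import java.nio.charset.Charset;

public class Ms950DecodeCheck {
    // Decode a two-byte MS950 sequence into a Java String.
    static String decode(int b1, int b2) {
        byte[] bytes = {(byte) b1, (byte) b2};
        return new String(bytes, Charset.forName("MS950")); // assumes MS950 is installed
    }

    public static void main(String[] args) {
        String fromA2A4 = decode(0xA2, 0xA4);
        String fromF9F9 = decode(0xF9, 0xF9);
        // Both byte sequences should decode to the same box-drawing character.
        System.out.println(fromA2A4.equals(fromF9F9));
        System.out.printf("U+%04X%n", (int) fromF9F9.charAt(0));
    }
}
```

Because decoding is unaffected by this CSR, the result should be the same before and after the change; only the encoding direction differs.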
Is there any JDK or Java SE specification that needs to be updated?
No. This CSR does not affect the Java SE specification. It just follows Microsoft's latest CP950 specification.
Joe Darcy added a comment - 2020-02-18 10:48
Ichiroh Takiguchi, please explain what exactly this CSR proposes to alter, which methods would return different values, etc.
Joe Darcy added a comment - 2020-02-19 12:51
This request will stay pended until the requested information is provided.
Ichiroh Takiguchi added a comment - 2020-02-20 02:40
Expected result:
Java's behavior should be the same as that of Windows.
Exact change:
I'd like to change the one-way conversion definitions by changing make/data/charsetmapping/MS950.nr. No logic change is included.
Working behavior:
The customer's case is as follows:
- He uses Traditional Chinese Windows with a version control system (VCS).
Windows implementation:
- He opens a file that contains \xF9\xF9 in a Windows application such as Notepad.
- He saves the file without making any change.
- \xF9\xF9 is stored as \xF9\xF9.
==> VCS does not detect a change.
- He opens a file that contains \xA2\xA4 in a Windows application.
- He saves the file without making any change.
- \xA2\xA4 is stored as \xF9\xF9.
==> VCS detects a change.
Current implementation:
- He opens a file that contains \xF9\xF9 in a Java application.
- He saves the file without making any change.
- \xF9\xF9 is stored as \xA2\xA4.
==> VCS detects a change.
- He opens a file that contains \xA2\xA4 in a Java application.
- He saves the file without making any change.
- \xA2\xA4 is stored as \xA2\xA4.
==> VCS does not detect a change.
New implementation:
- He opens a file that contains \xF9\xF9 in a Java application.
- He saves the file without making any change.
- \xF9\xF9 is stored as \xF9\xF9.
==> VCS does not detect a change.
- He opens a file that contains \xA2\xA4 in a Java application.
- He saves the file without making any change.
- \xA2\xA4 is stored as \xF9\xF9.
==> VCS detects a change.
If the change is applied, Java's behavior will be the same as that of Windows.
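The open-and-save scenario above amounts to decoding the file's bytes and encoding them again. A minimal sketch of that round trip follows; the class name `Ms950RoundTrip` and the helper `survivesRoundTrip` are hypothetical, and it assumes the `MS950` charset is available and that the application reads and writes the whole file through that charset.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class Ms950RoundTrip {
    // Returns true if decoding and re-encoding yields the same bytes,
    // i.e. a byte-level comparison (as a VCS would do) sees no change.
    static boolean survivesRoundTrip(byte[] original) {
        Charset ms950 = Charset.forName("MS950"); // assumes MS950 is installed
        String text = new String(original, ms950); // "open" the file
        byte[] saved = text.getBytes(ms950);        // "save" the file
        return Arrays.equals(original, saved);
    }

    public static void main(String[] args) {
        byte[] a2a4 = {(byte) 0xA2, (byte) 0xA4};
        byte[] f9f9 = {(byte) 0xF9, (byte) 0xF9};
        System.out.println("A2A4 unchanged: " + survivesRoundTrip(a2a4));
        System.out.println("F9F9 unchanged: " + survivesRoundTrip(f9f9));
    }
}
```

Since both byte sequences decode to the same character, only one of them can survive the round trip on any given JDK. Which one survives indicates whether the runtime encodes with the old or the new mapping.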
Joe Darcy added a comment - 2020-02-25 10:51
After some additional information from Naoto Sato, moving to Approved.
Please consider a release note for this change.
csr of
- JDK-8245689 Align some one-way conversion in MS950 charset with Windows (Resolved)
- JDK-8259790 Align some one-way conversion in MS950 charset with Windows (Resolved)
relates to
- JDK-8259791 Align some one-way conversion in MS950 charset with Windows (Closed)