Type: CSR
Resolution: Approved
Priority: P3
Fix Version/s: 11-pool
Component/s: core-libs
Labels:
None

Subcomponent:
java.nio.charsets
Compatibility Kind:

behavioral
Compatibility Risk:
low
Compatibility Risk Description:

Applications that depend on the existing mappings for these one-way conversion code points may no longer work as expected. Because these are uncommon box-drawing code points, the risk is low. It would be best to avoid introducing a property to switch back to the old behavior.
Interface Kind:

Java API
Scope:
Implementation

This CSR is the same as JDK-8233385.
https://bugs.openjdk.java.net/browse/JDK-8233385

Summary

The MS950 charset encoder behaves differently from what the Traditional Chinese Windows specification defines.

Problem

Windows code page 950 has some n:1 byte-to-char mappings for certain code points. In the JDK's MS950 charset, four char-to-byte mappings differ from those of Traditional Chinese Windows.
(The actual issue was reported in https://bugs.openjdk.java.net/browse/JDK-8232161)

Solution

I recommend changing the following four char-to-byte mappings.

Before:

\u2550 -> \xA2\xA4
\u255E -> \xA2\xA5
\u2561 -> \xA2\xA7
\u256A -> \xA2\xA6

After:

\u2550 -> \xF9\xF9
\u255E -> \xF9\xE9
\u2561 -> \xF9\xEB
\u256A -> \xF9\xEA

Definition:
The Traditional Chinese Windows conversion table is:
https://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP950.TXT
The newer MS950 definition is:
https://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit950.txt

\u2550, \u255E, \u2561 and \u256A are in the Unicode Box Drawing block.
(See attached 4Chras.png for font glyphs)
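
As a rough way to see which of the two mappings a given JDK uses, the sketch below encodes the four characters with the MS950 charset and prints the resulting bytes. The class name Ms950EncodeCheck and its helper methods are illustrative assumptions and not part of the JDK. It also assumes the runtime provides the MS950 charset (normally through the jdk.charsets module). The output depends on the JDK version, so none is shown here.

```java
import java.nio.charset.Charset;

public class Ms950EncodeCheck {
    // The four box-drawing characters named in this CSR.
    static final char[] CHARS = {'\u2550', '\u255E', '\u2561', '\u256A'};

    // Formats bytes in the same \xNN notation used in the tables above.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("\\x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Encodes a single character with the given charset and returns the hex form.
    static String encodeHex(char c, Charset cs) {
        return hex(String.valueOf(c).getBytes(cs));
    }

    public static void main(String[] args) {
        // "MS950" is an alias of the JDK charset x-windows-950.
        Charset ms950 = Charset.forName("MS950");
        for (char c : CHARS) {
            System.out.printf("\\u%04X -> %s%n", (int) c, encodeHex(c, ms950));
        }
    }
}
```

Comparing the printed lines with the Before and After tables shows whether the runtime already carries this change.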

Specification

N/A

Q&A Comments

Joe Darcy added a comment - 2020-02-13 18:07

Ichiroh Takiguchi, so the difference before and after is which set of box characters get mapped to?

Is there any JDK or Java SE specification that needs to be updated?

Joe Darcy added a comment - 2020-02-14 09:01

Marking the request as pended until the questions above are answered.

Ichiroh Takiguchi added a comment - 2020-02-16 05:54

Sorry, I'm late.

the difference before and after is which set of box characters get mapped to?

The Unicode code points are not changed, which means the font glyphs are not changed.
For example, the MS950-to-Unicode mappings are not changed:
\xA2\xA4 -> \u2550
\xF9\xF9 -> \u2550
Before change:
\u2550 -> \xA2\xA4
After change:
\u2550 -> \xF9\xF9
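
The n:1 decoding described here can be checked with a small sketch like the following. The class name Ms950DecodeCheck and the decode helper are hypothetical names chosen for illustration, and the code assumes the MS950 charset is available in the runtime.

```java
import java.nio.charset.Charset;

public class Ms950DecodeCheck {
    // Decodes one double-byte sequence with the given charset.
    static String decode(int b1, int b2, Charset cs) {
        return new String(new byte[] {(byte) b1, (byte) b2}, cs);
    }

    public static void main(String[] args) {
        Charset ms950 = Charset.forName("MS950");
        String fromA2A4 = decode(0xA2, 0xA4, ms950);
        String fromF9F9 = decode(0xF9, 0xF9, ms950);
        // Both byte sequences are expected to decode to the same character,
        // so only the encoding direction has to pick one of them.
        System.out.println("same character: " + fromA2A4.equals(fromF9F9));
        System.out.printf("decoded code point: U+%04X%n", fromA2A4.codePointAt(0));
    }
}
```

Because decoding is unaffected by this CSR, the sketch should behave the same before and after the change.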

Is there any JDK or Java SE specification that needs to be updated?

No. This CSR does not affect the Java SE specification. It just follows Microsoft's latest CP950 specification.

Joe Darcy added a comment - 2020-02-18 10:48

Ichiroh Takiguchi, please explain what exactly this CSR proposes to alter, the return values of which methods would differ, etc.

Joe Darcy added a comment - 2020-02-19 12:51

This request will stay pended until the requested information is provided.

Ichiroh Takiguchi added a comment - 2020-02-20 02:40

Expected result:

Java's behavior should be the same as that of Windows.

Exact change:

I'd like to change the one-way conversion definitions by editing make/data/charsetmapping/MS950.nr. No logic change is included.

Working behavior:

The customer's case is as follows:

He uses Traditional Chinese Windows with a version control system (VCS).

Windows implementation:

He opens a file that contains \xF9\xF9 in a Windows application such as Notepad.
He saves the file without making any change.
\xF9\xF9 is stored as \xF9\xF9.

==> VCS does not detect a change.

He opens a file that contains \xA2\xA4 in a Windows application.
He saves the file without making any change.
\xA2\xA4 is stored as \xF9\xF9.

==> VCS detects a change.

Current implementation:

He opens a file that contains \xF9\xF9 in a Java application.
He saves the file without making any change.
\xF9\xF9 is stored as \xA2\xA4.

==> VCS detects a change.

He opens a file that contains \xA2\xA4 in a Java application.
He saves the file without making any change.
\xA2\xA4 is stored as \xA2\xA4.

==> VCS does not detect a change.

New implementation:

He opens a file that contains \xF9\xF9 in a Java application.
He saves the file without making any change.
\xF9\xF9 is stored as \xF9\xF9.

==> VCS does not detect a change.

He opens a file that contains \xA2\xA4 in a Java application.
He saves the file without making any change.
\xA2\xA4 is stored as \xF9\xF9.

==> VCS detects a change.

If the change is applied, Java's behavior is the same as that of Windows.
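
The open-and-save scenario above can be approximated with a decode-then-encode round trip. This is only a sketch: the class name Ms950SaveRoundTrip and its methods are hypothetical, a byte comparison stands in for the VCS, and the MS950 charset is assumed to be available in the runtime.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class Ms950SaveRoundTrip {
    // Simulates "open the file, then save it without editing":
    // bytes are decoded to text and the text is encoded back to bytes.
    static byte[] openAndSave(byte[] fileBytes, Charset cs) {
        String text = new String(fileBytes, cs);
        return text.getBytes(cs);
    }

    // A VCS would report a change when the saved bytes differ from the original.
    static boolean vcsWouldSeeChange(byte[] fileBytes, Charset cs) {
        return !Arrays.equals(fileBytes, openAndSave(fileBytes, cs));
    }

    public static void main(String[] args) {
        Charset ms950 = Charset.forName("MS950");
        byte[] f9f9 = {(byte) 0xF9, (byte) 0xF9};
        byte[] a2a4 = {(byte) 0xA2, (byte) 0xA4};
        // Which of the two lines prints "true" depends on the JDK's mapping.
        System.out.println("\\xF9\\xF9 changed on save: " + vcsWouldSeeChange(f9f9, ms950));
        System.out.println("\\xA2\\xA4 changed on save: " + vcsWouldSeeChange(a2a4, ms950));
    }
}
```

With the mapping proposed in this CSR, the \xF9\xF9 file should be the one that round-trips unchanged, which matches the Windows behavior described above.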

Joe Darcy added a comment - 2020-02-25 10:51

After some additional information from Naoto Sato, moving to Approved.

Please consider a release note for this change.

csr of

JDK-8245689 Align some one-way conversion in MS950 charset with Windows

Resolved

JDK-8259790 Align some one-way conversion in MS950 charset with Windows

Resolved

relates to

JDK-8259791 Align some one-way conversion in MS950 charset with Windows

Closed

Assignee: Ichiroh Takiguchi

Reporter: Ichiroh Takiguchi

Reviewed By: Joe Darcy

Created: 2020-06-25 02:08

Updated: 2021-01-14 16:30

Resolved: 2020-07-01 16:42
