JDK-8248305

Align some one-way conversion in MS950 charset with Windows


    • CSR
    • Resolution: Approved
    • P3
    • 11-pool
    • core-libs
    • None
    • behavioral
    • low
    • Applications that expect the existing mappings for these one-way conversion code points will not work. Since these are uncommon box-drawing code points, the risk is low. It would be best to avoid introducing a property to switch back to the old behavior.
    • Java API
    • Implementation

      This is the same as JDK-8233385.
      https://bugs.openjdk.java.net/browse/JDK-8233385

      Summary

      The MS950 charset encoder behaves differently from what the Traditional Chinese Windows specification defines.

      Problem

      Windows code page 950 has some n:1 byte-to-char mappings for certain code points; that is, more than one byte sequence decodes to the same character, so the encoder must pick one of them. In the JDK's MS950 charset, four char-to-byte mappings differ from those of Traditional Chinese Windows.
      (The actual issue is tracked in https://bugs.openjdk.java.net/browse/JDK-8232161)

      Solution

      I recommend changing the following four char-to-byte mappings.

      Before:

      \u2550 -> \xA2\xA4
      \u255E -> \xA2\xA5
      \u2561 -> \xA2\xA7
      \u256A -> \xA2\xA6
      

      After:

      \u2550 -> \xF9\xF9
      \u255E -> \xF9\xE9
      \u2561 -> \xF9\xEB
      \u256A -> \xF9\xEA
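
      The mappings above can be checked on a given JDK with a minimal sketch like the following. The class name Ms950EncodeCheck and the helper encodeHex are illustrative, not part of the JDK. The printed bytes depend on whether the JDK in use contains this change, so no expected output is given here.

```java
import java.nio.charset.Charset;

class Ms950EncodeCheck {
    // The four box-drawing characters named in this CSR.
    static final char[] CHARS = {'\u2550', '\u255E', '\u2561', '\u256A'};

    // Encode one char with MS950 and format each byte as \xNN.
    static String encodeHex(char c) {
        byte[] bytes = String.valueOf(c).getBytes(Charset.forName("MS950"));
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("\\x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints one "char -> bytes" line per character, in the style used above.
        for (char c : CHARS) {
            System.out.printf("\\u%04X -> %s%n", (int) c, encodeHex(c));
        }
    }
}
```

      On a JDK without the change the lines should match the "Before" list, and on a JDK with the change they should match the "After" list.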
      


      Definition:
      The Traditional Chinese Windows conversion table is:
      https://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP950.TXT
      The newer MS950 definition is:
      https://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit950.txt

      \u2550, \u255E, \u2561 and \u256A are in the Unicode Box Drawing block (U+2500 to U+257F).
      (See attached 4Chras.png for font glyphs)
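
      For readers without the attachment, a small sketch using only java.lang.Character can list the character names and confirm the block. The class name BoxDrawingCheck is illustrative.

```java
class BoxDrawingCheck {
    // Code points of the four characters discussed in this CSR.
    static final int[] CODE_POINTS = {0x2550, 0x255E, 0x2561, 0x256A};

    // True if the code point belongs to the Unicode Box Drawing block.
    static boolean isBoxDrawing(int codePoint) {
        return Character.UnicodeBlock.of(codePoint) == Character.UnicodeBlock.BOX_DRAWING;
    }

    public static void main(String[] args) {
        for (int cp : CODE_POINTS) {
            System.out.printf("U+%04X %s (box drawing: %b)%n",
                    cp, Character.getName(cp), isBoxDrawing(cp));
        }
    }
}
```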

      Specification

      N/A

      Q&A Comments

      Joe Darcy added a comment - 2020-02-13 18:07

      Ichiroh Takiguchi, so the difference before and after is which set of box characters get mapped to?

      Is there any JDK or Java SE specification that needs to be updated?

      Joe Darcy added a comment - 2020-02-14 09:01

      Marking the request as pended until the questions above are answered.

      Ichiroh Takiguchi added a comment - 2020-02-16 05:54

      Sorry, I'm late.

      the difference before and after is which set of box characters get mapped to?

      The Unicode code points are not changed, which means the font glyphs are not changed.
      For example, the MS950-to-Unicode (byte-to-char) mappings are not changed:
      \xA2\xA4 -> \u2550
      \xF9\xF9 -> \u2550
      Before change:
      \u2550 -> \xA2\xA4
      After change:
      \u2550 -> \xF9\xF9
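
      The unchanged byte-to-char direction can be illustrated with a short sketch. The class name Ms950DecodeCheck and the helper decode are illustrative. It assumes the MS950 charset is available in the runtime.

```java
import java.nio.charset.Charset;

class Ms950DecodeCheck {
    // "MS950" is an alias of the JDK charset x-windows-950.
    static final Charset MS950 = Charset.forName("MS950");

    // Decode a two-byte MS950 sequence into a String.
    static String decode(int first, int second) {
        return new String(new byte[] {(byte) first, (byte) second}, MS950);
    }

    public static void main(String[] args) {
        String fromA2A4 = decode(0xA2, 0xA4);
        String fromF9F9 = decode(0xF9, 0xF9);
        System.out.printf("A2A4 -> U+%04X%n", fromA2A4.codePointAt(0));
        System.out.printf("F9F9 -> U+%04X%n", fromF9F9.codePointAt(0));
        // Both byte sequences are expected to decode to the same character.
        System.out.println("same character: " + fromA2A4.equals(fromF9F9));
    }
}
```

      Only the reverse direction (which of the two byte sequences the encoder picks for \u2550) is affected by this CSR.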

      Is there any JDK or Java SE specification that needs to be updated?

      No. This CSR does not affect the Java SE specification. It just follows Microsoft's latest CP950 specification.

      Joe Darcy added a comment - 2020-02-18 10:48

      Ichiroh Takiguchi, please explain what exactly this CSR proposes to alter, the return values of which methods would differ, etc.

      Joe Darcy added a comment - 2020-02-19 12:51

      This request will stay pended until the requested information is provided.

      Ichiroh Takiguchi added a comment - 2020-02-20 02:40

      Expected result:

      Java's behavior should be the same as that of Windows.

      Exact change:

      I'd like to change the one-way conversion definitions by editing make/data/charsetmapping/MS950.nr. No logic change is included.

      Working behavior:

      The customer's case is as follows:

      • He uses Traditional Chinese Windows with a version control system (VCS).

      Windows implementation:

      • He opens a file that contains \xF9\xF9 in a Windows application, such as Notepad.
      • He saves the file without any change.
      • \xF9\xF9 is stored as \xF9\xF9.

      ==> VCS does not detect the change.

      • He opens a file that contains \xA2\xA4 in a Windows application.
      • He saves the file without any change.
      • \xA2\xA4 is stored as \xF9\xF9.

      ==> VCS can detect the change.

      Current implementation:

      • He opens a file that contains \xF9\xF9 in a Java application.
      • He saves the file without any change.
      • \xF9\xF9 is stored as \xA2\xA4.

      ==> VCS can detect the change.

      • He opens a file that contains \xA2\xA4 in a Java application.
      • He saves the file without any change.
      • \xA2\xA4 is stored as \xA2\xA4.

      ==> VCS does not detect the change.

      New implementation:

      • He opens a file that contains \xF9\xF9 in a Java application.
      • He saves the file without any change.
      • \xF9\xF9 is stored as \xF9\xF9.

      ==> VCS does not detect the change.

      • He opens a file that contains \xA2\xA4 in a Java application.
      • He saves the file without any change.
      • \xA2\xA4 is stored as \xF9\xF9.

      ==> VCS can detect the change.

      If the change is applied, Java's behavior is the same as that of Windows.
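
      The open-and-save scenarios above can be approximated with a short sketch that decodes the file bytes with MS950 and encodes them again, as a Java editor would. The class name Ms950RoundTrip and the methods openAndSave and vcsWouldSeeChange are illustrative. Which of the two inputs is reported as changed depends on whether the JDK in use contains this change.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

class Ms950RoundTrip {
    static final Charset MS950 = Charset.forName("MS950");

    // Simulates "open the file, then save it without edits":
    // decode the bytes to text, then encode the text back to bytes.
    static byte[] openAndSave(byte[] original) {
        String text = new String(original, MS950);
        return text.getBytes(MS950);
    }

    // A VCS detects a change when the saved bytes differ from the original.
    static boolean vcsWouldSeeChange(byte[] original) {
        return !Arrays.equals(original, openAndSave(original));
    }

    public static void main(String[] args) {
        byte[][] files = {
            {(byte) 0xF9, (byte) 0xF9},
            {(byte) 0xA2, (byte) 0xA4},
        };
        for (byte[] f : files) {
            System.out.printf("\\x%02X\\x%02X changed on save: %b%n",
                    f[0] & 0xFF, f[1] & 0xFF, vcsWouldSeeChange(f));
        }
    }
}
```

      With the new mapping, only the \xA2\xA4 file should be reported as changed, which matches the Windows behavior described above.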

      Joe Darcy added a comment - 2020-02-25 10:51

      After some additional information from Naoto Sato, moving to Approved.

      Please consider a release note for this change.

            itakiguchi Ichiroh Takiguchi (Inactive)
            Joe Darcy