JDK-8368845 (project: JDK)

x-IBM930 uses incorrect character for Hex 42 60


      A DESCRIPTION OF THE PROBLEM :
      The x-IBM930 charset decodes the double-byte sequence 0x4260 to the wrong character.
      The correct character would be U+FF0D (FULLWIDTH HYPHEN-MINUS, the full-width hyphen).
      The character currently produced is U+2212 (MINUS SIGN).

      This matters especially when converting from IBM930 to windows-31j, which has a full-width hyphen-minus (0x817C) but no minus sign.
      In particular, the equivalence between IBM930's 0x4260 and windows-31j's 0x817C is established in this IBM document: https://public.dhe.ibm.com/software/globalization/gcoc/attachments/CP00300.pdf, page 411. I suspect the corrections in 12.1.2 of that document have not been incorporated into this charset.
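      One way to confirm which code point is which, and what windows-31j's 0x817C decodes to, is to ask the JDK directly. The sketch below uses the standard Character.getName and Charset APIs; the class and method names are hypothetical. It does not assume a result for canEncode('\u2212') and simply prints whatever the running JDK reports.

```java
import java.nio.charset.Charset;

public class MinusSignCheck {

    // Returns the Unicode character name for a code point, e.g. "MINUS SIGN".
    static String name(int codePoint) {
        return Character.getName(codePoint);
    }

    // Decodes the two-byte windows-31j code 0x817C into a Java string.
    static String decodeWindows31j817C() {
        byte[] bytes = {(byte) 0x81, (byte) 0x7C};
        return new String(bytes, Charset.forName("windows-31j"));
    }

    public static void main(String[] args) {
        System.out.println("U+2212 = " + name(0x2212));
        System.out.println("U+FF0D = " + name(0xFF0D));

        String s = decodeWindows31j817C();
        System.out.printf("windows-31j 0x817C -> U+%04X %s%n",
                (int) s.charAt(0), name(s.charAt(0)));

        // canEncode reports whether windows-31j has any mapping for the char.
        boolean minusEncodable = Charset.forName("windows-31j")
                .newEncoder().canEncode('\u2212');
        System.out.println("windows-31j can encode U+2212: " + minusEncodable);
    }
}
```

      On a standard JDK the first two lines name U+2212 as MINUS SIGN and U+FF0D as FULLWIDTH HYPHEN-MINUS, and 0x817C decodes to U+FF0D.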

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      Decode the double-byte sequence 0x4260 (between shift-out 0x0E and shift-in 0x0F) into a string using the x-IBM930 charset, then encode that string to bytes using the windows-31j charset.
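      The reproduction steps can be sketched in Java as follows. This is a minimal sketch, assuming a JDK that ships the extended charsets (the jdk.charsets module) so that x-IBM930 is available; the class and method names are hypothetical. Because IBM930 is a stateful EBCDIC charset that starts in single-byte mode, the code 0x4260 is wrapped in shift-out (0x0E) and shift-in (0x0F) so it is read as one double-byte character. The encoder uses CodingErrorAction.REPORT so an unmappable character surfaces as an exception instead of a silent replacement.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

public class Ibm930MinusRepro {

    // SO (0x0E) switches IBM930 to double-byte mode, SI (0x0F) switches back.
    static final byte[] IBM930_BYTES = {0x0E, 0x42, 0x60, 0x0F};

    static String decodeIbm930(byte[] bytes) {
        return new String(bytes, Charset.forName("x-IBM930"));
    }

    // Encodes with windows-31j, failing on unmappable input instead of
    // substituting a replacement byte. Returns null if encoding fails.
    static byte[] encodeWindows31jStrict(String text) {
        CharsetEncoder enc = Charset.forName("windows-31j").newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            ByteBuffer bb = enc.encode(CharBuffer.wrap(text));
            byte[] out = new byte[bb.remaining()];
            bb.get(out);
            return out;
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    // Formats bytes as uppercase hex, e.g. {0x81, 0x7C} -> "817C".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String decoded = decodeIbm930(IBM930_BYTES);
        for (int i = 0; i < decoded.length(); i++) {
            System.out.printf("x-IBM930 decoded: U+%04X%n", (int) decoded.charAt(i));
        }
        byte[] encoded = encodeWindows31jStrict(decoded);
        if (encoded == null) {
            System.out.println("windows-31j: unmappable character");
        } else {
            System.out.println("windows-31j: " + hex(encoded));
        }
    }
}
```

      Going by the report, an affected JDK prints U+2212 and an unmappable result, while the expected behavior would print U+FF0D followed by 817C.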

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      The output bytes should be 0x817C.
      ACTUAL -
      The decoded character has no mapping in windows-31j, so depending on the encoder's error action you get a replacement character, an exception, or no output at all.

            Naoto Sato (naoto)
            Webbug Group (webbuggrp)
