  1. JDK
  2. JDK-4891024

EUC-KR and JOHAB converters need to be updated to include two new characters

Details

    • Bug
    • Resolution: Fixed
    • P4
    • 7
    • 1.4.0
    • core-libs

    Description

      Name: gm110360 Date: 07/15/2003


      FULL PRODUCT VERSION :
      java version "1.4.0_01"
      Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.0_01-b03)
      Java HotSpot(TM) Client VM (build 1.4.0_01-b03, mixed mode)


      FULL OPERATING SYSTEM VERSION : Red Hat Linux 8.0


      ADDITIONAL OPERATING SYSTEMS :
      This bug is INDEPENDENT of the OS and happens under
      all operating systems; I am specifying Linux only
      because there is no choice for "all OSes".
      As such, the kernel version, glibc version and other
      details are not relevant to this bug.



      A DESCRIPTION OF THE PROBLEM :
      EUC_KR and JOHAB converters in JDK have not
      been updated to include two new characters
      added to KS X 1001:1998 in 1998 (the previous versions
      of this character set standard were issued
      under the designation of KS C 5601-1987,
      KS C 5601-1992 and KS X 1001:1997).
      The two new characters added were:

                                  GL            GR
        U+20AC EURO SIGN          (0x22,0x66)   (0xa2,0xe6)
        U+00AE REGISTERED SIGN    (0x22,0x67)   (0xa2,0xe7)

      For the EUC-KR converters, they have to be in the GR positions.
      For the JOHAB converters, their code points have to be
      translated in the same way that other symbol code points
      are translated from their GL or GR positions. Their
      positions in JOHAB are 0xD9E6 and
      0xD9E7.
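
      The GL/GR/JOHAB relationship above can be sketched in Java. This is
      only an illustration: the JOHAB arithmetic below is an assumption
      (it is not given in this report) modelled on how open-source
      converters commonly map the KS X 1001 symbol rows, and the class
      and method names are hypothetical. It does reproduce the four
      values quoted above.

```java
public class JohabSymbolMapping {

    /** GL to GR (EUC-KR): set the high bit of both bytes. */
    public static int glToGr(int gl) {
        return gl | 0x8080;
    }

    /**
     * Maps a KS X 1001 GL code in the symbol rows (lead byte 0x21..0x2C)
     * to a two-byte JOHAB code. Assumed arithmetic, not taken from the report.
     */
    public static int glSymbolToJohab(int gl) {
        int c1 = (gl >> 8) & 0xFF;
        int c2 = gl & 0xFF;
        if (c1 < 0x21 || c1 > 0x2C || c2 < 0x21 || c2 > 0x7E) {
            throw new IllegalArgumentException("not a KS X 1001 symbol code: " + Integer.toHexString(gl));
        }
        // Two consecutive KS X 1001 rows share one JOHAB lead byte (0xD9..0xDE).
        int t = c1 - 0x21 + 0x1B2;
        int lead = t >> 1;
        // The second row of each pair continues after the 94 cells of the first row.
        int cell = ((t & 1) != 0 ? 0x5E : 0) + (c2 - 0x21);
        // Trail bytes occupy 0x31..0x7E and then 0x91..0xFE.
        int trail = cell < 0x4E ? cell + 0x31 : cell + 0x43;
        return (lead << 8) | trail;
    }

    public static void main(String[] args) {
        int[] glCodes = {0x2266, 0x2267}; // EURO SIGN, REGISTERED SIGN
        for (int gl : glCodes) {
            System.out.printf("GL %04X -> GR %04X, JOHAB %04X%n",
                    gl, glToGr(gl), glSymbolToJohab(gl));
        }
        // prints:
        // GL 2266 -> GR A2E6, JOHAB D9E6
        // GL 2267 -> GR A2E7, JOHAB D9E7
    }
}
```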

      Last March, I also contacted the Solaris I18N team and was
      told that the next release of Solaris would add these two
      characters to the EUC-KR codeset of Solaris. Mozilla/Netscape was updated
      (http://bugzilla.mozilla.org/show_bug.cgi?id=134749)
      and Sybase would do the same in their products.
      Linux glibc fixed this problem a long time ago (in
      late 1999 or early 2000). Microsoft was probably
      the first to add these two characters, in
      Windows-949. The MS949 converter in JDK 1.4 also
      handles these two characters correctly.

      IBM and Oracle were also notified.

      It would be nice if Java took care of this
      problem soon.

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      1. Make a simple UTF-8 file containing only the two characters
         U+20AC and U+00AE with your favorite text editor
         (capable of UTF-8 handling) and save it as
         'text'.
      2. Run the following three commands:

         $ native2ascii -encoding UTF-8 text | native2ascii -reverse -encoding EUC_KR
         $ native2ascii -encoding UTF-8 text | native2ascii -reverse -encoding Johab
         $ native2ascii -encoding UTF-8 text | native2ascii -reverse -encoding MS949
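
      The same check can be sketched directly against the
      java.nio.charset API, which may be handy on JDKs that do not ship
      native2ascii. This is a minimal sketch, not part of the original
      report: the class and method names are hypothetical, and the NIO
      charset names "EUC-KR", "x-Johab" and "x-windows-949" are assumed
      to correspond to the EUC_KR, Johab and MS949 converters named
      above. Charsets missing from the runtime are reported rather than
      treated as errors.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

public class KoreanEncodeProbe {

    /**
     * Strictly encodes the text and returns the octets in hex,
     * "unmappable" if the charset cannot represent the text, or
     * "unsupported charset" if the runtime does not provide it.
     */
    public static String encodeToHex(String charsetName, String text) {
        if (!Charset.isSupported(charsetName)) {
            return "unsupported charset";
        }
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            ByteBuffer out = encoder.encode(CharBuffer.wrap(text));
            StringBuilder hex = new StringBuilder();
            while (out.hasRemaining()) {
                if (hex.length() > 0) {
                    hex.append(' ');
                }
                hex.append(String.format("%02X", out.get() & 0xFF));
            }
            return hex.toString();
        } catch (CharacterCodingException e) {
            return "unmappable";
        }
    }

    public static void main(String[] args) {
        String text = "\u20ac\u00ae"; // EURO SIGN, REGISTERED SIGN
        // Assumed NIO names for the three converters discussed in the report.
        String[] charsets = {"EUC-KR", "x-Johab", "x-windows-949"};
        for (String name : charsets) {
            System.out.println(name + ": " + encodeToHex(name, text));
        }
    }
}
```

      On a runtime whose converters include the two characters, the
      EUC-KR and MS949 lines should show A2 E6 A2 E7 and the Johab line
      D9 E6 D9 E7; a runtime with the problem described here would be
      expected to report "unmappable" for EUC-KR and Johab.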



      EXPECTED VERSUS ACTUAL BEHAVIOR :
      Expected results:
      The first and the last commands emit the octet sequence
      '0xA2 0xE6 0xA2 0xE7' and the second
      command emits the octet sequence
      '0xD9 0xE6 0xD9 0xE7'. (Use 'hexdump' on Solaris/Linux,
      or another binary-viewing tool of your choice,
      to examine the output.)

      Actual results:

      Instead, the first two
      commands output
      \u20ac\u00ae, which means the characters are not representable
      in EUC_KR and Johab as far as the JDK is concerned.
      On the other hand, the last command (before
      piping through 'hexdump') emits the
      octet sequence '0xA2 0xE6 0xA2 0xE7' as expected.

      This clearly shows that the MS949 converter was updated
      to include the two new characters in KS X 1001:1998,
      but the EUC_KR and Johab converters have not been.
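
      The steps above only exercise the encoding direction. A
      complementary sketch for the decoding direction is shown here,
      under the same assumptions about NIO charset names ("EUC-KR",
      "x-Johab", "x-windows-949"); the class and helper names are
      hypothetical. It feeds the expected octets to each decoder and
      prints the result in the escaped form that native2ascii uses.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

public class KoreanDecodeProbe {

    /** Strictly decodes the octets; returns null if unsupported, malformed or unmappable. */
    public static String decodeStrict(String charsetName, byte[] octets) {
        if (!Charset.isSupported(charsetName)) {
            return null;
        }
        try {
            return Charset.forName(charsetName).newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(octets))
                    .toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    /** Renders non-ASCII characters as lowercase backslash-u escapes. */
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] gr = {(byte) 0xA2, (byte) 0xE6, (byte) 0xA2, (byte) 0xE7};
        byte[] johab = {(byte) 0xD9, (byte) 0xE6, (byte) 0xD9, (byte) 0xE7};
        String[] names = {"EUC-KR", "x-Johab", "x-windows-949"};
        byte[][] inputs = {gr, johab, gr};
        for (int i = 0; i < names.length; i++) {
            String decoded = decodeStrict(names[i], inputs[i]);
            System.out.println(names[i] + ": "
                    + (decoded == null ? "not decodable" : escape(decoded)));
        }
    }
}
```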




      REPRODUCIBILITY :
      This bug can always be reproduced.

      ---------- BEGIN SOURCE ----------
      N/A.

      I believe testing this problem with native2ascii is sufficient to
      demonstrate the issue at hand.

      ---------- END SOURCE ----------
      (Incident Review ID: 166984)
      ======================================================================

          People

            sherman Xueming Shen
            gmanwanisunw Girish Manwani (Inactive)