JDK-4261306

Incorrect code conversion of Thai tone marks from Unicode to code page 874.


    • Type: Bug
    • Resolution: Fixed
    • Priority: P3
    • Fix Version: 1.3.0
    • Affects Version: 1.1.8
    • Component: core-libs
    • Resolved In Build: kestrel
    • CPU: generic
    • OS: generic



      Name: rlT66838 Date: 08/10/99


      Problem : Four Thai tone marks are converted incorrectly from Unicode to the cp874 encoding.
      Platform : Windows 98 Thai, Windows NT 4.0 Thai
      Java API : OutputStreamWriter with cp874 encoding

      (See attached files: ThaiCodeConversion.java, thai874.inp, thai838.inp,
      ThaiCodeConversion.class)

      Please save all of them to the same directory and run java
      ThaiCodeConversion.

      The ThaiCodeConversion application reads thai874.inp with the cp874
      encoding and then writes the content it read to thai874.out with the
      same encoding (cp874).
      I found that the InputStreamReader with cp874 encoding works correctly, but
      the OutputStreamWriter has a conversion problem. We can check for an
      error by comparing thai874.inp to thai874.out. If they are
      identical, both classes work correctly with the cp874 encoding.
      Otherwise, the InputStreamReader, the OutputStreamWriter, or both have
      a problem in cp874 character conversion.
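The round trip described above can be sketched as follows. This is a minimal illustration, not the attached ThaiCodeConversion.java (which is not reproduced in this report): the class name RoundTripSketch, the helper roundTrip, and the in-memory sample bytes are assumptions made for the example, and it works on byte arrays instead of the thai874.inp and thai874.out files.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.util.Arrays;

public class RoundTripSketch {
    // Hypothetical helper: decode the bytes with an InputStreamReader, then
    // re-encode the decoded text with an OutputStreamWriter using the same charset.
    static byte[] roundTrip(byte[] input, Charset cs) {
        try {
            StringBuilder text = new StringBuilder();
            try (Reader in = new InputStreamReader(new ByteArrayInputStream(input), cs)) {
                int c;
                while ((c = in.read()) != -1) {
                    text.append((char) c);
                }
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (Writer w = new OutputStreamWriter(out, cs)) {
                w.write(text.toString());
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String name = "Cp874"; // encoding name used in the report
        if (!Charset.isSupported(name)) {
            System.out.println(name + " is not available in this runtime");
            return;
        }
        // Assumed sample: a Thai consonant (0xA1) followed by each of the
        // four tone-mark bytes the report lists (0xE8, 0xE9, 0xEA, 0xEC).
        byte[] sample = {
            (byte) 0xA1, (byte) 0xE8, (byte) 0xA1, (byte) 0xE9,
            (byte) 0xA1, (byte) 0xEA, (byte) 0xA1, (byte) 0xEC
        };
        byte[] result = roundTrip(sample, Charset.forName(name));
        System.out.println(Arrays.equals(sample, result) ? "identical" : "different");
    }
}
```

If the output is "different", at least one of the two conversion directions does not round-trip for the sample bytes on that runtime.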

      After running ThaiCodeConversion, I compared thai874.inp to thai874.out
      and found that they are not identical.
      The output is shown below:

      C:\>java ThaiCodeConversion
      Reading ... thai874.inp
      Creating ... thai874.out
      Reading ... thai838.inp
      Creating ... thai838.out

      C:\>fc /b thai874.inp thai874.out
      Comparing files thai874.inp and thai874.out
      00000071: EC DE
      00000085: E8 A0
      00000089: E8 A0
      00000097: EC DE
      000000A5: E8 A0
      000000B8: E8 A0
      000000BB: E9 DB
      000000C6: E8 A0
      000000C9: E8 A0
      000000D6: E8 A0
      000000E3: E8 A0
      000000EF: E8 A0
      0000012A: E9 DB
      0000012C: EA DC
      00000133: E8 A0
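For readers without the Windows fc tool, the byte-level comparison above could be approximated in Java. This is only a sketch: the class name ByteDiffSketch and the helper diff are hypothetical, the sample arrays are made up for illustration, and the sketch only mirrors the "offset: expected actual" line format seen in the output above for arrays of equal length.

```java
import java.util.ArrayList;
import java.util.List;

public class ByteDiffSketch {
    // Hypothetical helper: one line per differing offset, formatted like the
    // `fc /b` lines above ("OFFSET: XX YY"). Only the common prefix of the
    // two arrays is compared.
    static List<String> diff(byte[] a, byte[] b) {
        List<String> lines = new ArrayList<>();
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                lines.add(String.format("%08X: %02X %02X", i, a[i] & 0xFF, b[i] & 0xFF));
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Assumed sample data standing in for thai874.inp and thai874.out.
        byte[] input  = {0x41, (byte) 0xE8, 0x42, (byte) 0xEC};
        byte[] output = {0x41, (byte) 0xA0, 0x42, (byte) 0xDE};
        for (String line : diff(input, output)) {
            System.out.println(line);
        }
        // prints:
        // 00000001: E8 A0
        // 00000003: EC DE
    }
}
```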

      From the output above, four Thai characters have been converted incorrectly
      by the OutputStreamWriter with cp874 encoding. I believe the conversion
      table from Unicode to the cp874 encoding must be corrected.

      Unicode   Output from the OutputStreamWriter
      =======   ==================================
      0x0e48    0xa0 (the correct value should be 0xe8)
      0x0e49    0xdb (the correct value should be 0xe9)
      0x0e4a    0xdc (the correct value should be 0xea)
      0x0e4c    0xde (the correct value should be 0xec)
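The four mappings can also be checked directly, without the input files, by encoding each tone mark on its own. The sketch below is an assumption-laden illustration: the class name ToneMarkSketch and the helper encodeHex are hypothetical, and the expected bytes are simply the "correct value" column from the table above. What it prints depends on the JDK version in use, so no output is shown.

```java
import java.nio.charset.Charset;

public class ToneMarkSketch {
    // Hypothetical helper: encode a single char and render its bytes as hex,
    // separated by spaces if the encoding produces more than one byte.
    static String encodeHex(char c, Charset cs) {
        byte[] bytes = String.valueOf(c).getBytes(cs);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("0x%02x", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String name = "Cp874"; // encoding name used in the report
        if (!Charset.isSupported(name)) {
            System.out.println(name + " is not available in this runtime");
            return;
        }
        Charset cs = Charset.forName(name);
        // Tone marks and expected bytes, both taken from the table in the report.
        char[] marks = {'\u0e48', '\u0e49', '\u0e4a', '\u0e4c'};
        String[] expected = {"0xe8", "0xe9", "0xea", "0xec"};
        for (int i = 0; i < marks.length; i++) {
            String actual = encodeHex(marks[i], cs);
            System.out.printf("0x%04x -> %s (expected %s)%n",
                    (int) marks[i], actual, expected[i]);
        }
    }
}
```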


      Notes:
      - ThaiCodeConversion also tests the cp838 encoding (Thai EBCDIC
      encoding), which works without any problem.
      - The thai874.inp file used in this test case contains all Thai and English
      characters.

      (Review ID: 93671)
      ======================================================================

            Assignee: sherman Xueming Shen
            Reporter: rlewis Roger Lewis (Inactive)