Bug
Resolution: Fixed
P3
1.1.8
kestrel
generic
generic
Name: rlT66838 Date: 08/10/99
Problem : Four tone marks are converted incorrectly from Unicode to cp874 encoding.
Platform : Windows 98 Thai, Windows NT 4.0 Thai
Java API : OutputStreamWriter with cp874 encoding
Attached files: ThaiCodeConversion.java, thai874.inp, thai838.inp,
ThaiCodeConversion.class
Please detach all of them to the same directory and run java
ThaiCodeConversion.
The ThaiCodeConversion application reads thai874.inp with cp874 encoding and
then writes the content it read from thai874.inp to thai874.out with the
same encoding (cp874).
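The attached source is not reproduced in this report, so the following is only a minimal sketch of the round trip described above, written with current Java syntax. The class name RoundTripSketch and the method roundTrip are hypothetical, and the sketch works on an in-memory byte array instead of the thai874.inp file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical stand-in for the attached ThaiCodeConversion test case.
public class RoundTripSketch {

    // Decode the bytes with the given encoding, then encode the resulting
    // characters again with the same encoding.
    static byte[] roundTrip(byte[] input, String encoding) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(input), encoding);
             Writer writer = new OutputStreamWriter(sink, encoding)) {
            int c;
            while ((c = reader.read()) != -1) {
                writer.write(c);
            }
        }
        // The writer has been closed (and therefore flushed) at this point.
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String encoding = "Cp874";
        if (!Charset.isSupported(encoding)) {
            System.out.println(encoding + " is not available in this runtime");
            return;
        }
        // The four cp874 byte values that the report says are affected.
        byte[] input = {(byte) 0xE8, (byte) 0xE9, (byte) 0xEA, (byte) 0xEC};
        byte[] output = roundTrip(input, encoding);
        System.out.println("identical: " + Arrays.equals(input, output));
    }
}
```

If the reader and writer are both correct for an encoding, the returned array should equal the input for every byte that the encoding defines.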
I found that the InputStreamReader with cp874 encoding works correctly and
that the conversion problem is in the OutputStreamWriter. We can check for
an error by comparing thai874.inp to thai874.out. If they are
identical, both classes work correctly with cp874 encoding.
Otherwise, the InputStreamReader, the OutputStreamWriter, or both have a
problem in cp874 character conversion.
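The comparison step can also be done in Java rather than with an external tool. The sketch below is an illustration only: ByteDiffSketch and diff are hypothetical names, and the line format is chosen to resemble the offset and byte-pair layout printed by fc /b.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that lists the offsets at which two byte arrays differ.
public class ByteDiffSketch {

    // One line per differing offset: 8-digit hex offset, then the byte from
    // the first array and the byte from the second array.
    static List<String> diff(byte[] expected, byte[] actual) {
        List<String> lines = new ArrayList<>();
        int common = Math.min(expected.length, actual.length);
        for (int i = 0; i < common; i++) {
            if (expected[i] != actual[i]) {
                lines.add(String.format("%08X: %02X %02X",
                        i, expected[i] & 0xFF, actual[i] & 0xFF));
            }
        }
        // A length mismatch is reported separately from byte mismatches.
        if (expected.length != actual.length) {
            lines.add("lengths differ: " + expected.length + " vs " + actual.length);
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] original = {0x41, (byte) 0xE8};
        byte[] converted = {0x41, (byte) 0xA0};
        for (String line : diff(original, converted)) {
            System.out.println(line);
        }
    }
}
```

An empty result means the two arrays are identical, which is the outcome expected when both the reader and the writer convert correctly.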
After running ThaiCodeConversion, I compared thai874.inp to thai874.out
and found that they are not identical.
The output is shown below:
C:\>java ThaiCodeConversion
Reading ... thai874.inp
Creating ... thai874.out
Reading ... thai838.inp
Creating ... thai838.out
C:\>fc /b thai874.inp thai874.out
Comparing files thai874.inp and thai874.out
00000071: EC DE
00000085: E8 A0
00000089: E8 A0
00000097: EC DE
000000A5: E8 A0
000000B8: E8 A0
000000BB: E9 DB
000000C6: E8 A0
000000C9: E8 A0
000000D6: E8 A0
000000E3: E8 A0
000000EF: E8 A0
0000012A: E9 DB
0000012C: EA DC
00000133: E8 A0
From the output above, four Thai characters have been converted incorrectly
by the OutputStreamWriter with cp874 encoding. I believe the conversion
table from Unicode to cp874 encoding must be corrected.
Unicode   Output from the OutputStreamWriter
=======   ==================================
0x0e48    0xa0 (the correct value should be 0xe8)
0x0e49    0xdb (the correct value should be 0xe9)
0x0e4a    0xdc (the correct value should be 0xea)
0x0e4c    0xde (the correct value should be 0xec)
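The four mappings can be checked one character at a time without any input file. The following is a sketch under assumptions: EncoderCheckSketch and encode are hypothetical names, and the expected byte values are the ones the table gives as correct. What it prints depends on the runtime's cp874 converter.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical per-character check of the Unicode to cp874 conversion.
public class EncoderCheckSketch {

    // Encode a single character through OutputStreamWriter and return
    // the bytes that were produced.
    static byte[] encode(char c, String encoding) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(sink, encoding)) {
            writer.write(c);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String encoding = "Cp874";
        if (!Charset.isSupported(encoding)) {
            System.out.println(encoding + " is not available in this runtime");
            return;
        }
        // Characters from the report and the byte values it gives as correct.
        char[] chars = {'\u0e48', '\u0e49', '\u0e4a', '\u0e4c'};
        int[] expected = {0xE8, 0xE9, 0xEA, 0xEC};
        for (int i = 0; i < chars.length; i++) {
            byte[] bytes = encode(chars[i], encoding);
            String actual = bytes.length == 1
                    ? String.format("0x%02x", bytes[0] & 0xFF)
                    : "unexpected length " + bytes.length;
            System.out.printf("0x%04x -> %s (expected 0x%02x)%n",
                    (int) chars[i], actual, expected[i]);
        }
    }
}
```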
Notes:
- ThaiCodeConversion also tests the cp838 encoding (Thai EBCDIC
encoding), which works without any problem.
- The thai874.inp file used in this test case contains all Thai and English
characters.
(Review ID: 93671)
======================================================================