Name: clC74495 Date: 02/13/99
I sent this problem and a proposed solution to the I18N team
through an OEM engineer in Japan.
The solution has not been implemented yet,
and no bug report describes the problem,
so I am filing the solution as a bug report.
(See also bugid: 4100019.)
In JDK 1.2FCS and JDK 1.1.8,
the MS932 encoding was added and is used as the default encoding on
Japanese Windows.
The Shift_JIS alias in sun.io.CharacterEncoding
is now mapped to MS932.
MS932 is almost a superset of SJIS,
but some JIS0208 characters are mapped
to different Unicode code points.
In Japan, most Windows and MacOS users use SJIS text,
and most UNIX (Solaris, Linux, ...) users use EUCJIS text.
So the same Java program, if it contains any of these characters,
is compiled to different class files.
For example,
the character 0x2141 (in JIS) is mapped to
(a) \u301c (Unix users: EUCJIS)
ex) System.out.println("\u301c");
(b) \uff5e (Windows users: MS932)
ex) System.out.println("\uff5e");
There is no problem when the class files are executed
on the same platform on which they were compiled.
But Java is platform-independent.
With the programs above,
`System.out.println("\u301c");' prints `?' on Japanese Windows, and
`System.out.println("\uff5e");' prints `?' on Japanese Solaris.
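The behaviour described above can be probed on a given runtime with a small check like the following. This is only a sketch: it uses the java.nio.charset API, which is newer than the JDK releases discussed in this report, and the class name WaveDashProbe and its helper are made up for illustration. The result depends on the mapping tables shipped with the runtime, so no expected output is given.

```java
import java.nio.charset.Charset;

final class WaveDashProbe {
    // Returns true when the named charset's encoder can represent c.
    static boolean canEncode(String charsetName, char c) {
        return Charset.forName(charsetName).newEncoder().canEncode(c);
    }

    public static void main(String[] args) {
        // WAVE DASH and FULLWIDTH TILDE, the two candidates for JIS 0x2141.
        char[] samples = { '\u301c', '\uff5e' };
        String[] names = { "EUC-JP", "Shift_JIS", "MS932" };
        for (String name : names) {
            // Japanese charsets are optional; skip any that this runtime lacks.
            if (!Charset.isSupported(name)) {
                System.out.println(name + ": not available in this runtime");
                continue;
            }
            for (char c : samples) {
                System.out.printf("%s U+%04X encodable: %b%n",
                        name, (int) c, canEncode(name, c));
            }
        }
    }
}
```

A character reported as not encodable is the one that would be printed as `?' with that encoding.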
I proposed the following solutions to the I18N team through the OEM engineer.
I hope they will be implemented as soon as possible
in both JDK 1.2.x and JDK 1.1.x.
1. Mapping Rules
(a) The following mapping rules should be added to
sun.io.CharToByteJIS0208
(which is used by sun.io.CharToByteEUC_JP,
sun.io.CharToByteSJIS and sun.io.CharToByteISO2022JP):
\u203E (OVERLINE) -> 0x2131 (OVERLINE)
\u2014 (EM DASH) -> 0x213D (EM DASH)
\u22EF (MIDLINE HORIZONTAL ELLIPSIS) -> 0x2144 (HORIZONTAL ELLIPSIS)
\uFF5E (FULLWIDTH TILDE) -> 0x2141 (WAVE DASH)
\u2225 (PARALLEL TO) -> 0x2142 (DOUBLE VERTICAL LINE)
\uFF0D (FULLWIDTH HYPHEN-MINUS) -> 0x215D (MINUS SIGN)
\uFFE0 (FULLWIDTH CENT SIGN) -> 0x2171 (CENT SIGN)
\uFFE1 (FULLWIDTH POUND SIGN) -> 0x2172 (POUND SIGN)
\uFFE2 (FULLWIDTH NOT SIGN) -> 0x224C (NOT SIGN)
(b) The following mapping rules should be added to
sun.io.CharToByteMS932:
\u203E (OVERLINE) -> 0x8150 (OVERLINE)
\u2014 (EM DASH) -> 0x815C (EM DASH)
\u22EF (MIDLINE HORIZONTAL ELLIPSIS) -> 0x8163 (HORIZONTAL ELLIPSIS)
\u301C (WAVE DASH) -> 0x8160 (WAVE DASH)
\u2016 (DOUBLE VERTICAL LINE) -> 0x8161 (DOUBLE VERTICAL LINE)
\u2212 (MINUS SIGN) -> 0x817C (MINUS SIGN)
\u00A2 (CENT SIGN) -> 0x8191 (CENT SIGN)
\u00A3 (POUND SIGN) -> 0x8192 (POUND SIGN)
\u00AC (NOT SIGN) -> 0x81CA (NOT SIGN)
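The byte codes in table (b) are the Shift_JIS forms of the JIS codes in table (a), row by row. The sketch below applies the usual JIS X 0208 to Shift_JIS arithmetic to show the correspondence. The class name JisToSjis is hypothetical, and this is not the code of the sun.io converters.

```java
final class JisToSjis {
    // Converts a two-byte JIS X 0208 code (for example 0x2141)
    // to the corresponding Shift_JIS code (for example 0x8160).
    static int toShiftJis(int jis) {
        int j1 = (jis >> 8) & 0xFF;   // row byte
        int j2 = jis & 0xFF;          // cell byte
        if (j1 < 0x21 || j1 > 0x7E || j2 < 0x21 || j2 > 0x7E) {
            throw new IllegalArgumentException(
                    String.format("not a JIS X 0208 code: 0x%04X", jis));
        }
        // Two JIS rows share one Shift_JIS lead byte.
        int s1 = ((j1 - 0x21) >> 1) + 0x81;
        if (s1 > 0x9F) {
            s1 += 0x40;               // skip the half-width katakana range
        }
        int s2;
        if ((j1 & 1) == 1) {          // odd row: lower half of the trail range
            s2 = j2 + 0x1F;
            if (s2 >= 0x7F) {
                s2++;                 // 0x7F is never used as a trail byte
            }
        } else {                      // even row: upper half of the trail range
            s2 = j2 + 0x7E;
        }
        return (s1 << 8) | s2;
    }

    public static void main(String[] args) {
        // JIS codes from table (a), in the order listed there.
        int[] jisCodes = { 0x2131, 0x213D, 0x2144, 0x2141, 0x2142,
                           0x215D, 0x2171, 0x2172, 0x224C };
        for (int jis : jisCodes) {
            System.out.printf("0x%04X -> 0x%04X%n", jis, toShiftJis(jis));
        }
    }
}
```

For instance, 0x2141 (WAVE DASH) becomes 0x8160 and 0x224C (NOT SIGN) becomes 0x81CA, matching the entries in table (b).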
2. Collation rules
The Unicode characters above should be treated
as equal to their counterparts in collation (at all levels: Collator.PRIMARY,
Collator.SECONDARY, Collator.TERTIARY, Collator.IDENTICAL).
So the following rules should be appended to the end of DEFAULTRULES
in src/share/classes/java/text/CollationRules.java.
// SJIS and MS932 and MacJapanese
+ "&\u203E=\uFFE3" // OVERLINE(JIS X 0221,MacJapanese)
// = FULLWIDTH MACRON
+ "&\u2014=\u2015" // EM DASH(JIS X 0221,MacJapanese)
// = HORIZONTAL BAR
+ "&\u22EF=\u2026" // MIDLINE HORIZONTAL ELLIPSIS(MacJapanese)
// = HORIZONTAL ELLIPSIS
+ "&\uFF5E=\u301C" // FULLWIDTH TILDE(MS932) = WAVE DASH
+ "&\u2225=\u2016" // PARALLEL TO(MS932) = DOUBLE VERTICAL LINE
+ "&\uFF0D=\u2212" // FULLWIDTH HYPHEN-MINUS(MS932) = MINUS SIGN
+ "&\uFFE0=\u00A2" // FULLWIDTH CENT SIGN(MS932) = CENT SIGN
+ "&\uFFE1=\u00A3" // FULLWIDTH POUND SIGN(MS932) = POUND SIGN
+ "&\uFFE2=\u00AC" // FULLWIDTH NOT SIGN(MS932) = NOT SIGN
Note: MacJapanese is the encoding used on Japanese MacOS.
It has the same problem as MS932.
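The effect of an `=' rule can be tried out with java.text.RuleBasedCollator, as in the sketch below. It is an illustration under assumptions: it builds a small stand-alone rule set covering only three of the pairs instead of appending to DEFAULTRULES, it checks only the collator's default strength, and the class name VariantCollationSketch is hypothetical.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

final class VariantCollationSketch {
    // Stand-alone rules: '<' starts a new primary position, and '=' makes
    // the following character collate the same as the one before it.
    static final String RULES =
          "<\u301C=\uFF5E"   // WAVE DASH = FULLWIDTH TILDE
        + "<\u2016=\u2225"   // DOUBLE VERTICAL LINE = PARALLEL TO
        + "<\u2212=\uFF0D";  // MINUS SIGN = FULLWIDTH HYPHEN-MINUS

    // Builds the collator, turning a rule syntax error into an unchecked one.
    static Collator build() {
        try {
            return new RuleBasedCollator(RULES);
        } catch (ParseException e) {
            throw new IllegalStateException("invalid collation rules", e);
        }
    }

    public static void main(String[] args) {
        Collator collator = build();
        // A result of 0 means the collator treats the two strings as equal.
        System.out.println(collator.compare("\u301C", "\uFF5E"));
        System.out.println(collator.compare("\u2016", "\u2225"));
        System.out.println(collator.compare("\u2212", "\uFF0D"));
    }
}
```

With rules like these, text compiled on either platform would sort and compare the same way under the collator, even though the underlying code points differ.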
----------------
Once this bug is fixed, the MS932-SJIS bugs will be fully resolved.
(Review ID: 54107)
======================================================================