Name: sg39081 Date: 08/15/97
HP's Network Node Manager software UI is being developed with
JDK 1.1.1. In this environment, string data is being generated
and logged by agents running on many different platforms (HP-UX,
Solaris 2.4, Solaris 2.5.*, NT, etc.). (Some agents are existing
code; some are new. All are written in C/C++.) In Japan,
these platforms do not have a single common codeset in which
data can be logged. Therefore, the Java UI applet must be able
to read (and convert into Unicode) both Shift-JIS and Japanese
EUC.
Since the converter classes have been pulled into a private package, the
only portable way to convert seems to be to set the locale of the
InputStreamReader object and rely on the default converter.
Unfortunately, the "ja" locale has only one codeset, which must be
hardwired by the implementation of the VM. What is needed is
the ability to set a more fully qualified locale that controls
the converter. For example, when the Java UI must read a log
file generated by Solaris 2.4, the converter must be the
Japanese EUC converter. When the Java UI must read a log file
generated by NT, the converter must be the Shift-JIS converter.
A solution would be to have the following locales supported:
ja_JP.SJIS
ja_JP.eucJP
ja (could still be provided with a platform specific codeset)
If my Java applet were isolated, running only on a single platform
and reading data generated on the platform, it would not be an
issue. But in a distributed, heterogeneous environment, I must
be able to convert to Unicode from the different codesets
supported by those platforms.
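As an illustration of the workaround available under the current API, the encoding can be forced per stream with the two-argument InputStreamReader constructor (present since JDK 1.1), bypassing the locale-bound default converter entirely. This is only a sketch: the class and method names below are hypothetical, and the converter names "SJIS" and "EUC_JP" are the historical java.io names, which have varied across VM versions (early 1.1 VMs used "EUCJIS" for Japanese EUC).

```java
import java.io.*;

public class LogReaderSketch {
    // Read a log stream with an explicitly named converter instead of the
    // locale's default one. The caller must know, out of band, which
    // codeset the producing platform used (e.g. NT agents -> Shift-JIS,
    // Solaris agents -> Japanese EUC).
    static String readLog(InputStream in, String enc) throws IOException {
        BufferedReader r = new BufferedReader(new InputStreamReader(in, enc));
        StringBuffer sb = new StringBuffer();  // JDK 1.1 has no StringBuilder
        String line;
        while ((line = r.readLine()) != null) {
            sb.append(line).append('\n');
        }
        r.close();
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // ASCII bytes are identical in Shift-JIS and EUC-JP, so the same
        // input decodes the same way under either converter.
        byte[] data = "agent up".getBytes("8859_1");
        System.out.print(readLog(new ByteArrayInputStream(data), "SJIS"));
        System.out.print(readLog(new ByteArrayInputStream(data), "EUC_JP"));
    }
}
```

The drawback, of course, is that every agent's codeset must be configured or guessed by the application, which is exactly the bookkeeping a locale-qualified name like ja_JP.eucJP would make unnecessary.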
One note: I know Solaris has the concept of a default codeset
for a locale. NT has the same concept. HP-UX does not have
this concept. All locales are fully qualified... there is no
such thing as a "ja" locale on HP-UX - only ja_JP.SJIS and
ja_JP.eucJP. The same is true for Western Europe (de_DE.iso88591
and de_DE.roman8).
company - Hewlett-Packard, Co. , email - ###@###.###
======================================================================
Additional information from user, sheri.good@Eng 1998-01-14:
The reviewer is correct in that the default converter is tied to the
locale. That is the problem. If there is going to be a default
converter, and that converter is tied to a locale, then some mechanism
must allow for the fact that there are multiple code sets for any given
Java locale. For the ja_JP locale, the code set might be Shift-JIS, it
might be eucJP, or it might be UTF8. There is no way to indicate to the
virtual machine which code set is the correct code set for the locale.
So... either the default converter concept should be deprecated, or the
current mechanism should be expanded to allow specification of the
external codeset associated with the locale-specific data. The bug report I
submitted suggested a way to correct the problem without deprecating the
default converter concept.
I've spoken with both xxxx (JavaSoft I18N) and xxx
(SunSoft I18N) about this, and think they understand and agree. Let me
ask you a question... for a JVM running on a Solaris 2.6 box, what
encoding is identified for the default converter associated with the
ja_JP locale? (Solaris 2.6 supports both the eucJP code set and the
Shift-JIS code set.) If the applet/application is running in the ja_JP
locale and an input stream is opened and read with InputStreamReader,
how do you know that the correct converter has been opened?
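That question can be probed empirically: System.getProperty("file.encoding") reports the codeset the VM assumed for the platform, and InputStreamReader.getEncoding() (also present since JDK 1.1) reports which converter a no-argument reader actually opened. A minimal sketch, with a hypothetical class name; the values printed vary by platform, locale, and VM version:

```java
import java.io.*;

public class DefaultConverterProbe {
    public static void main(String[] args) throws IOException {
        // The codeset the VM assumed for the platform; on a Japanese
        // Solaris box this might be "EUCJIS"/"EUC_JP", on Japanese NT
        // "SJIS" or "MS932" -- there is no way to know in advance which.
        System.out.println("file.encoding     = "
                + System.getProperty("file.encoding"));

        // The converter an InputStreamReader opens when no encoding is given.
        InputStreamReader r = new InputStreamReader(
                new ByteArrayInputStream(new byte[0]));
        System.out.println("default converter = " + r.getEncoding());
        r.close();
    }
}
```

Whatever this prints is the single answer the VM has hardwired for the locale, which is the heart of the complaint: on a machine supporting both eucJP and Shift-JIS, only one of them can ever be the default.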
Please feel free to call if you want to talk about this in more
detail.
Tom
I18N Architect
OpenView Software Division
Hewlett-Packard, Co.