Enhancement
Resolution: Duplicate
P5
None
1.2.0
generic
generic
Name: dbT83986 Date: 03/02/99
Is there a way to identify which character encodings are reversible? At the very least, the documentation for the encodings should say which ones are.
==================================
REVIEW NOTE 3/10/99 - User responded with additional information
Hi, David. Thanks for the reply.
If you refer to the "DATA TRANSFER PROBLEMS ON WINDOWS" section in the
'README' file in jdk1.1.7B, you'll see a paragraph discussing the
occasional need to store byte data in a string. I use this sometimes to
avoid large static initializers.
[I pasted the relevant snippet below.]
An easy way to convert bytes to a string is to use one of the
encodings. (Please remember that this function is meant for converting
bytes that have already been written out via an encoding.)
Note that I want to use this in a somewhat unconventional way: I want
to start with bytes, convert them to a string, and then convert back.
The bytes are the starting point in this case.
The README notes below explain that only "reversible" encodings can
convert in both directions. One example is ISO8859_1. However, I think
this encoder wastes space in the string, because it uses only the lower
half of each 16-bit char (values 0x00-0xFF). Perhaps other encodings
would give slightly better results, but I'm not sure which ones can go
both ways.
Of course, I realize this isn't the intended purpose, but it can reduce
code size because I wouldn't have to write byte->char packing/unpacking
code. Up until now, my solution has been to gzip the byte array and then
pack two bytes per char in a string.
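A minimal sketch of the gzip-then-pack workaround described above, using only java.util.zip and String. The class and method names (BytePacker, gzip, gunzip, pack, unpack) are hypothetical, and the layout is an assumption: two bytes per char, big-endian, with a leading header char that records whether a padding byte was added to make the length even.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper; not a JDK API. */
public class BytePacker {

    // Compress the input with gzip; close() writes the gzip trailer.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(bos);
        gz.write(data);
        gz.close();
        return bos.toByteArray();
    }

    // Decompress gzip data back into the original bytes.
    static byte[] gunzip(byte[] data) throws IOException {
        GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data));
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = gz.read(buf)) != -1) {
            bos.write(buf, 0, n);
        }
        gz.close();
        return bos.toByteArray();
    }

    // Pack two bytes per char (high byte first). Char 0 holds the number
    // of padding bytes (0 or 1) appended to make the length even.
    static String pack(byte[] data) {
        int pad = data.length % 2;
        StringBuilder sb = new StringBuilder(1 + (data.length + 1) / 2);
        sb.append((char) pad);
        for (int i = 0; i < data.length; i += 2) {
            int hi = data[i] & 0xFF;
            int lo = (i + 1 < data.length) ? (data[i + 1] & 0xFF) : 0;
            sb.append((char) ((hi << 8) | lo));
        }
        return sb.toString();
    }

    // Reverse pack(): read the padding header, then split each char into two bytes.
    static byte[] unpack(String s) {
        int pad = s.charAt(0);
        int len = (s.length() - 1) * 2 - pad;
        byte[] out = new byte[len];
        for (int i = 0; i < len; i++) {
            char c = s.charAt(1 + i / 2);
            out[i] = (byte) ((i % 2 == 0) ? (c >> 8) : c);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "hello, packed world".getBytes("ISO8859_1");
        String packed = pack(gzip(original));
        byte[] restored = gunzip(unpack(packed));
        System.out.println(java.util.Arrays.equals(original, restored));
    }
}
```

The packed String may contain chars in the surrogate range (0xD800-0xDFFF), which are not valid text on their own, so it should stay in memory (or in a string constant) and never be passed through a charset encoder such as getBytes().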
I guess the whole point of my ramblings is twofold:
1) Provide a method or something similar, such as isReversible(String),
for the various encodings; and
2) Maybe create a new encoding that packs bytes into chars, two bytes
per char. We can call it "DAVID-SHAWN".
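The JDK does not provide an isReversible(String) method. A rough, hypothetical approximation is to round-trip every byte value through the charset and compare. A failure proves the charset is not reversible for arbitrary bytes; a pass is strong evidence only for single-byte charsets, since multi-byte or stateful charsets can fail on longer sequences this check does not try. The class name ReversibleCheck is an assumption for illustration.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

/** Hypothetical helper; not part of the JDK. */
public class ReversibleCheck {

    // Returns true if each single byte 0x00-0xFF, and the full 256-byte
    // sequence, survive a bytes -> String -> bytes round trip in the charset.
    static boolean isReversible(String charsetName) {
        Charset cs = Charset.forName(charsetName);
        byte[] all = new byte[256];
        for (int b = 0; b < 256; b++) {
            all[b] = (byte) b;
            byte[] one = { (byte) b };
            if (!Arrays.equals(one, new String(one, cs).getBytes(cs))) {
                return false;
            }
        }
        return Arrays.equals(all, new String(all, cs).getBytes(cs));
    }

    public static void main(String[] args) {
        String[] names = { "ISO8859_1", "US-ASCII", "UTF-8", "Cp1252" };
        for (String name : names) {
            System.out.println(name + " reversible: " + isReversible(name));
        }
    }
}
```

Running main on a given JDK shows which of these names pass; Cp1252 is included so the README's claim about it can be checked locally.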
Thanks for your time,
-Shawn
P.S. Here is the README snippet:
=======================================================================
DATA TRANSFER PROBLEMS ON WINDOWS
=======================================================================
A bug in the data transfer API (4032895) prevents most objects from
being copied to the Win32 clipboard. A common workaround is to convert
objects to a String representation, since String objects are not
affected by this bug.
One popular technique for converting an object to a string is to write
the object into a ByteArrayOutputStream and convert the stream to a
String with toString(). String.getBytes() reverses the process.
There is a potential problem with this kind of byte/character
conversion. Both toString() and getBytes() rely on a locale-specific
character encoder to translate byte values to and from Unicode
character values. Not all encoders assume a one-to-one relationship
between byte values and character values. To ensure a reliable
translation, do not rely on the default locale encoder. Explicitly
specify an encoder that uses a reversible translation, such as
ISO8859_1. Do this by passing the encoder name to toString() and
getBytes():
aString = aStream.toString("ISO8859_1");
aByteArray = aString.getBytes("ISO8859_1");
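A self-contained sketch of the technique in the README excerpt: serialize an object into a ByteArrayOutputStream, convert it to a String with toString("ISO8859_1"), and reverse it with getBytes("ISO8859_1"). The class and method names are hypothetical, and the clipboard step itself is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

public class ObjectStringRoundTrip {

    // Serialize the object, then map each byte 0x00-0xFF to the char
    // with the same value using ISO8859_1.
    static String toIsoString(Object obj) throws IOException {
        ByteArrayOutputStream aStream = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(aStream);
        out.writeObject(obj);
        out.close();
        return aStream.toString("ISO8859_1");
    }

    // Reverse the process: String -> bytes (ISO8859_1) -> object.
    static Object fromIsoString(String aString) throws IOException, ClassNotFoundException {
        byte[] aByteArray = aString.getBytes("ISO8859_1");
        ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(aByteArray));
        try {
            return in.readObject();
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> original = new ArrayList<>();
        original.add("clipboard");
        original.add("payload");
        Object restored = fromIsoString(toIsoString(original));
        System.out.println(original.equals(restored));
    }
}
```

Passing the charset name explicitly, rather than calling toString() and getBytes() with no arguments, keeps the result independent of the platform's default encoding.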
In previous releases, the need to use a reversible encoding was not
apparent to most programmers. ISO8859_1 was the default encoder for
western locales on both Solaris and Win32. A program's dependence on
ISO8859_1 might not be apparent if the program was not tested under a
non-western locale.
JDK software running on Win32 machines uses Cp1252 (Windows Latin-1) as
the default encoding for western locales. Cp1252 does not implement a
reversible byte/character translation. It may appear to some
programmers that 1.1.7 introduces an incompatibility. The real problem
is a programming technique that unintentionally relies on the features
of specific locales.
(Review ID: 52380)
======================================================================
Duplicates: JDK-4066902 Clipboard.setContents not working for custom flavor (Closed)