Type: Enhancement
Resolution: Unresolved
Priority: P4
Fix Version/s: None
Affects Version/s: 6
Component/s: core-libs
Labels:
Subcomponent: java.nio.charsets
Understanding: Fix Understood
CPU: x86
OS: linux

A DESCRIPTION OF THE REQUEST :
Modified UTF-8 is used in many places, including JNI, Data*Stream, and JAR. The Charset class in java.nio is a good way to encapsulate encoding handling without reimplementing the encoding process over and over again.

It would be good to have access to this encoding as a Charset as well. Preferably it would be a required standard charset (accessible through a constant, as requested in 4884238). Possible names are "x-modified-UTF-8", "x-CESU-8-nullfree", or something similar.

The current UTF-8 decoder implementation already handles all variants of this encoding, so only the encoder would need simple modifications to provide a Charset for these encodings as well.
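To make the idea concrete, here is a minimal sketch of such a Charset. It assumes the proposed name "x-modified-UTF-8" (not a charset the JDK ships) and uses hypothetical class names. The encoder writes each UTF-16 code unit independently in one to three bytes and maps U+0000 to C0 80, which matches the body written by DataOutputStream.writeUTF. The decoder is deliberately lenient: it accepts every one- to three-byte form, including a raw 00 byte and CESU-8-style surrogate pairs, but it rejects four-byte UTF-8 sequences, so on its own it does not decode every UTF-8 variant.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;

// Hypothetical sketch: "x-modified-UTF-8" is the name proposed in this request, not a JDK charset.
final class ModifiedUtf8Charset extends Charset {
    ModifiedUtf8Charset() {
        super("x-modified-UTF-8", new String[0]);
    }

    @Override
    public boolean contains(Charset cs) {
        // Any sequence of UTF-16 code units (even unpaired surrogates) can be encoded.
        return true;
    }

    @Override
    public CharsetEncoder newEncoder() { return new Encoder(this); }

    @Override
    public CharsetDecoder newDecoder() { return new Decoder(this); }

    // Encodes each UTF-16 code unit on its own; U+0000 becomes the two bytes C0 80.
    private static final class Encoder extends CharsetEncoder {
        Encoder(Charset cs) { super(cs, 1.1f, 3.0f); }

        @Override
        protected CoderResult encodeLoop(CharBuffer in, ByteBuffer out) {
            while (in.hasRemaining()) {
                char c = in.get(in.position()); // peek; only advance once the bytes fit
                int n = (c != 0 && c < 0x80) ? 1 : (c < 0x800 ? 2 : 3);
                if (out.remaining() < n) {
                    return CoderResult.OVERFLOW;
                }
                if (n == 1) {
                    out.put((byte) c);
                } else if (n == 2) {
                    out.put((byte) (0xC0 | (c >> 6)));
                    out.put((byte) (0x80 | (c & 0x3F)));
                } else {
                    out.put((byte) (0xE0 | (c >> 12)));
                    out.put((byte) (0x80 | ((c >> 6) & 0x3F)));
                    out.put((byte) (0x80 | (c & 0x3F)));
                }
                in.position(in.position() + 1);
            }
            return CoderResult.UNDERFLOW;
        }
    }

    // Lenient decoder for the 1- to 3-byte forms (so it also reads CESU-8); 4-byte leads are malformed.
    private static final class Decoder extends CharsetDecoder {
        Decoder(Charset cs) { super(cs, 1.0f, 1.0f); }

        @Override
        protected CoderResult decodeLoop(ByteBuffer in, CharBuffer out) {
            while (in.hasRemaining()) {
                int p = in.position();
                int b0 = in.get(p) & 0xFF;
                int n;
                if (b0 < 0x80) {
                    n = 1;
                } else if ((b0 & 0xE0) == 0xC0) {
                    n = 2;
                } else if ((b0 & 0xF0) == 0xE0) {
                    n = 3;
                } else {
                    return CoderResult.malformedForLength(1); // stray continuation or 4-byte lead
                }
                if (in.remaining() < n) {
                    return CoderResult.UNDERFLOW; // wait for the rest of the sequence
                }
                int c = (n == 1) ? b0 : (n == 2) ? (b0 & 0x1F) : (b0 & 0x0F);
                for (int k = 1; k < n; k++) {
                    int b = in.get(p + k) & 0xFF;
                    if ((b & 0xC0) != 0x80) {
                        return CoderResult.malformedForLength(k);
                    }
                    c = (c << 6) | (b & 0x3F);
                }
                if (!out.hasRemaining()) {
                    return CoderResult.OVERFLOW;
                }
                out.put((char) c);
                in.position(p + n);
            }
            return CoderResult.UNDERFLOW;
        }
    }
}
```

Because the instance is constructed directly, Charset.forName("x-modified-UTF-8") would still fail; making it discoverable by name would additionally require registering a java.nio.charset.spi.CharsetProvider.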

Bug 4641026 made clear to me that the encoding is more closely related to CESU-8, although \u0000 is encoded differently. It might make sense to make standard CESU-8 available as well.
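The relationship can be seen with existing JDK APIs. This sketch (class and method names are illustrative) compares String.getBytes with StandardCharsets.UTF_8 against the body written by DataOutputStream.writeUTF, which emits a two-byte length prefix followed by modified UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ModifiedUtf8Bytes {
    // Modified UTF-8 as produced by DataOutputStream.writeUTF, minus its 2-byte length prefix.
    static byte[] modifiedUtf8(String s) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream dos = new DataOutputStream(bos);
            dos.writeUTF(s); // throws UTFDataFormatException above 65535 encoded bytes
            dos.flush();
            byte[] framed = bos.toByteArray();
            return Arrays.copyOfRange(framed, 2, framed.length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Space-separated uppercase hex, e.g. "41 C0 80".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "A\u0000\uD83D\uDE00"; // 'A', NUL, U+1F600 as a surrogate pair
        System.out.println("UTF-8:          " + hex(s.getBytes(StandardCharsets.UTF_8)));
        System.out.println("modified UTF-8: " + hex(modifiedUtf8(s)));
    }
}
```

For this string, standard UTF-8 gives 41 00 F0 9F 98 80 and modified UTF-8 gives 41 C0 80 ED A0 BD ED B8 80. CESU-8 would give 41 00 ED A0 BD ED B8 80: the supplementary character becomes two three-byte surrogates exactly as in modified UTF-8, and only the NUL differs.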

JUSTIFICATION :
I'm thinking about how to solve the much-voted-for bug 4244499.

My approach would be to move encoding from the native code into the Java classes and let a Charset object control which encoding is used. To remain backward compatible, however, modified UTF-8 would have to be the default.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Charset.forName("x-modified-UTF-8")
returns a Charset that encodes modified UTF-8 and decodes any UTF-8 variant.
ACTUAL -
UnsupportedCharsetException thrown.

---------- BEGIN SOURCE ----------
import java.nio.charset.Charset;
import java.util.Map;

public class CharTest {
    public static void main(String[] args) throws Exception {
        // List every charset this JVM supports, followed by its aliases.
        for (Map.Entry<String, Charset> p :
                Charset.availableCharsets().entrySet()) {
            System.out.println(p.getKey());
            for (String a : p.getValue().aliases()) {
                System.out.println("\t" + a);
            }
        }
        // Throws UnsupportedCharsetException: no charset with this name is registered.
        System.out.println(Charset.forName("x-modified-UTF-8").name());
    }
}

---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
It is of course possible to implement the modified UTF-8 encoding manually, as has been done in Data*Stream and Zip*Stream. One could even add a Charset field to choose the charset, with the special value null denoting modified UTF-8. Every encode or decode operation would then check whether a Charset object is present and otherwise fall back to the manual implementation.
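A minimal sketch of that workaround, assuming a hypothetical StringCodec class: a null charset selects modified UTF-8, and anything else delegates to the JDK's Charset support. For brevity the fallback reuses DataOutputStream.writeUTF and DataInputStream.readUTF rather than a fresh hand-written coder, so it inherits their 65535-byte limit.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical helper: a null Charset stands for modified UTF-8.
final class StringCodec {
    private final Charset charset; // null selects the modified UTF-8 fallback

    StringCodec(Charset charset) {
        this.charset = charset;
    }

    byte[] encode(String s) {
        if (charset != null) {
            return s.getBytes(charset);
        }
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream dos = new DataOutputStream(bos);
            dos.writeUTF(s); // 2-byte length prefix, then modified UTF-8
            dos.flush();
            byte[] framed = bos.toByteArray();
            return Arrays.copyOfRange(framed, 2, framed.length);
        } catch (IOException e) { // UTFDataFormatException if the result exceeds 65535 bytes
            throw new UncheckedIOException(e);
        }
    }

    String decode(byte[] bytes) {
        if (charset != null) {
            return new String(bytes, charset);
        }
        if (bytes.length > 0xFFFF) {
            throw new IllegalArgumentException("readUTF is limited to 65535 bytes");
        }
        // Re-create the big-endian unsigned-short length prefix that readUTF expects.
        byte[] framed = new byte[bytes.length + 2];
        framed[0] = (byte) (bytes.length >>> 8);
        framed[1] = (byte) bytes.length;
        System.arraycopy(bytes, 0, framed, 2, bytes.length);
        try {
            return new DataInputStream(new ByteArrayInputStream(framed)).readUTF();
        } catch (IOException e) { // UTFDataFormatException on malformed input
            throw new UncheckedIOException(e);
        }
    }
}
```

Every call site pays one null check, which is the cost this workaround accepts in exchange for keeping modified UTF-8 as the backward-compatible default.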

Relates to: JDK-6862139 (bf) Add put/getBoolean and put/getUTF methods (Closed)

Assignee: Xueming Shen
Reporter: Nelson Dcosta (Inactive)
Votes: 0
Watchers: 0
Created: 2006-05-22 08:26
Updated: 2011-02-16 11:15
Imported: 15/Sep/12 1:24 PM
Indexed: 17/Jul/12 10:55 AM
