Name: nl37777 Date: 08/29/2003
The Java VM and the various interfaces attached to it
(such as the Java Native Interface) have always used a modified form of
the standard UTF-8 encoding. The same encoding has been used in the
java.io.DataInput and DataOutput interfaces, where it has long been
documented as "Java modified UTF-8". Since Java modified UTF-8 and
standard UTF-8 are incompatible, it is necessary to clarify throughout
the Java platform specifications which interfaces use which encoding.
Also, the description in the Java Virtual Machine Specification and
some other documentation make it sound as if Java modified UTF-8 could
not encode supplementary characters. In fact, it appears that all parts
of the J2SDK that deal with Java modified UTF-8 handle supplementary
characters just fine - they simply represent the surrogate pair of the
character's UTF-16 representation as two three-byte sequences.
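
The difference can be observed directly. The following is a minimal sketch, not part of the original report: it assumes a runtime with java.nio.charset.StandardCharsets (Java 7 or later), and the class name ModifiedUtf8Demo and its helper methods are hypothetical names chosen for illustration. It encodes the supplementary character U+10400 once with DataOutputStream.writeUTF (Java modified UTF-8) and once with standard UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; the names here are illustrative only.
class ModifiedUtf8Demo {

    // Encodes s with DataOutputStream.writeUTF (Java modified UTF-8)
    // and strips the two-byte length prefix that writeUTF emits first.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] withLength = buf.toByteArray();
        return Arrays.copyOfRange(withLength, 2, withLength.length);
    }

    // Encodes s as standard UTF-8.
    static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Formats bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+10400 is a supplementary character: surrogate pair D801 DC00 in UTF-16.
        String s = new String(Character.toChars(0x10400));
        System.out.println("modified: " + hex(modifiedUtf8(s)));
        System.out.println("standard: " + hex(standardUtf8(s)));
    }
}
```

Under these assumptions, the modified form is six bytes (ED A0 81 ED B0 80, one three-byte sequence per surrogate), while standard UTF-8 uses the single four-byte sequence F0 90 90 80. A decoder for one encoding cannot be expected to accept the other's output for such characters.
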
This needs to be better documented at least in the following
specifications:
- Java Virtual Machine Specification
- Java Native Interface Specification
- Object Serialization Specification
- Java Platform Debugger Architecture
- Java Virtual Machine Profiler Interface
- Java Virtual Machine Tool Interface
This is part of Tiger release driver 4533872.
======================================================================
Duplicates:
- JDK-4873956 RandomAccessFile.writeUTF(...) doesn't say "modified" (Closed)
Relates to:
- JDK-4533872 Unicode supplementary character support (JSR-204) (Resolved)
- JDK-5044673 JVMTI Doc: Clarify supplementary character handling is modified UTF-8 (Resolved)
- JDK-5049313 Implement all JVMTI strings as modified UTF-8 (Closed)