JDK-4915107

Clarify supplementary character handling in modified UTF-8


    • b54
    • generic, x86
    • generic, windows_xp



      Name: nl37777 Date: 08/29/2003

      The Java VM and the various interfaces attached to it
      (such as the Java Native Interface) have always used a modified form of
      the standard UTF-8 encoding. The same encoding has been used in the
      java.io.DataInput and DataOutput classes, but there it has long been
      documented as "Java modified UTF-8". Since Java modified UTF-8 and
      standard UTF-8 are incompatible, it is necessary to clarify throughout
      the Java platform specifications which interfaces use which encoding.
      Also, the description in the Java Virtual Machine Specification and
      some other documentation makes it sound as if Java modified UTF-8 could
      not encode supplementary characters. In fact, it appears that all parts
      of the J2SDK that deal with Java modified UTF-8 handle supplementary
      characters just fine: they simply represent each of the two surrogates
      in the character's UTF-16 representation as its own three-byte
      sequence, six bytes in total.
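
      The difference can be illustrated with a small sketch. It is not part
      of the original report. The class name ModifiedUtf8Demo, its helper
      methods, and the choice of U+10400 as the sample character are
      assumptions made for illustration. It relies on the documented
      behaviour of DataOutputStream.writeUTF, which writes a two-byte length
      followed by the modified UTF-8 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; the name and helpers are illustrative only.
class ModifiedUtf8Demo {

    // Encodes s with DataOutputStream.writeUTF and drops the 2-byte length prefix,
    // leaving only the modified UTF-8 payload.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = buf.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    // Encodes s as standard UTF-8.
    static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Formats bytes as space-separated uppercase hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+10400 is a supplementary character; in UTF-16 it is the
        // surrogate pair D801 DC00.
        String s = new String(Character.toChars(0x10400));
        // Two three-byte sequences, one per surrogate.
        System.out.println("modified UTF-8: " + hex(modifiedUtf8(s)));
        // One four-byte sequence.
        System.out.println("standard UTF-8: " + hex(standardUtf8(s)));
    }
}
```

      Under these assumptions the modified form should come out as six bytes
      (ED A0 81 ED B0 80) and the standard form as four (F0 90 90 80), which
      is why a decoder for one encoding cannot be handed the other's output.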

      This needs to be better documented at least in the following
      specifications:
      - Java Virtual Machine Specification
      - Java Native Interface Specification
      - Object Serialization Specification
      - Java Platform Debugger Architecture
      - Java Virtual Machine Profiler Interface
      - Java Virtual Machine Tool Interface
      This is part of Tiger release driver 4533872.
      ======================================================================

            nlindenbsunw Norbert Lindenberg (Inactive)