Name: gm110360 Date: 07/08/2002
FULL PRODUCT VERSION :
java version "1.4.0_01"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.0_01-b03)
Java HotSpot(TM) Client VM (build 1.4.0_01-b03, mixed mode)
FULL OPERATING SYSTEM VERSION :
Windows 2000
ADDITIONAL OPERATING SYSTEMS :
Linux, Solaris
A DESCRIPTION OF THE PROBLEM :
The JDK lacks Unicode 3.1 support.
Unicode 3.1 needs to be supported because it assigns
characters to code points outside of the Unicode BMP (Basic
Multilingual Plane).
Java, so far, has gotten away with assuming that all
characters have 16-bit representations and that
no characters are assigned outside of Plane 0.
That has been theoretically false for years, but
in practice it has been true - up until now.
That assumption no longer holds with Unicode 3.1, and it
will be necessary to add methods to query for
surrogates and get appropriate values (maybe as int)
for higher planes. Java can still use UTF-16, of
course, but that does mean that sometimes two Java
chars will be needed to encode *one* Unicode
code point.
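A rough sketch of the kind of methods meant here, using only plain
arithmetic so that it also runs on 1.4. The class and method names
(SurrogateSketch, toUtf16, countCodePoints) are made up for illustration
and are not JDK API. It assumes well-formed input to toCodePoint and does
no validation there. The example character is U+10400, one of the Deseret
letters added in Unicode 3.1.

```java
// Hypothetical helper class; names are illustrative, not part of the JDK.
class SurrogateSketch {

    // High (leading) surrogates occupy D800..DBFF.
    static boolean isHighSurrogate(char c) {
        return c >= '\uD800' && c <= '\uDBFF';
    }

    // Low (trailing) surrogates occupy DC00..DFFF.
    static boolean isLowSurrogate(char c) {
        return c >= '\uDC00' && c <= '\uDFFF';
    }

    // Combine a high/low surrogate pair into one code point, returned as int.
    // Assumes the caller has already checked both chars with the tests above.
    static int toCodePoint(char high, char low) {
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    // Encode a code point as one or two UTF-16 code units.
    static char[] toUtf16(int codePoint) {
        if (codePoint < 0 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("not a Unicode code point");
        }
        if (codePoint < 0x10000) {
            return new char[] { (char) codePoint };
        }
        int offset = codePoint - 0x10000;
        return new char[] {
            (char) (0xD800 + (offset >> 10)),   // top 10 bits
            (char) (0xDC00 + (offset & 0x3FF))  // bottom 10 bits
        };
    }

    // Count code points in a String; each well-formed pair counts once,
    // and an unpaired surrogate counts as one unit by itself.
    static int countCodePoints(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (isHighSurrogate(s.charAt(i))
                    && i + 1 < s.length()
                    && isLowSurrogate(s.charAt(i + 1))) {
                i++; // skip the low half of the pair
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String s = "A\uD801\uDC00"; // 'A' followed by U+10400
        System.out.println(s.length());          // prints 3 (UTF-16 code units)
        System.out.println(countCodePoints(s));  // prints 2 (code points)
        System.out.println(Integer.toHexString(
                toCodePoint(s.charAt(1), s.charAt(2)))); // prints 10400
    }
}
```

The int return type follows the "maybe as int" suggestion above, since a
char cannot hold a value above FFFF. The JSR-204 work referenced at the end
of this report later added comparable methods to J2SE 5.0, such as
Character.toCodePoint, Character.toChars and String.codePointCount.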
The java.lang.Character class needs to be updated and
so do various stream and buffer classes (which currently
simply ignore and/or discard surrogate codings).
Unicode 3.1 is here *today*. Java has to add support
- and the sooner the better.
REPRODUCIBILITY :
This bug can be reproduced always.
(Review ID: 158701)
======================================================================
- duplicates
  JDK-4533872 Unicode supplementary character support (JSR-204) - Resolved