FULL PRODUCT VERSION :
java version "1.4.2"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2-b28)
Java HotSpot(TM) Client VM (build 1.4.2-b28, mixed mode)
FULL OS VERSION :
Applies to all OSes
A DESCRIPTION OF THE PROBLEM :
The javadoc for the String.length() method says "Returns the length of this string. The length is equal to the number of 16-bit Unicode characters in the string."
The problem is that this is meaningless, because there is no such thing as a 16-bit Unicode character.
What this method *actually* does is return the number of 16-bit UTF-16 code values in the string. This is *not* the same as the number of Unicode code points (characters) encoded by the string. If surrogates are used, both surrogates are counted by this method, even though the surrogate pair encodes only a single Unicode code point.
This should be cleared up, since very few people get this straight. The JDK documentation should not make the confusion worse.
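A minimal sketch of the discrepancy, assuming a Java 5 or later runtime: String.codePointCount(int, int) is not available in the 1.4.2 release reported here. The class name LengthVsCodePoints and its two helper methods are hypothetical and exist only for illustration.

```java
public class LengthVsCodePoints {
    // Number of 16-bit UTF-16 code values; this is what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points. Assumes Java 5 or later,
    // where String.codePointCount(int, int) was introduced.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1D11E MUSICAL SYMBOL G CLEF lies outside the Basic Multilingual
        // Plane, so UTF-16 encodes it as the surrogate pair D834 DD1E.
        String clef = "\uD834\uDD1E";
        System.out.println(codeUnits(clef));   // 2: both surrogates are counted
        System.out.println(codePoints(clef));  // 1: the pair is one code point
    }
}
```

For a string containing no surrogate pairs, such as "abc", both helpers return the same value, which is likely why the distinction is easy to overlook.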
REPRODUCIBILITY :
This bug can be reproduced always.
###@###.### 2005-1-29 01:46:41 GMT
DUPLICATES :
JDK-5082220 String.length spec refers to "16-bit Unicode characters" (Resolved)