A major Sun customer cannot move to Tiger because of a change in behavior
in Character.isLetter().
Using the code snippet provided by the customer, the behavior of
isLetter() changes after Mantis as follows:
tmarble@fred 38% pwd
/home/tmarble/javaperf/2004/TLR/Mantis
tmarble@fred 39% /usr/java/j2sdk1.4.2_07/bin/javac UnicodeTest.java
tmarble@fred 40% /usr/java/j2sdk1.4.2_07/bin/java UnicodeTest
is letter: false
is digit: false
tmarble@fred 41% cd ../Tiger
/home/tmarble/javaperf/2004/TLR/Tiger
tmarble@fred 42% /usr/java/jdk1.5.0_02/bin/javac UnicodeTest.java
tmarble@fred 43% /usr/java/jdk1.5.0_02/bin/java UnicodeTest
is letter: true
is digit: false
tmarble@fred 44% cd ../Mustang
/home/tmarble/javaperf/2004/TLR/Mustang
tmarble@fred 45% /usr/java/jdk1.6.0/bin/javac UnicodeTest.java
tmarble@fred 46% /usr/java/jdk1.6.0/bin/java UnicodeTest
is letter: true
is digit: false
tmarble@fred 47%
This is a bug because the behavior changed.
As the character in question is a modifier character, I'm not
sure what the "right" behavior is, but I suspect that this may
relate to correct interpretation of Unicode. For further information, see below.
I found the relevant Unicode map here:
http://www.unicode.org/charts/U02B0.pdf
There is further discussion of that specific character here:
http://www.tachyonsoft.com/uc0002.htm#U02C6
http://www.fileformat.info/info/unicode/char/02c6/index.htm
There is also a discussion of Unicode version 4:
http://www.unicode.org/versions/Unicode4.0.1/
Please note that according to bug 5034599 Unicode 4.0.1
will be delayed until Mustang. HOWEVER it is not clear that
this is a Unicode 4 issue.
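One way to narrow this down might be to print the general category that each JDK assigns to U+02C6, since isLetter() is defined in terms of Character.getType(). The sketch below is an assumption-laden probe, not part of the customer's test: the class name CategoryProbe and its helper methods are hypothetical, and it only names the two categories that seem relevant here (MODIFIER_LETTER and MODIFIER_SYMBOL).

```java
final class CategoryProbe {
    // Hypothetical helper: names the two getType() results of interest,
    // and falls back to the raw numeric constant for anything else.
    static String categoryName(int type) {
        switch (type) {
            case Character.MODIFIER_LETTER: return "MODIFIER_LETTER (Lm)";
            case Character.MODIFIER_SYMBOL: return "MODIFIER_SYMBOL (Sk)";
            default: return "other (" + type + ")";
        }
    }

    // Reports the category and the isLetter() result for one char.
    static String describe(char c) {
        return "U+" + Integer.toHexString(c).toUpperCase()
            + " category=" + categoryName(Character.getType(c))
            + " isLetter=" + Character.isLetter(c);
    }

    public static void main(String[] argv) {
        // Same character as the customer's test case.
        System.out.println(describe((char) 0x2C6));
    }
}
```

If the reported category differs between the Mantis and Tiger runs, that would suggest the change comes from the Unicode data tables rather than from the isLetter() logic itself.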
Also, correct behavior for this one Unicode character may not
indicate correctness across the universe of possible letters
(correctness of isLetter() must be reviewed in the general case).
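A general-case review could start from a dump of every char value's classification, produced once per JDK and then compared. The following is a minimal sketch under that assumption; the class name LetterDump and its helpers are hypothetical, and it deliberately sticks to the char-based Character methods so that the same source should compile on the older releases as well.

```java
final class LetterDump {
    // Formats a char as a four-digit uppercase hex code, e.g. 'A' as 0041.
    static String hex4(char c) {
        String h = Integer.toHexString(c).toUpperCase();
        while (h.length() < 4) {
            h = "0" + h;
        }
        return h;
    }

    // One line per char: code, isLetter(), isDigit(), and the raw
    // Character.getType() constant.
    static String line(char c) {
        return hex4(c)
            + " letter=" + Character.isLetter(c)
            + " digit=" + Character.isDigit(c)
            + " type=" + Character.getType(c);
    }

    public static void main(String[] argv) {
        // Walk every 16-bit char value; the output is meant to be diffed.
        for (int i = 0; i <= 0xFFFF; i++) {
            System.out.println(line((char) i));
        }
    }
}
```

Redirecting the output of each JDK's run to a file and diffing the files would list every character whose classification changed, not just U+02C6.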
--Tom
###@###.### 10/27/04 20:07 GMT
###@###.### 10/27/04 22:30 GMT
The test source code as provided by ###@###.###:
public class UnicodeTest {
    public static void main(String[] argv) {
        // U+02C6 MODIFIER LETTER CIRCUMFLEX ACCENT
        char myUnicodeCharacter = (char) Integer.parseInt("2C6", 16);
        System.out.println("is letter: " +
            Character.isLetter(myUnicodeCharacter));
        System.out.println("is digit: " +
            Character.isDigit(myUnicodeCharacter));
    }
}
###@###.### 10/28/04 17:40 GMT
- duplicates: JDK-6212048 REGRESSION: Difference between java 1.4/5.0 unicode letter & digit recognition (Closed)
- relates to: JDK-5034599 RFE: Upgrade to Unicode 4.1 (Closed)