- Bug
- Resolution: Fixed
- P4
- 8u66, 9
- b120
- x86_64
- windows_7
- Verified
FULL PRODUCT VERSION :
A DESCRIPTION OF THE PROBLEM :
According to Unicode Standard Annex #44 (http://unicode.org/reports/tr44/#General_Category_Values), general category C (Other) is the union of the subcategories Cc, Cf, Cs, Co, and Cn.
However, Java's implementation of `\pC` omits Cn from that union, so unassigned code points are not matched.
This bug has been present from the beginning and still exists in the latest commit on JDK 9: http://hg.openjdk.java.net/jdk9/jdk9/jdk/file/16fc042acee6/src/java.base/share/classes/java/util/regex/Pattern.java#l5671
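For reference, the membership rule described above can be sketched independently of the regex engine by using `Character.getType`. This is only an illustration of what UAX #44 requires, not the code in `Pattern.java`; the class name `GeneralCategoryOther` and the method `isCategoryOther` are hypothetical.

```java
// Hypothetical helper illustrating the expected definition of general category C.
final class GeneralCategoryOther {
    // Returns true if the code point belongs to any of the five
    // subcategories that make up general category C (Other).
    static boolean isCategoryOther(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.CONTROL:      // Cc
            case Character.FORMAT:       // Cf
            case Character.SURROGATE:    // Cs
            case Character.PRIVATE_USE:  // Co
            case Character.UNASSIGNED:   // Cn, the subcategory missing from \pC
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isCategoryOther(0xFFFF)); // noncharacter, reported as unassigned
        System.out.println(isCategoryOther('A'));    // uppercase letter, not in C
    }
}
```

With the fix in place, `\pC` should agree with this helper for every code point.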
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
System.out.println("\uFFFF".matches("\\p{Cn}"));
System.out.println("\uFFFF".matches("\\pC"));
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
true
true
ACTUAL -
true
false
REPRODUCIBILITY :
This bug is always reproducible.
CUSTOMER SUBMITTED WORKAROUND :
Explicitly list the subcategories of general category Other inside a character class, e.g. `[\p{Cc}\p{Cf}\p{Cn}\p{Co}\p{Cs}]`
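A minimal sketch of the workaround, assuming the five subcategories are placed inside a bracketed character class so that they form a union rather than a sequence of five characters. The class name `OtherCategoryWorkaround` and the helper `isOther` are hypothetical names used only for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper around the workaround pattern.
final class OtherCategoryWorkaround {
    // Union of Cc, Cf, Cn, Co and Cs, spelled out explicitly instead of relying on \pC.
    static final Pattern OTHER =
            Pattern.compile("[\\p{Cc}\\p{Cf}\\p{Cn}\\p{Co}\\p{Cs}]");

    // Returns true if the whole input is a single code point in category C.
    static boolean isOther(String s) {
        return OTHER.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isOther("\uFFFF")); // true: unassigned (Cn)
        System.out.println(isOther("\u0000")); // true: control (Cc)
        System.out.println(isOther("A"));      // false: letter
    }
}
```

Because the pattern names each subcategory directly, it does not depend on how `\pC` is expanded and should behave the same on affected and fixed releases.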