Bug
Resolution: Fixed
P3
1.4.1, 6
b14
generic, x86
generic, windows_2000
Name: rmT116609 Date: 03/10/2003
FULL PRODUCT VERSION :
java version "1.4.1_02"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.1_02-b06)
Java HotSpot(TM) Client VM (build 1.4.1_02-b06, mixed mode)
FULL OS VERSION :
Microsoft Windows 2000 [Version 5.00.2195] (sp2)
A DESCRIPTION OF THE PROBLEM :
The Javadoc for java.util.regex.Pattern says:
" The supported blocks and categories are those of The Unicode Standard, Version 3.0. ... The category names are those defined in table 4-5 of the Standard (p. 88), both normative and informative."
http://java.sun.com/j2se/1.4.1/docs/api/java/util/regex/Pattern.html#ubc
The categories listed in this table include
Pi = Punctuation, initial quote
Pf = Punctuation, final quote
http://www.unicode.org/book/ch04.pdf (page 88)
These are used to identify quotation marks as listed in
http://www.unicode.org/Public/UNIDATA/PropList.txt
However, categories Pi and Pf are not supported by java.util.regex.Pattern; compiling a pattern that uses them throws PatternSyntaxException.
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Pattern.compile("\\p{Pi}[^\\p{Pf}]*\\p{Pf}").matcher("\u201Cquoted text\u201D").matches()
java.util.regex.PatternSyntaxException: Unknown character category {Pi} near index 5
\p{Pi}[^\p{Pf}]*\p{Pf}
     ^
ERROR MESSAGES/STACK TRACES THAT OCCUR :
java.util.regex.PatternSyntaxException: Unknown character category {Pi} near index 5
\p{Pi}[^\p{Pf}]*\p{Pf}
     ^
    at java.util.regex.Pattern.error(Pattern.java:1489)
    at java.util.regex.Pattern.familyError(Pattern.java:2160)
    at java.util.regex.Pattern.retrieveCategoryNode(Pattern.java:2151)
    at java.util.regex.Pattern.family(Pattern.java:2123)
    at java.util.regex.Pattern.sequence(Pattern.java:1559)
    at java.util.regex.Pattern.expr(Pattern.java:1506)
    at java.util.regex.Pattern.compile(Pattern.java:1274)
    at java.util.regex.Pattern.<init>(Pattern.java:1030)
    at java.util.regex.Pattern.compile(Pattern.java:777)
REPRODUCIBILITY :
This bug can be reproduced always.
---------- BEGIN SOURCE ----------
import java.util.regex.Pattern;

public class PatternPiPf {
    public static void main(String[] ignore) {
        System.out.println(Pattern.compile("\\p{Pi}[^\\p{Pf}]*\\p{Pf}")
                .matcher("\u201Cquoted text\u201D").matches());
    }
}
---------- END SOURCE ----------
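To check which category names a given runtime accepts, a small probe along the following lines may help. This is only a sketch: the class name CategoryProbe and the helper supports() are made up for illustration and are not part of the original report. It relies on Pattern.compile throwing PatternSyntaxException for an unknown category, as shown in the stack trace above.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class CategoryProbe {
    // Hypothetical helper: returns true if this runtime's Pattern
    // accepts \p{<category>}, false if compilation is rejected.
    static boolean supports(String category) {
        try {
            Pattern.compile("\\p{" + category + "}");
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] ignore) {
        // Ps/Pe are the neighbouring punctuation categories; Pi/Pf are
        // the two this report is about.
        String[] names = {"Ps", "Pe", "Pi", "Pf"};
        for (int i = 0; i < names.length; i++) {
            System.out.println(names[i] + ": " + supports(names[i]));
        }
    }
}
```

On an affected release the Pi and Pf lines would be expected to print false; on a release that contains the fix they should print true.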
CUSTOMER SUBMITTED WORKAROUND :
Use data from http://www.unicode.org/Public/UNIDATA/PropList.txt

import java.util.regex.Pattern;

public class PatternPiPfWorkaround {
    public static void main(String[] ignore) {
        // Original pattern, which fails to compile:
        // System.out.println(Pattern.compile("\\p{Pi}[^\\p{Pf}]*\\p{Pf}")
        //         .matcher("\u201Cquoted text\u201D").matches());

        // Workaround: list the initial-quote and final-quote code points explicitly.
        System.out.println(Pattern.compile("[\u00AB\u2018\u201B\u201C\u201F\u2039][^\u00BB\u2019\u201D\u203A]*[\u00BB\u2019\u201D\u203A]")
                .matcher("\u201Cquoted text\u201D").matches());
    }
}
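An alternative that avoids hard-coding code points might be to test the general category directly with Character.getType, which defines the constants INITIAL_QUOTE_PUNCTUATION and FINAL_QUOTE_PUNCTUATION. The sketch below is not part of the submitted workaround; the class name QuoteCheck and the method isQuoted are hypothetical, and it assumes Character.getType reports Pi and Pf correctly on the release in use, which has not been verified here for 1.4.1_02. It mirrors the intent of \p{Pi}[^\p{Pf}]*\p{Pf} under matches() for characters in the Basic Multilingual Plane.

```java
public class QuoteCheck {
    // Hypothetical helper: true if s starts with an initial-quote (Pi)
    // character, ends with a final-quote (Pf) character, and contains
    // no Pf character in between.
    static boolean isQuoted(String s) {
        if (s.length() < 2) {
            return false;
        }
        int last = s.length() - 1;
        if (Character.getType(s.charAt(0)) != Character.INITIAL_QUOTE_PUNCTUATION) {
            return false;
        }
        if (Character.getType(s.charAt(last)) != Character.FINAL_QUOTE_PUNCTUATION) {
            return false;
        }
        // Corresponds to [^\p{Pf}]* in the original pattern.
        for (int i = 1; i < last; i++) {
            if (Character.getType(s.charAt(i)) == Character.FINAL_QUOTE_PUNCTUATION) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] ignore) {
        System.out.println(isQuoted("\u201Cquoted text\u201D"));
    }
}
```

Unlike the explicit character classes above, this does not need updating when the Unicode data adds new quotation marks, but it only inspects char values and so ignores supplementary characters.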
(Review ID: 182377)
======================================================================
###@###.### 10/12/04 02:42 GMT
Duplicates: JDK-5110268 Java regular expressions need to support \p{LC}, \p{Pi} \p{Pf} (Closed)