Type: Bug
Resolution: Fixed
Priority: P4
Affects version: 6u26
Resolved in build: b119
CPU: x86
OS: linux
FULL PRODUCT VERSION :
$java -version
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) Client VM (build 20.1-b02, mixed mode, sharing)
ADDITIONAL OS VERSION INFORMATION :
$uname -a
Linux lc_rh_8 2.4.27 #3 SMP Fri Oct 31 16:51:51 GMT 2008 i686 i686 i386 GNU/Linux
A DESCRIPTION OF THE PROBLEM :
This problem occurs only when Unicode canonical-equivalence matching is enabled (Pattern.CANON_EQ).
It seems to me that the pattern string normalization is not handled correctly. When I add a capturing group (a pair of parentheses) around a Unicode sequence consisting of one base character followed by two NON_SPACING_MARK characters, the closing parenthesis is somehow treated as a NON_SPACING_MARK character as well.
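The suspected mechanism can be illustrated with a small sketch. It is hypothetical: this is not the JDK's Pattern source, and MarkSequence, collectBuggy and collect are made-up names. Both methods collect a base character plus the non-spacing marks that follow it. The first appends each code point before testing its type, the second tests first. Only the first variant sweeps the closing ')' into the sequence, which would produce the symptom described above.

```java
// Hypothetical illustration only, NOT the JDK implementation: two ways to
// collect a base character followed by its trailing non-spacing marks.
class MarkSequence {
    // True if the code point has general category Mn (non-spacing mark).
    static boolean isMark(int cp) {
        return Character.getType(cp) == Character.NON_SPACING_MARK;
    }

    // Buggy variant: the next code point is appended before its type is
    // checked, so whatever ends the run of marks (here ')') is absorbed.
    static String collectBuggy(String s, int start) {
        StringBuilder seq = new StringBuilder();
        int base = s.codePointAt(start);
        seq.appendCodePoint(base);
        int i = start + Character.charCount(base);
        while (i < s.length()) {
            int next = s.codePointAt(i);
            seq.appendCodePoint(next);      // appended before the type check
            if (!isMark(next)) break;       // too late: next is already in seq
            i += Character.charCount(next);
        }
        return seq.toString();
    }

    // Correct variant: each code point is checked before it is appended.
    static String collect(String s, int start) {
        StringBuilder seq = new StringBuilder();
        int base = s.codePointAt(start);
        seq.appendCodePoint(base);
        int i = start + Character.charCount(base);
        while (i < s.length()) {
            int next = s.codePointAt(i);
            if (!isMark(next)) break;       // stop before a non-mark
            seq.appendCodePoint(next);
            i += Character.charCount(next);
        }
        return seq.toString();
    }

    public static void main(String[] args) {
        String pattern = "a(\u0041\u0301\u0328)";
        int base = pattern.indexOf('A');
        System.out.println(collectBuggy(pattern, base).endsWith(")")); // true
        System.out.println(collect(pattern, base).endsWith(")"));      // false
    }
}
```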
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Compile and run the source code below:
javac Test.java
java Test abcd
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
It should print "false".
ACTUAL -
An exception is thrown:
Exception in thread "main" java.util.regex.PatternSyntaxException: Unmatched closing ')' near index 14
a((?:A??)|??)|?)?|A?)?|?)?|??)|A??)|Á?)|??)|?)?|Á)?|A?)?|Á)?|Á?)|??)|?)?|A)??|A)??)
              ^
	at java.util.regex.Pattern.error(Pattern.java:1713)
	at java.util.regex.Pattern.compile(Pattern.java:1464)
	at java.util.regex.Pattern.<init>(Pattern.java:1133)
	at java.util.regex.Pattern.compile(Pattern.java:847)
	at Test.<init>(Test.java:8)
	at Test.main(Test.java:19)
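The parts of that message can also be read programmatically through the standard PatternSyntaxException accessors getDescription(), getIndex() and getPattern(). getMessage() combines them into the text above, with the caret placed getIndex() columns in. In this sketch, the helper name compileError is hypothetical, and the stand-in pattern "a)" (compiled without CANON_EQ) is an assumption chosen only to trigger the same "Unmatched closing ')'" error.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Sketch: report the pieces of a PatternSyntaxException instead of letting it
// terminate the program. compileError is a hypothetical helper name.
class SyntaxErrorReport {
    // Returns the exception thrown by Pattern.compile, or null if the
    // pattern compiled successfully.
    static PatternSyntaxException compileError(String regex, int flags) {
        try {
            Pattern.compile(regex, flags);
            return null;
        } catch (PatternSyntaxException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        // Stand-in pattern that fails with the same description, no CANON_EQ.
        PatternSyntaxException e = compileError("a)", 0);
        if (e != null) {
            System.out.println("description: " + e.getDescription());
            System.out.println("index:       " + e.getIndex());
            System.out.println("pattern:     " + e.getPattern());
        }
    }
}
```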
REPRODUCIBILITY :
This bug can be reproduced always.
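To check whether a particular JDK is affected without the program dying on the exception, the test case in the source below can be wrapped so that it reports the outcome. This is a sketch: CanonEqGroupCheck and check() are hypothetical names. On an affected build such as the 6u26 one in this report, check() returns the exception description. On a JDK containing the fix, it returns the result of find(), which for the input "abcd" is the expected "false".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Sketch: the reporter's test case, wrapped so that a compile failure is
// reported as a string instead of terminating the program.
class CanonEqGroupCheck {
    // Capture group 1: 'A', combining acute (U+0301), combining ogonek (U+0328).
    static final String PATTERN = "a(\u0041\u0301\u0328)";

    // Returns "exception: ..." if compilation fails, otherwise "true"/"false"
    // from Matcher.find() on the given input.
    static String check(String input) {
        Pattern p;
        try {
            p = Pattern.compile(PATTERN, Pattern.CANON_EQ);
        } catch (PatternSyntaxException e) {
            return "exception: " + e.getDescription();
        }
        Matcher m = p.matcher(input);
        return Boolean.toString(m.find());
    }

    public static void main(String[] args) {
        String input = args.length > 0 ? args[0] : "abcd";
        System.out.println(check(input));
    }
}
```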
---------- BEGIN SOURCE ----------
import java.util.regex.*;
public class Test {
    private Pattern pattern;
    public Test() {
        // capture group 1: 'A' + combining acute (U+0301) + combining ogonek (U+0328)
        String patternString = "a(\u0041\u0301\u0328)";
        pattern = Pattern.compile(patternString, Pattern.CANON_EQ); // Unicode canonical-equivalence matching
    }
    boolean match(String s) {
        Matcher m = pattern.matcher(s);
        return m.find();
    }
    public static void main(String[] argv) {
        if (argv.length > 0) {
            boolean matched = new Test().match(argv[0]);
            System.out.println(matched);
        }
    }
}
---------- END SOURCE ----------
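On an affected JDK, one possible workaround is to decompose both the pattern and the input to NFD with java.text.Normalizer and compile without CANON_EQ, so the CANON_EQ pattern rewriting never runs. This is a hedged sketch, not the fix that was applied, and NfdWorkaround and its methods are hypothetical names. Canonical ordering puts U+0328 (combining class 202) before U+0301 (combining class 230), so every canonically equivalent spelling of the sequence decomposes to the same NFD string.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Hypothetical workaround sketch: normalize pattern and input to NFD, then
// match WITHOUT Pattern.CANON_EQ. Limitations: a character class holding a
// precomposed letter (e.g. [Á]) is decomposed into two separate class
// members, changing its meaning; regex escapes that name a character by code
// are not normalized; and match offsets refer to the normalized input.
class NfdWorkaround {
    static String nfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    static boolean find(String regex, String input) {
        Pattern p = Pattern.compile(nfd(regex)); // no CANON_EQ flag
        return p.matcher(nfd(input)).find();
    }

    public static void main(String[] args) {
        String regex = "a(\u0041\u0301\u0328)";
        System.out.println(find(regex, "abcd"));          // false
        System.out.println(find(regex, "a\u0104\u0301")); // true: precomposed A-ogonek + acute
    }
}
```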
Relates to: JDK-6728861 ExceptionInInitializerError is caught when the pattern has precomposed character (Resolved)