-
Bug
-
Resolution: Fixed
-
P4
-
6u21
-
b19
-
x86
-
linux
-
Verified
FULL PRODUCT VERSION :
java version "1.6.0_20"
Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
Java HotSpot(TM) Server VM (build 16.3-b01, mixed mode)
ADDITIONAL OS VERSION INFORMATION :
Linux jantunes-ubuntu 2.6.32-25-generic #44-Ubuntu SMP Fri Sep 17 20:26:08 UTC 2010 i686 GNU/Linux
A DESCRIPTION OF THE PROBLEM :
Characters can be written in a regular expression as octal escapes (\\0 followed by the octal value). For instance, one could write:
Pattern p = Pattern.compile("\\011some text\\012");
Matcher m = p.matcher("\tsome text\n");
System.out.println(m.find()); // yields "true"
However, if the string to match has a digit immediately after the escaped character, the pattern doesn't match, whether or not we quote that part of the regular expression with \\Q...\\E. Note the "1" right after the tab character (octal 011).
Pattern p = Pattern.compile("\\011\\Q1some text\\E\\012");
Matcher m = p.matcher("\t1some text\n");
System.out.println(m.find()); // yields "false"
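This happens because Pattern accepts either \\0011 or \\011 for the same character, so a digit that follows \\011 is apparently read as part of the escape (\\0111) instead of as a literal. From the javadoc: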
\0nn The character with octal value 0nn (0 <= n <= 7)
\0mnn The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
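The two javadoc forms above can be checked directly. The following is a minimal sketch, not part of the original report; the class name OctalEscapeDemo and the helper matches are made up for illustration. It only exercises unquoted octal escapes, whose behavior follows from the javadoc, and does not try to reproduce the \\Q...\\E case, which depends on the JDK build.

```java
import java.util.regex.Pattern;

public class OctalEscapeDemo {
    // Hypothetical helper: true if the regex matches the entire input.
    static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // \011 (form \0nn): octal 11 is the tab character.
        System.out.println(matches("\\011", "\t"));
        // \0011 (form \0mnn): also octal 11, the same tab character.
        System.out.println(matches("\\0011", "\t"));
        // \0111 (form \0mnn): octal 111 is the letter 'I',
        // not a tab followed by the digit 1.
        System.out.println(matches("\\0111", "I"));
        System.out.println(matches("\\0111", "\t1"));
    }
}
```

The first three calls are expected to print true and the last one false, since three octal digits after \\0 are consumed as one escape whenever the first digit is 3 or less.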
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Pattern p = Pattern.compile("\\011\\Q1some text\\E\\012");
Matcher m = p.matcher("\t1some text\n");
System.out.println(m.find()); // yields "false"
REPRODUCIBILITY :
This bug can be reproduced always.
CUSTOMER SUBMITTED WORKAROUND :
Always use the longest (four-digit, \\0mnn) octal form for character codes in regular expressions, so that a following digit cannot be absorbed into the escape.
Such as:
Pattern p = Pattern.compile("\\0011\\Q1some text\\E\\0012");
Matcher m = p.matcher("\t1some text\n");
System.out.println(m.find()); // now yields "true" as expected
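One way to apply this workaround consistently is to generate the four-digit form instead of typing it by hand. This is only a sketch: the class OctalEscapes and the method fourDigitOctal are hypothetical names, not part of java.util.regex, and the helper is limited to characters up to octal 377 because that is the largest value the \\0mnn form can express.

```java
import java.util.regex.Pattern;

public class OctalEscapes {
    // Hypothetical helper: builds the \0mnn escape for a character.
    // Only code points 0..0377 (0..255) fit in this form.
    static String fourDigitOctal(char c) {
        if (c > 0377) {
            throw new IllegalArgumentException("no \\0mnn form for code " + (int) c);
        }
        // %03o zero-pads the octal value to three digits, e.g. 9 -> "011".
        return String.format("\\0%03o", (int) c);
    }

    public static void main(String[] args) {
        // Builds the regex \00111some text\0012: the digit 1 after the
        // tab escape stays a literal because the escape is already full.
        String regex = fourDigitOctal('\t') + "1some text" + fourDigitOctal('\n');
        System.out.println(regex);
        System.out.println(Pattern.compile(regex).matcher("\t1some text\n").find());
    }
}
```

Because the escape already carries three digits after \\0, the parser has no room to take in another one, so the literal text after it needs no \\Q...\\E quoting in this example.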
- relates to
-
JDK-8140638 regex pattern compilation fails for patterns containing \Q and \E
-
- Closed
-