- Bug
- Resolution: Unresolved
- P4
- None
- 11, 12
ADDITIONAL SYSTEM INFORMATION :
> uname -a
Linux marcy 4.9.0-9-amd64 #1 SMP Debian 4.9.168-1+deb9u3 (2019-06-16) x86_64 GNU/Linux
> java -version
openjdk version "12.0.1" 2019-04-16
OpenJDK Runtime Environment AdoptOpenJDK (build 12.0.1+12)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 12.0.1+12, mixed mode, sharing)
A DESCRIPTION OF THE PROBLEM :
Pattern.compile throws a StringIndexOutOfBoundsException if the pattern contains a supplementary code point embedded directly as a surrogate pair and the flags include CANON_EQ. The problem does not occur if the supplementary code point is written only in \x{...} notation.
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Pass a String containing a valid surrogate pair to Pattern.compile, along with a flags argument that includes CANON_EQ.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Pattern should compile successfully.
ACTUAL -
Pattern.compile throws this exception:
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: index 2,length 2
    at java.base/java.lang.String.checkIndex(String.java:3369)
    at java.base/java.lang.String.codePointAt(String.java:736)
    at java.base/java.util.regex.Pattern.normalizeSlice(Pattern.java:1517)
    at java.base/java.util.regex.Pattern.normalize(Pattern.java:1475)
    at java.base/java.util.regex.Pattern.compile(Pattern.java:1740)
    at java.base/java.util.regex.Pattern.<init>(Pattern.java:1427)
    at java.base/java.util.regex.Pattern.compile(Pattern.java:1094)
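For context on the "index 2,length 2" message: a supplementary code point such as U+1D434 occupies two chars in a Java String, so a pattern consisting of that single code point has length 2, and calling codePointAt(2) on it reads one position past the end. The sketch below only illustrates that String behavior. The class name and the failsAtEnd helper are made up for this example, and it does not claim to reproduce what Pattern.normalizeSlice does internally.

```java
public class SurrogatePairIndexDemo {
    // Hypothetical helper: returns true if reading a code point at
    // index == length() throws, which is what the stack trace reports.
    static boolean failsAtEnd(String s) {
        try {
            s.codePointAt(s.length());
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String s = "\ud835\udc34"; // U+1D434 written as a surrogate pair

        System.out.println(s.length());                            // 2 chars
        System.out.println(s.codePointCount(0, s.length()));       // 1 code point
        System.out.println(Integer.toHexString(s.codePointAt(0))); // 1d434
        System.out.println(failsAtEnd(s));                         // true
    }
}
```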
---------- BEGIN SOURCE ----------
import java.util.regex.Pattern;

public class RegexSupplementaryBugDemo {
    public static void main(String[] args) {
        System.out.println("Testing escaped codepoint with CANON_EQ");
        Pattern.compile("\\x{1d434}", Pattern.CANON_EQ);

        System.out.println("Testing codepoint without CANON_EQ");
        Pattern.compile("\ud835\udc34");

        System.out.println("Testing codepoint with CANON_EQ");
        Pattern.compile("\ud835\udc34", Pattern.CANON_EQ);
    }
}
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
Use \x{...} notation instead of directly embedding supplementary code points. However, this is not a viable option if the text is going to be passed to the Pattern.quote method, because quoted text is matched literally and the escape would no longer be interpreted.
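A minimal sketch of how the workaround might be automated, assuming the pattern text contains no \Q...\E quoted sections (inside those, the inserted escape would be taken literally). The class and method names are hypothetical and not part of the JDK. The helper walks the pattern by code point and rewrites each supplementary code point in \x{...} notation, leaving everything else unchanged.

```java
import java.util.regex.Pattern;

public class SupplementaryEscaper {
    // Hypothetical helper: replaces each supplementary code point in a
    // regex with its \x{hex} form. It does not parse the regex, so it is
    // not safe for text inside \Q...\E or produced by Pattern.quote.
    static String escapeSupplementary(String regex) {
        StringBuilder out = new StringBuilder(regex.length());
        for (int i = 0; i < regex.length(); ) {
            int cp = regex.codePointAt(i);
            if (Character.isSupplementaryCodePoint(cp)) {
                out.append("\\x{").append(Integer.toHexString(cp)).append('}');
            } else {
                out.appendCodePoint(cp); // BMP chars, including lone surrogates
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String escaped = escapeSupplementary("\ud835\udc34");
        System.out.println(escaped); // \x{1d434}

        // Same pattern text as the first test case in the report,
        // which the reporter says compiles with CANON_EQ.
        Pattern.compile(escaped, Pattern.CANON_EQ);
    }
}
```

Because the helper does no regex parsing, it is only suitable for patterns where every supplementary code point appears outside quoted sections.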
FREQUENCY : always