- Bug
- Resolution: Fixed
- P3
- 5.0, 6u34, 7, 8
- b89
- generic
- generic
- Verified
Issue | Fix Version | Assignee | Priority | Status | Resolution | Resolved In Build |
---|---|---|---|---|---|---|
JDK-8223327 | 7u241 | Ivan Gerasimov | P3 | Resolved | Fixed | b01 |
SYNOPSIS
--------
StringIndexOutofBoundsException in Match.find() when input String contains surrogate UTF-16 characters
OPERATING SYSTEMS
-----------------
All
FULL JDK VERSIONS
-----------------
All (Since JDK 1.5.0)
PROBLEM DESCRIPTION
-------------------
When Matcher.find() is called on an input String that contains surrogate characters, it throws a StringIndexOutOfBoundsException under the following circumstances:
1. When the regex pattern results in a call to the GroupCurly.match0() method
2. When the surrogate pair sits at an index greater than 4 + the minimum input length expected by the pattern
3. When the pattern does not match the input string
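For context on why surrogate characters matter to index arithmetic, the sketch below shows how a supplementary character occupies two UTF-16 code units in a Java String. The class and method names are hypothetical and only illustrate the standard String and Character APIs; they are not taken from the regex implementation.

```java
public class SurrogateDemo {
    // Number of UTF-16 code units (what String.length() and charAt() index).
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points (a surrogate pair counts once).
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // True if the last two code units form a high/low surrogate pair.
    static boolean endsWithSurrogatePair(String s) {
        int n = s.length();
        return n >= 2
                && Character.isHighSurrogate(s.charAt(n - 2))
                && Character.isLowSurrogate(s.charAt(n - 1));
    }

    public static void main(String[] args) {
        String input = "test this as \ud83d\ude0d"; // same input as the test case
        System.out.println(codeUnits(input));             // 15
        System.out.println(codePoints(input));            // 14
        System.out.println(endsWithSurrogatePair(input)); // true
        // The pair decodes to a single supplementary code point.
        System.out.println(Integer.toHexString(input.codePointAt(input.length() - 2))); // 1f60d
    }
}
```

Code that steps backwards through the input one code unit at a time can therefore disagree with code that steps one code point at a time, which is consistent with the negative index in the trace.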
REPRODUCTION INSTRUCTIONS
-------------------------
Simply compile and run the attached test case.
Observed behaviour (this specific trace is from 7u9):
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
    at java.lang.String.charAt(String.java:658)
    at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3715)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4556)
    at java.util.regex.Pattern$GroupCurly.match0(Pattern.java:4360)
    at java.util.regex.Pattern$GroupCurly.match0(Pattern.java:4354)
    at java.util.regex.Pattern$GroupCurly.match(Pattern.java:4304)
    at java.util.regex.Pattern$SliceI.match(Pattern.java:3895)
    at java.util.regex.Pattern$Start.match(Pattern.java:3408)
    at java.util.regex.Matcher.search(Matcher.java:1199)
    at java.util.regex.Matcher.find(Matcher.java:592)
    at RegexTestCase.main(RegexTestCase.java:11)
Expected behaviour:
No exception should be thrown. The pattern does not match, so Matcher.find() should return false.
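The expected behaviour can be checked across several surrogate positions with a small probe such as the one below. This is only a sketch: the class name, the padding character 'x' and the padding range are assumptions made for illustration, and the report does not say which offsets trigger the exception on a given build.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class SurrogateOffsetProbe {
    // Returns the padding lengths for which find() threw instead of returning false.
    // An empty list means the expected behaviour held for every probed input.
    static List<Integer> failingPaddings(int maxPadding) {
        Pattern p = Pattern.compile("test(.)+(@[a-zA-Z.]+)", Pattern.CASE_INSENSITIVE);
        List<Integer> failing = new ArrayList<Integer>();
        for (int pad = 0; pad <= maxPadding; pad++) {
            // Build "test" + pad filler characters + one surrogate pair.
            StringBuilder sb = new StringBuilder("test");
            for (int i = 0; i < pad; i++) {
                sb.append('x');
            }
            sb.append("\ud83d\ude0d");
            try {
                // No '@' in the input, so a correct engine reports no match.
                if (p.matcher(sb.toString()).find()) {
                    throw new IllegalStateException("unexpected match at padding " + pad);
                }
            } catch (StringIndexOutOfBoundsException e) {
                failing.add(pad);
            }
        }
        return failing;
    }

    public static void main(String[] args) {
        System.out.println("Paddings that threw: " + failingPaddings(16));
    }
}
```

On a build that contains the fix the list should be empty; on an affected build the reported paddings give a rough picture of the index threshold described in condition 2.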
TEST CASE
---------
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexTestCase {
    public static void main(String[] args) {
        String ptrnStr = "test(.)+(@[a-zA-Z.]+)";
        Pattern ptrn = Pattern.compile(ptrnStr, Pattern.CASE_INSENSITIVE);
        String inputStr = "test this as \ud83d\ude0d"; // ends with a surrogate pair
        Matcher matcher = ptrn.matcher(inputStr);
        try {
            if (matcher.find()) {
                System.out.println("Found String");
            } else {
                System.out.println("Not found"); // expected result
            }
        } catch (StringIndexOutOfBoundsException siob) {
            System.out.println("Testcase Failed");
            siob.printStackTrace();
        }
    }
}
WORK AROUND
----------
Catch the exception and treat it as a "false" return value.
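A minimal sketch of that workaround follows. The helper safeFind and its enclosing class are hypothetical names introduced here, not part of java.util.regex, and the sketch assumes the exception only ever occurs in the non-matching case the report describes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SafeFind {
    // Hypothetical wrapper: maps the spurious exception to "no match".
    static boolean safeFind(Matcher matcher) {
        try {
            return matcher.find();
        } catch (StringIndexOutOfBoundsException e) {
            // On affected JDKs this is thrown where find() should return false.
            return false;
        }
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("test(.)+(@[a-zA-Z.]+)", Pattern.CASE_INSENSITIVE);
        // Input from the test case: no match on any JDK once wrapped.
        System.out.println(safeFind(p.matcher("test this as \ud83d\ude0d"))); // false
        // A matching input is unaffected by the wrapper.
        System.out.println(safeFind(p.matcher("test this as user@example.com"))); // true
    }
}
```

The report does not describe the Matcher's state after the exception, so it seems prudent to call reset() or create a fresh Matcher before reusing one that has thrown.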
SUGGESTED FIX
-------------
See attachment.
- backported by
  - JDK-8223327 StringIndexOutofBoundsException in Match.find() when input String contains surrogate UTF-16 characters (Resolved)