Bug
Resolution: Not an Issue
P4
None
8u60, 9
x86_64
windows_7
FULL PRODUCT VERSION :
Java 7 on Windows:
java version "1.7.0_80"
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)
Java 8 on Windows:
java version "1.8.0_60-ea"
Java(TM) SE Runtime Environment (build 1.8.0_60-ea-b25)
Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)
On Ubuntu:
java version "1.7.0_67"
Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
ADDITIONAL OS VERSION INFORMATION :
Microsoft Windows [Version 6.1.7601]
Linux innovation-2 3.11.0-26-generic #45~precise1-Ubuntu SMP Tue Jul 15 04:02:35 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
A DESCRIPTION OF THE PROBLEM :
We found a set of strings that, when matched against a regular expression, cause execution to hang, apparently in a loop. In detail, the problem arises when the string to be parsed contains both repeated words and a '?' character.
Removing either of these conditions (no repeated text or no '?' character) makes the code run correctly.
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Just execute the following code:
public static void main(String[] args) {
    String EMAIL_PATTERN = "[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})";
    String EMAIL_CONTACT_PATTERN = "(\\s*\"?\\s*([_A-Za-z0-9- ]+)*?\\s*\"?\\s*)?" + "<?(" + EMAIL_PATTERN + ")>?";
    Pattern contactPattern = Pattern.compile(EMAIL_CONTACT_PATTERN);
    Matcher matcher = contactPattern.matcher("RenNome asd renCognome asd <bas?@alice.it>");
    if (matcher.matches()) {
        System.out.println("Matches");
    }
    else {
        System.out.println("Doesn't match");
    }
}
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Console should print either "Matches" or "Doesn't match"
ACTUAL -
Execution gets stuck at the matcher.matches() line and nothing is printed.
REPRODUCIBILITY :
This bug can always be reproduced.
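Since the reported symptom is a call to matcher.matches() that does not return, one common defensive technique is to bound the time a match may take. The sketch below is not part of the original report: it wraps the input in a CharSequence whose charAt() checks a deadline, so a long-running match is aborted with an exception. The names DeadlineCharSequence and matchesWithin, and the one-second budget, are hypothetical choices made for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not from the report): a CharSequence that refuses
// to hand out characters once a deadline has passed. The regex engine reads
// its input through charAt(), so a runaway match is interrupted.
final class DeadlineCharSequence implements CharSequence {
    private final CharSequence inner;
    private final long deadlineNanos;

    DeadlineCharSequence(CharSequence inner, long deadlineNanos) {
        this.inner = inner;
        this.deadlineNanos = deadlineNanos;
    }

    @Override
    public char charAt(int index) {
        // Subtraction form is the overflow-safe way to compare nanoTime values.
        if (System.nanoTime() - deadlineNanos > 0) {
            throw new IllegalStateException("regex match exceeded its time budget");
        }
        return inner.charAt(index);
    }

    @Override
    public int length() {
        return inner.length();
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        return new DeadlineCharSequence(inner.subSequence(start, end), deadlineNanos);
    }

    @Override
    public String toString() {
        return inner.toString();
    }

    // Runs Matcher.matches() with a time budget; throws IllegalStateException
    // if the budget is exhausted before the engine finishes.
    static boolean matchesWithin(Pattern pattern, CharSequence input, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        return pattern.matcher(new DeadlineCharSequence(input, deadline)).matches();
    }

    public static void main(String[] args) {
        // Pattern and input copied from the report.
        String email = "[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})";
        Pattern contact = Pattern.compile(
            "(\\s*\"?\\s*([_A-Za-z0-9- ]+)*?\\s*\"?\\s*)?" + "<?(" + email + ")>?");
        try {
            boolean ok = matchesWithin(contact, "RenNome asd renCognome asd <bas?@alice.it>", 1000);
            System.out.println(ok ? "Matches" : "Doesn't match");
        } catch (IllegalStateException e) {
            System.out.println("Gave up: " + e.getMessage());
        }
    }
}
```

Whether the example prints a result or gives up depends on how quickly the JDK in use works through the pattern; either way the program terminates instead of appearing to hang.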
---------- BEGIN SOURCE ----------
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExpBug {
    public static void main(String[] args) {
        String EMAIL_PATTERN = "[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})";
        String EMAIL_CONTACT_PATTERN = "(\\s*\"?\\s*([_A-Za-z0-9- ]+)*?\\s*\"?\\s*)?" + "<?(" + EMAIL_PATTERN + ")>?";
        Pattern contactPattern = Pattern.compile(EMAIL_CONTACT_PATTERN);
        Matcher matcher = contactPattern.matcher("RenNome asd renCognome asd <bas?@alice.it>");
        if (matcher.matches()) {
            System.out.println("Matches");
        }
        else {
            System.out.println("Doesn't match");
        }
    }
}
---------- END SOURCE ----------
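A plausible reading of the test case above, consistent with the "Not an Issue" resolution and the related slow-matching issues, is that the nested quantifier ([_A-Za-z0-9- ]+)*? lets the engine split the name text into runs in a very large number of ways, and every split is retried once the '?' makes the e-mail part fail. The sketch below is an assumption, not a fix taken from the report: it replaces that nested quantifier with a single ([_A-Za-z0-9- ]*), which accepts the same strings but can only be split one way. EMAIL_PATTERN is copied unchanged; the class name SaferContactPattern and the method isContact are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical rewrite (not from the report) of the contact pattern.
final class SaferContactPattern {
    // Copied unchanged from the report.
    static final String EMAIL_PATTERN =
        "[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})";

    // Assumption: "([_A-Za-z0-9- ]+)*?" becomes "([_A-Za-z0-9- ]*)".
    // Both accept any run (possibly empty) of the listed characters, but the
    // single quantifier removes the ambiguity of the nested one.
    static final Pattern CONTACT = Pattern.compile(
        "(\\s*\"?\\s*([_A-Za-z0-9- ]*)\\s*\"?\\s*)?" + "<?(" + EMAIL_PATTERN + ")>?");

    static boolean isContact(String s) {
        return CONTACT.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Input from the report: contains '?', so it is not a valid contact.
        System.out.println(isContact("RenNome asd renCognome asd <bas?@alice.it>"));
        // Same input without the '?'.
        System.out.println(isContact("RenNome asd renCognome asd <bas@alice.it>"));
    }
}
```

The group numbering is kept the same as in the original pattern, with one difference: group 2 now captures the whole name text rather than only the last repetition. Code that reads group 2 would need to be checked before adopting a change like this.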
duplicates:
JDK-8154101 matches method halt with some reg expressions (Closed)
relates to:
JDK-8139263 Matcher#find is very slow when no match is found (Closed)
JDK-8140212 Slow performance of Matcher.find (Closed)