-
Bug
-
Resolution: Fixed
-
P4
-
1.4.2
-
b65
-
x86
-
solaris_2.5.1, windows_xp
Issue | Fix Version | Assignee | Priority | Status | Resolution | Resolved In Build |
---|---|---|---|---|---|---|
JDK-8057138 | 5.0u81 | Dmeetry Degrave | P4 | Resolved | Fixed | b01 |
Name: jl125535 Date: 03/15/2004
FULL PRODUCT VERSION :
java version "1.5.0-beta"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0-beta-b32c)
Java HotSpot(TM) Client VM (build 1.5.0-beta-b32c, mixed mode)
ADDITIONAL OS VERSION INFORMATION :
Microsoft Windows XP [Version 5.1.2600]
A DESCRIPTION OF THE PROBLEM :
This problem has been present since regex support was introduced
in version 1.4.0, and may have led some users to conclude that
the java.util.regex package is too slow. The problem, however,
is not the matching speed, but the time it takes to *compile* a
pattern that contains many optional groups.
The accompanying sample code measures how long it takes to
compile several patterns, each containing twenty-six optional
groups. The first pattern, consisting of capturing groups with
the default greedy question mark, takes 13.3 seconds to compile
on my machine (Athlon XP 1700+ w/512M RAM, running WinXP Home).
In the next pattern, I made the question marks lazy - "(x)??" -
and it takes even longer to compile, about 15 seconds. But when
I make the question marks possessive - "(x)?+" - the pattern
compiles in no time at all.
The fourth pattern shows that changing to non-capturing groups
makes things slightly worse; that one takes 14.5 seconds to
compile on my machine. The fifth pattern, however, is the most
interesting; simply adding the '^' start anchor to the beginning
of the pattern brings the compilation time down to nothing.
Anchoring only at the end, as in the sixth pattern, actually
increases the compilation time 16.4 seconds), but a pattern
that's anchored at both ends compiles instantly.
I realize this bug is not very likely to affect the average user, so I
don't expect it to be assigned a high priority. It is a bug, though,
and should be fixed eventually; it shouldn't be possible to write a
regex that takes fifteen seconds to compile. In the meantime, you
could add a section to the Pattern class doc containing tips to help
speed up pattern compilation and/or matching. The first two of my
suggested workarounds are good general advice anyway, the third one
could be generalized to "use possessive closures or independent groups
whenever you can", and the fourth could be listed as applying only to
this edge case.
REPRODUCIBILITY :
This bug can be reproduced always.
---------- BEGIN SOURCE ----------
import java.util.regex.*;
public class Test
{
public static void main(String[] args)
{
String[] regexes = {
"(a)?(b)?(c)?(d)?(e)?(f)?(g)?(h)?(i)?(j)?(k)?(l)?(m)?" +
"(n)?(o)?(p)?(q)?(r)?(s)?(t)?(u)?(v)?(w)?(x)?(y)?(z)?",
"(a)??(b)??(c)??(d)??(e)??(f)??(g)??(h)??(i)??(j)??" +
"(k)??(l)??(m)??(n)??(o)??(p)??(q)??(r)??(s)??(t)??" +
"(u)??(v)??(w)??(x)??(y)??(z)??",
"(a)?+(b)?+(c)?+(d)?+(e)?+(f)?+(g)?+(h)?+(i)?+(j)?+" +
"(k)?+(l)?+(m)?+(n)?+(o)?+(p)?+(q)?+(r)?+(s)?+(t)?+" +
"(u)?+(v)?+(w)?+(x)?+(y)?+(z)?+",
"(?:a)?(?:b)?(?:c)?(?:d)?(?:e)?(?:f)?(?:g)?(?:h)?(?:i)?" +
"(?:j)?(?:k)?(?:l)?(?:m)?(?:n)?(?:o)?(?:p)?(?:q)?(?:r)?" +
"(?:s)?(?:t)?(?:u)?(?:v)?(?:w)?(?:x)?(?:y)?(?:z)?",
"^(a)?(b)?(c)?(d)?(e)?(f)?(g)?(h)?(i)?(j)?(k)?(l)?(m)?" +
"(n)?(o)?(p)?(q)?(r)?(s)?(t)?(u)?(v)?(w)?(x)?(y)?(z)?",
"(a)?(b)?(c)?(d)?(e)?(f)?(g)?(h)?(i)?(j)?(k)?(l)?(m)?" +
"(n)?(o)?(p)?(q)?(r)?(s)?(t)?(u)?(v)?(w)?(x)?(y)?(z)?$",
"^(?:a)?(?:b)?(?:c)?(?:d)?(?:e)?(?:f)?(?:g)?(?:h)?(?:i)?" +
"(?:j)?(?:k)?(?:l)?(?:m)?(?:n)?(?:o)?(?:p)?(?:q)?(?:r)?" +
"(?:s)?(?:t)?(?:u)?(?:v)?(?:w)?(?:x)?(?:y)?(?:z)?$"
};
for (int i = 0; i < regexes.length; i++)
{
long startTime = System.currentTimeMillis();
Pattern p = Pattern.compile(regexes[i]);
long msec = (System.currentTimeMillis() - startTime);
System.out.println("Pattern " + i + " compiles in " +
msec + " milliseconds");
}
}
}
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
1) Don't compile patterns inside loops if you don't have to.
2) Always anchor patterns at both ends if appropriate, even if
you're using the matches() method.
3) Use the possesive form of the '?' closure if you can.
4) Avoid patterns with many optional groups in them.
(Incident Review ID: 243126)
======================================================================
###@###.### 11/1/04 20:33 GMT
FULL PRODUCT VERSION :
java version "1.5.0-beta"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0-beta-b32c)
Java HotSpot(TM) Client VM (build 1.5.0-beta-b32c, mixed mode)
ADDITIONAL OS VERSION INFORMATION :
Microsoft Windows XP [Version 5.1.2600]
A DESCRIPTION OF THE PROBLEM :
This problem has been present since regex support was introduced
in version 1.4.0, and may have led some users to conclude that
the java.util.regex package is too slow. The problem, however,
is not the matching speed, but the time it takes to *compile* a
pattern that contains many optional groups.
The accompanying sample code measures how long it takes to
compile several patterns, each containing twenty-six optional
groups. The first pattern, consisting of capturing groups with
the default greedy question mark, takes 13.3 seconds to compile
on my machine (Athlon XP 1700+ w/512M RAM, running WinXP Home).
In the next pattern, I made the question marks lazy - "(x)??" -
and it takes even longer to compile, about 15 seconds. But when
I make the question marks possessive - "(x)?+" - the pattern
compiles in no time at all.
The fourth pattern shows that changing to non-capturing groups
makes things slightly worse; that one takes 14.5 seconds to
compile on my machine. The fifth pattern, however, is the most
interesting; simply adding the '^' start anchor to the beginning
of the pattern brings the compilation time down to nothing.
Anchoring only at the end, as in the sixth pattern, actually
increases the compilation time 16.4 seconds), but a pattern
that's anchored at both ends compiles instantly.
I realize this bug is not very likely to affect the average user, so I
don't expect it to be assigned a high priority. It is a bug, though,
and should be fixed eventually; it shouldn't be possible to write a
regex that takes fifteen seconds to compile. In the meantime, you
could add a section to the Pattern class doc containing tips to help
speed up pattern compilation and/or matching. The first two of my
suggested workarounds are good general advice anyway, the third one
could be generalized to "use possessive closures or independent groups
whenever you can", and the fourth could be listed as applying only to
this edge case.
REPRODUCIBILITY :
This bug can be reproduced always.
---------- BEGIN SOURCE ----------
import java.util.regex.*;
public class Test
{
public static void main(String[] args)
{
String[] regexes = {
"(a)?(b)?(c)?(d)?(e)?(f)?(g)?(h)?(i)?(j)?(k)?(l)?(m)?" +
"(n)?(o)?(p)?(q)?(r)?(s)?(t)?(u)?(v)?(w)?(x)?(y)?(z)?",
"(a)??(b)??(c)??(d)??(e)??(f)??(g)??(h)??(i)??(j)??" +
"(k)??(l)??(m)??(n)??(o)??(p)??(q)??(r)??(s)??(t)??" +
"(u)??(v)??(w)??(x)??(y)??(z)??",
"(a)?+(b)?+(c)?+(d)?+(e)?+(f)?+(g)?+(h)?+(i)?+(j)?+" +
"(k)?+(l)?+(m)?+(n)?+(o)?+(p)?+(q)?+(r)?+(s)?+(t)?+" +
"(u)?+(v)?+(w)?+(x)?+(y)?+(z)?+",
"(?:a)?(?:b)?(?:c)?(?:d)?(?:e)?(?:f)?(?:g)?(?:h)?(?:i)?" +
"(?:j)?(?:k)?(?:l)?(?:m)?(?:n)?(?:o)?(?:p)?(?:q)?(?:r)?" +
"(?:s)?(?:t)?(?:u)?(?:v)?(?:w)?(?:x)?(?:y)?(?:z)?",
"^(a)?(b)?(c)?(d)?(e)?(f)?(g)?(h)?(i)?(j)?(k)?(l)?(m)?" +
"(n)?(o)?(p)?(q)?(r)?(s)?(t)?(u)?(v)?(w)?(x)?(y)?(z)?",
"(a)?(b)?(c)?(d)?(e)?(f)?(g)?(h)?(i)?(j)?(k)?(l)?(m)?" +
"(n)?(o)?(p)?(q)?(r)?(s)?(t)?(u)?(v)?(w)?(x)?(y)?(z)?$",
"^(?:a)?(?:b)?(?:c)?(?:d)?(?:e)?(?:f)?(?:g)?(?:h)?(?:i)?" +
"(?:j)?(?:k)?(?:l)?(?:m)?(?:n)?(?:o)?(?:p)?(?:q)?(?:r)?" +
"(?:s)?(?:t)?(?:u)?(?:v)?(?:w)?(?:x)?(?:y)?(?:z)?$"
};
for (int i = 0; i < regexes.length; i++)
{
long startTime = System.currentTimeMillis();
Pattern p = Pattern.compile(regexes[i]);
long msec = (System.currentTimeMillis() - startTime);
System.out.println("Pattern " + i + " compiles in " +
msec + " milliseconds");
}
}
}
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
1) Don't compile patterns inside loops if you don't have to.
2) Always anchor patterns at both ends if appropriate, even if
you're using the matches() method.
3) Use the possesive form of the '?' closure if you can.
4) Avoid patterns with many optional groups in them.
(Incident Review ID: 243126)
======================================================================
###@###.### 11/1/04 20:33 GMT
- backported by
-
JDK-8057138 java.util.regex.Pattern: Optional groups take too long to compile
-
- Resolved
-
- duplicates
-
JDK-5078293 Pattern.compile hangs up on a particular regexp template
-
- Closed
-
- relates to
-
JDK-6271399 Pattern matching using regex takes a long time
-
- Closed
-
-
JDK-6342544 Compilation Time of java.util.regex.Pattern takes too long
-
- Resolved
-