- Enhancement
- Resolution: Cannot Reproduce
- P3
- None
- 6
- x86
- windows_xp
A DESCRIPTION OF THE REQUEST :
While working with some large source input (3 MB), I noticed that Matcher.find() takes a significant amount of time to find the very first match. To measure this, I wrote a small program that times the first call to find() and the total time to find all matches.
JUSTIFICATION :
find() should quickly find the very next match. It behaves as if it were finding all possible matches before returning the information on the first one. That makes it unusable for parsing large amounts of data.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
matcher.find() would locate the very first/next match and return immediately.
ACTUAL -
matcher.find() takes longer to locate the first match on larger files, suggesting that it is finding all matches before returning information on the first one.
---------- BEGIN SOURCE ----------
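import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTimer
{
    // Input buffer shared across runs so that each run only appends the new values.
    protected static StringBuilder sb;
    // Number of values already appended to sb.
    protected static int prevLines = 0;

    public static void main(String[] args)
    {
        int numLines = 1;
        // Seven runs: 1 to 1,000,000 values. Twenty runs would overflow int
        // and run out of memory well before finishing.
        for (int i = 0; i < 7; i++)
        {
            new RegexTimer(numLines);
            numLines = numLines * 10;
        }
    }

    public RegexTimer(int lines)
    {
        if (prevLines == 0)
            sb = new StringBuilder();
        // Values are appended with no separator, so the buffer is one unbroken run of digits.
        for (int i = prevLines; i < lines; i++)
            sb.append(i + 1);
        prevLines = lines;

        Pattern pattern = Pattern.compile("([\\d]+)");
        Matcher matcher = pattern.matcher(sb);
        boolean firstTime = true;
        long first = 0;
        long start = System.nanoTime();
        while (matcher.find())
        {
            if (firstTime)
            {
                // Record the time at which the first match was returned.
                first = System.nanoTime();
                firstTime = false;
            }
        }
        long end = System.nanoTime();
        System.out.format(
            "%d lines of input: %d ns to find first match, %d ns to find all matches, avg %d ns%n",
            lines,
            (first - start),
            (end - start),
            ((end - start) / lines)
        );
    }
}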
---------- END SOURCE ----------
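One possible explanation for the timings, offered as a hedged observation rather than a confirmed diagnosis: the test program above appends its numbers with no separator, so the whole buffer is a single run of digits and the greedy pattern ([\d]+) can consume all of it in the first match. The sketch below compares the length of the first match with and without a newline separator. The class name FirstMatchDemo and the helpers buildInput and firstMatchLength are hypothetical names introduced for illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstMatchDemo {
    // Builds the numbers 1..count, each followed by the given separator.
    // An empty separator reproduces the input shape used by the test program.
    public static String buildInput(int count, String separator) {
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= count; i++) {
            sb.append(i).append(separator);
        }
        return sb.toString();
    }

    // Returns the length of the first match of ([\d]+), or -1 if there is none.
    public static int firstMatchLength(CharSequence input) {
        Matcher matcher = Pattern.compile("([\\d]+)").matcher(input);
        return matcher.find() ? matcher.end() - matcher.start() : -1;
    }

    public static void main(String[] args) {
        int count = 100000;
        String unseparated = buildInput(count, "");
        String separated = buildInput(count, "\n");
        System.out.format("no separator: input %d chars, first match %d chars%n",
                unseparated.length(), firstMatchLength(unseparated));
        System.out.format("newline separator: input %d chars, first match %d chars%n",
                separated.length(), firstMatchLength(separated));
    }
}
```

If the first match spans the entire unseparated input, the time to the first match would be expected to grow with input size even though find() returns as soon as that match is complete.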
CUSTOMER SUBMITTED WORKAROUND :
Don't use regex for large source input - parse by hand instead.
###@###.### 2005-06-30 08:08:19 GMT
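The workaround above does not say how the hand-written parsing was done. A minimal sketch of one way to do it, assuming the goal is the same as the pattern ([\d]+), that is, extracting each maximal run of ASCII digits, might look like this. DigitRunScanner and digitRuns are hypothetical names, not part of the original report.

```java
import java.util.ArrayList;
import java.util.List;

public class DigitRunScanner {
    // Returns every maximal run of ASCII digits in the input, in order of appearance.
    public static List<String> digitRuns(CharSequence input) {
        List<String> runs = new ArrayList<String>();
        int n = input.length();
        int i = 0;
        while (i < n) {
            if (isAsciiDigit(input.charAt(i))) {
                int start = i;
                // Advance to the end of the current run of digits.
                while (i < n && isAsciiDigit(input.charAt(i))) {
                    i++;
                }
                runs.add(input.subSequence(start, i).toString());
            } else {
                i++;
            }
        }
        return runs;
    }

    // By default, \d in java.util.regex matches only the ASCII digits 0-9.
    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    public static void main(String[] args) {
        System.out.println(digitRuns("12\n345 x6"));
    }
}
```

This makes a single pass over the input, but on input that is one unbroken run of digits it would also have to read the whole buffer before returning the first run.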