Bug | Resolution: Fixed | P4 | 1.4.0 | beta2 | generic | generic | Verified
Name: bsC130419 Date: 06/20/2001
java version "1.4.0-beta"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.0-beta-b65)
Java HotSpot(TM) Client VM (build 1.4.0-beta-b65, mixed mode)
The "\A" operator is supposed to match the very beginning
of the input text, but when you use it in conjunction with
the find() method, it actually matches the position where
the previous match ended, or the beginning of the text if
there was no previous match. (In fact, it works exactly
like Perl's "\G" operator.) The first example below should
print only "abc", but it goes on to match three more letters
("def") on the next iteration. The "^" operator suffers from
the same problem: in multiline mode, the second example
should print only "abc" and "jkl".
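For comparison, the intended behavior can be checked with a small harness like the sketch below. It is not part of the original report: the class name AnchorExpectation and the helper findAll are made up for illustration, and the patterns use the \p{Alpha} spelling of the alphabetic class, which is an assumption about what the reporter's {alpha} is meant to express. On a JDK where this bug is fixed, the \A pattern should yield only "abc" and the multiline ^ pattern only "abc" and "jkl".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative harness (hypothetical names); not taken from the report.
class AnchorExpectation {
    /** Collects every substring that successive find() calls match. */
    static List<String> findAll(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        List<String> matches = new ArrayList<>();
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "abcdef-ghi\njklmno";
        // \A should match only at the very start of the input.
        System.out.println(findAll("\\A\\p{Alpha}{3}", 0, input));
        // In multiline mode, ^ should match only at the start of each line.
        System.out.println(findAll("^\\p{Alpha}{3}", Pattern.MULTILINE, input));
    }
}
```

If the first list contains "def", or the second contains "def" or "mno", the anchors are being evaluated relative to the end of the previous match rather than the start of the input or line.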
In the source code, I see that the Begin and Caret node
classes compare the current position to matcher.from and
matcher.first, respectively, to determine whether it's the
beginning of the input. But when the find() method is
called, it sets both those values to the end position of the
last match, plus one. Shouldn't they be comparing to zero
instead?
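The diagnosis above can be illustrated with a toy model. This is a hypothetical sketch, not the java.util.regex source: ToyAnchorModel, findAll, and the compareToZero switch are invented names, and the search loop only handles the fixed pattern "begin anchor followed by three ASCII letters". It assumes, as described above, that each find() restarts its search at the end of the previous match and that the anchor check compares the current position with that restart point.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of the anchor check; not the real regex engine.
final class ToyAnchorModel {
    /**
     * Repeatedly searches for "begin anchor + three ASCII letters".
     * compareToZero == false models the reported behavior (compare to 'from');
     * compareToZero == true models the comparison the reporter suggests.
     */
    static List<String> findAll(String input, boolean compareToZero) {
        List<String> matches = new ArrayList<>();
        int from = 0; // where the next search attempt starts
        while (true) {
            int start = -1;
            for (int i = from; i + 3 <= input.length(); i++) {
                boolean atBeginning = compareToZero ? (i == 0) : (i == from);
                if (atBeginning && threeLetters(input, i)) {
                    start = i;
                    break;
                }
            }
            if (start < 0) {
                return matches; // no further match
            }
            matches.add(input.substring(start, start + 3));
            from = start + 3; // next search resumes after this match
        }
    }

    private static boolean threeLetters(String s, int i) {
        for (int k = i; k < i + 3; k++) {
            char c = s.charAt(k);
            if (!((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String input = "abcdef-ghi\njklmno";
        System.out.println(findAll(input, false)); // prints [abc, def]
        System.out.println(findAll(input, true));  // prints [abc]
    }
}
```

Under these assumptions, comparing against the restart point reproduces the "\G"-like result from the report, while comparing against zero yields the single expected match.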
----------------------- sample code ---------------------------
import java.util.regex.*;

public class AnchorTest
{
    public static void main(String[] argv)
    {
        Pattern p = Pattern.compile("\\A{alpha}{3}");
        Matcher m = p.matcher("abcdef-ghi\njklmno");
        System.out.println("\n" + p.pattern());
        while (m.find()) {
            System.out.println(m.group());
        }

        p = Pattern.compile("^{alpha}{3}", Pattern.MULTILINE);
        m = p.matcher("abcdef-ghi\njklmno");
        System.out.println("\n" + p.pattern() + " (multiline mode)");
        while (m.find()) {
            System.out.println(m.group());
        }
    }
}
------------------------- output -----------------------------
$ java AnchorTest

\A{alpha}{3}
abc
def

^{alpha}{3} (multiline mode)
abc
def
jkl
mno
(Review ID: 126953)
======================================================================