- Bug
- Resolution: Unresolved
- P4
- None
- 7
- x86
- linux
FULL PRODUCT VERSION :
java version "1.7.0-ea"
Java(TM) SE Runtime Environment (build 1.7.0-ea-b06)
Java HotSpot(TM) Client VM (build 1.7.0-ea-b06, mixed mode, sharing)
ADDITIONAL OS VERSION INFORMATION :
Linux helium 2.6.17-10-generic #2 SMP Tue Dec 5 22:28:26 UTC 2006 i686 GNU/Linux
A DESCRIPTION OF THE PROBLEM :
Pattern.compile("$").matcher("a\nb\nc\n") matches twice instead of once.
http://elliotth.blogspot.com/2007/01/what-do-anchors-and-mean-in-regular.html
The first match is just before the final line terminator. The second match is at the end of input.
In MULTILINE mode this is unfortunate (because it's not Perl-compatible and should be listed among the incompatibilities with Perl 5 in the documentation), but it's understandable because of the "or" in the definition of what MULTILINE causes $ to match.
But in non-MULTILINE mode, this is incorrect (in that I don't see how it's specified by the documentation).
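To make the two matches concrete, the following is a minimal sketch that prints the start offset of every zero-length match of `$` in both modes. The class name `DollarOffsets` and the helper `offsets` are made up for illustration; only `Pattern`, `Matcher`, and `Pattern.MULTILINE` come from `java.util.regex`. The offsets in the comments are what the behaviour described in this report implies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DollarOffsets {
    // Hypothetical helper: collects the start offset of each match of regex in input.
    static List<Integer> offsets(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        List<Integer> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "a\nb\nc\n"; // length 6, final '\n' at offset 5

        // Non-MULTILINE: one match before the final line terminator, one at end of input.
        System.out.println(offsets("$", 0, input));                 // [5, 6]

        // MULTILINE: a match before every line terminator, plus the end of input.
        System.out.println(offsets("$", Pattern.MULTILINE, input)); // [1, 3, 5, 6]
    }
}
```

Printing offsets rather than a bare count shows that the extra match is the one at offset 6, the end of input that follows the final line terminator.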
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run the supplied test case.
REPRODUCIBILITY :
This bug can be reproduced always.
---------- BEGIN SOURCE ----------
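import java.util.regex.*;

public class test {
    public static void main(String[] args) {
        // Non-MULTILINE "$" against input that ends with a line terminator.
        Pattern p = Pattern.compile("$");
        Matcher m = p.matcher("a\nb\nc\nhello\nworld\n");
        // Count every (zero-length) match found.
        int count = 0;
        while (m.find()) {
            ++count;
        }
        // Expected by the reporter: 1. Observed: 2.
        System.err.println(count);
    }
}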
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
I would have suggested using \Z, but that's broken too ;-)
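If the goal is exactly one match at the true end of input, lowercase \z may be worth trying, since it is documented to match only at the end of the input and ignores a final line terminator. The sketch below compares the three anchors on the test-case input; the class name `EndAnchors` and the helper `countMatches` are illustrative only, and this is a possible workaround rather than a fix for `$` itself.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EndAnchors {
    // Hypothetical helper: counts how many times regex matches in input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "a\nb\nc\nhello\nworld\n";

        // "$" and "\Z" both match before the final line terminator and at end of input.
        System.out.println("$  : " + countMatches("$", input));
        System.out.println("\\Z : " + countMatches("\\Z", input));

        // "\z" matches only at the very end of the input.
        System.out.println("\\z : " + countMatches("\\z", input));
    }
}
```

Note that \z anchors after the final line terminator, not before it, so it only suits code that does not need the match to precede a trailing newline.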
Copied from http://bugs.openjdk.java.net/show_bug.cgi?id=100084#c0
Description From ###@###.### 2009-07-09 01:40:09 PDT
Created an attachment (id=99), which contains the exported diff and a jtreg test case.
sunbug=6520207
Pattern.compile("$").matcher("a\nb\nc\n") matches twice instead of once.
--------------------------------------------
The attached diff adds a simple check in the Pattern$Dollar class to avoid matching without any content.