-
Bug
-
Resolution: Not an Issue
-
P4
-
None
-
8u20
-
generic
-
generic
A DESCRIPTION OF THE PROBLEM :
The Pattern class's description of $ states: "... By default, the regular expressions ^ and $ ignore line terminators and only match at the beginning and the end, respectively, of the entire input sequence." It says something similar further down in the description of the MULTILINE mode constant.
However, $ matches just before a terminating linefeed without multiline mode, as demonstrated by the following snippet:
String linefeed = String.valueOf((char)10); // a.k.a. "\n" if the bug form no longer mangles that
System.out.println(java.util.regex.Pattern.compile("x$").matcher("x" + linefeed).find());
The output is 'true'. $ matches even though it is not "at the end of the entire input sequence".
$ does this only before a trailing line terminator (a linefeed in this case). With an ordinary trailing character, the output is 'false', as expected:
System.out.println(java.util.regex.Pattern.compile("x$").matcher("xy").find());
Therefore, it clearly *is* treating line terminators specially, despite the documentation being adamant that it doesn't do that except when in multiline mode.
For a correct description of $, see the justification comment for the rejection of my previous report on this issue, which mysteriously explains why I was actually correct: https://bugs.openjdk.java.net/browse/JDK-8049849 . The evaluator says that when not in multiline mode, $ matches:
"(1) the position just before the last line terminator character,
(2) or at the end of the input sequence, if the line terminator is not present at the end of the input sequence"
That description is correct, but it is patently NOT what the current documentation says. The documentation describes case 2 and it explicitly excludes case 1. The documentation is wrong.
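The two cases quoted above can be checked side by side with a small program. This is only a sketch: the class name DollarDemo and the helper found are made up for illustration, and the \z comparison is an addition that is not part of the original report. It relies on nothing beyond java.util.regex.Pattern with default flags.

```java
import java.util.regex.Pattern;

public class DollarDemo {
    // Hypothetical helper: true if the regex (compiled with default flags,
    // so no MULTILINE) finds a match anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String linefeed = String.valueOf((char) 10);

        // Case 1: $ matches just before the final line terminator.
        System.out.println(found("x$", "x" + linefeed));   // true

        // Case 2: $ matches at the end of input when no terminator is present.
        System.out.println(found("x$", "x"));              // true

        // An ordinary trailing character gets no special treatment.
        System.out.println(found("x$", "xy"));             // false

        // For comparison, \z matches only at the very end of the input.
        System.out.println(found("x\\z", "x" + linefeed)); // false
    }
}
```

If the documentation's wording were accurate, the first call would print 'false', just as the \z variant does.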
URL OF FAULTY DOCUMENTATION :
http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html
- relates to
-
JDK-8059325 The documentation of regex $ is still wrong
-
- Closed
-