  1. JDK
  2. JDK-8218146

$ matches before end of line, even without MULTILINE mode


    • Cause Known
    • generic
    • generic

      ADDITIONAL SYSTEM INFORMATION :
      OpenJDK 8
      Oracle JDK 8
      Oracle JDK 11

      A DESCRIPTION OF THE PROBLEM :
      I find the documentation for Pattern confusing with respect to the effect of $.
      When I write "the docs" I mean https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/regex/Pattern.html, which seems to match the Java 8 and Java 7 versions.

      The default behavior of $ is described in the docs as:

      (1) $: The end of a line, which is later clarified to:
      (2) By default, the regular expressions ^ and $ ignore line terminators and only match at the beginning and the end, respectively, of the entire input sequence. If MULTILINE mode is activated then ^ matches at the beginning of input and after any line terminator except at the end of input. When in MULTILINE mode $ matches just before a line terminator or the end of the input sequence.
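
      As a hedged illustration of the quoted description, the sketch below lists the indices at which ^ and $ match, with and without MULTILINE. The class name AnchorPositions, the helper positions(), and the sample input "a\nb\nc" are assumptions made for this example only; the input deliberately has no trailing line terminator, so it exercises only the uncontroversial part of the description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, introduced only for illustration.
class AnchorPositions {
    // Returns the start index of every match of regex in input.
    static List<Integer> positions(String regex, int flags, String input) {
        List<Integer> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "a\nb\nc"; // three lines, no trailing line terminator

        // Default mode: anchors match only at the two ends of the input.
        System.out.println(positions("^", 0, input)); // [0]
        System.out.println(positions("$", 0, input)); // [5]

        // MULTILINE: ^ also matches after each terminator,
        // and $ also matches before each terminator.
        System.out.println(positions("^", Pattern.MULTILINE, input)); // [0, 2, 4]
        System.out.println(positions("$", Pattern.MULTILINE, input)); // [1, 3, 5]
    }
}
```

      For an input like this one, the default-mode results agree with the description above. The discrepancy reported here shows up only when the input ends with a line terminator.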

      I have tested this description of $ on OpenJDK 8, Oracle JDK 8, and Oracle JDK 11. All three agree with each other, but their behavior seems inconsistent with the documentation.

      In particular, I am creating a Pattern with no flags:
        Pattern p = Pattern.compile("$\\s+", 0);

      I then call p.matcher("x\r").find(). The input is the letter x followed by a carriage return, which the docs define as a line terminator.

      The match is reported as a success. Success is consistent with the MULTILINE behavior of $, in which it matches either (1) at the end of the input or (2) at the end of a line (i.e. immediately before a line terminator). However, as I understand the docs, success is inconsistent with the default (non-MULTILINE) mode in which I created the Pattern. In the default mode $ should match only at the end of the input, so a Pattern like "$\\s+" should be impossible to satisfy.
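
      To narrow down where the observed behavior departs from that reading, the sketch below probes a few variations. The class name DollarProbe, the helper finds(), and the extra input "x\ry" are assumptions added for illustration; \z is the anchor that the same Pattern documentation lists as "The end of the input".

```java
import java.util.regex.Pattern;

// Hypothetical probe class, introduced only for illustration.
class DollarProbe {
    // True if regex, compiled with flags, is found anywhere in input.
    static boolean finds(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        // Terminator at the very end of the input: default-mode $ matches
        // just before it, so \s+ can consume the "\r".
        System.out.println(finds("$\\s+", 0, "x\r"));                  // true

        // Terminator in the middle of the input: default-mode $ does not
        // match before it; only MULTILINE does.
        System.out.println(finds("$\\s+", 0, "x\ry"));                 // false
        System.out.println(finds("$\\s+", Pattern.MULTILINE, "x\ry")); // true

        // \z matches strictly at the end of the input, leaving nothing
        // for \s+ to consume.
        System.out.println(finds("\\z\\s+", 0, "x\r"));                // false
    }
}
```

      If these results hold, default-mode $ appears to behave like \Z ("the end of the input but for the final terminator, if any") rather than like \z, which would account for the match on "x\r" without implying full MULTILINE behavior.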

      I am not the first person to be confused by this behavior. See the following OpenJDK bugs, which all describe this behavior. I do not understand the explanation that has been provided thus far.

      https://bugs.openjdk.java.net/browse/JDK-8059325
      https://bugs.openjdk.java.net/browse/JDK-8058923
      https://bugs.openjdk.java.net/browse/JDK-8049849
      https://bugs.openjdk.java.net/browse/JDK-8043255


      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      See the source code below. Compile it with javac, then run "java ConfusingDollar"; it prints "Default: Matched" instead of "Default: Did not match".

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      On the string "x\r", the Pattern /$\s+/ should not match in default mode. It should only match in MULTILINE mode.
      ACTUAL -
      It matches in default mode.

      ---------- BEGIN SOURCE ----------
      import java.util.regex.Pattern;
      import java.util.regex.Matcher;

      public class ConfusingDollar {
        public static void main(String[] args)
        {
          // Default mode: no flags.
          Pattern p_def = Pattern.compile("$\\s+", 0);
          Matcher m_def = p_def.matcher("x\r");

          if (m_def.find()) {
            System.out.println("Default: Matched");
          } else {
            System.out.println("Default: Did not match");
          }

          // Same pattern and input, with MULTILINE enabled.
          Pattern p_mult = Pattern.compile("$\\s+", Pattern.MULTILINE);
          Matcher m_mult = p_mult.matcher("x\r");

          if (m_mult.find()) {
            System.out.println("MULTILINE: Matched");
          } else {
            System.out.println("MULTILINE: Did not match");
          }
        }
      }

      ---------- END SOURCE ----------

      FREQUENCY : always


            rgiulietti Raffaello Giulietti
            webbuggrp Webbug Group