JDK-8235812

Unicode linebreak with quantifier does not match valid input


    • Type: Bug
    • Resolution: Fixed
    • Priority: P3
    • 15
    • None
    • core-libs
    • None

      The linebreak matcher \R is defined to be equivalent to \u000D\u000A|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029].

      Therefore, the regex \R{2} (written "\\R{2}" as a Java string literal) must match the sequence \r\n: the first \R matches \r, and the second \R matches \n.
      In fact, it does not.

      import java.util.regex.*;

      public class RR {
          public static void main(String[] args) throws Throwable {
              Pattern p = Pattern.compile("\\R{2}");
              System.out.println(Boolean.toString(p.matcher("\r\r").matches()));
              System.out.println(Boolean.toString(p.matcher("\r\n").matches()));
              System.out.println(Boolean.toString(p.matcher("\n\r").matches()));
              System.out.println(Boolean.toString(p.matcher("\n\n").matches()));
              System.out.println(Boolean.toString(p.matcher("\r\n\r").matches()));
              System.out.println(Boolean.toString(p.matcher("\r\r\n").matches()));
              System.out.println(Boolean.toString(p.matcher("\r\n\n").matches()));
              System.out.println(Boolean.toString(p.matcher("\n\r\n").matches()));
              System.out.println(Boolean.toString(p.matcher("\r\n\r\n").matches()));
          }
      }

      prints the following (all 9 results are expected to be 'true'):

      true
      false
      true
      true
      true
      true
      true
      true
      true
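
For comparison, the definition of \R quoted above can be written out by hand as an ordinary non-capturing alternation and given the same {2} quantifier. The sketch below does that; the class name LinebreakExpansion and the helper isTwoLinebreaks are hypothetical names introduced only for this illustration, and the pattern is a manual transcription of the definition, not the library's internal implementation. Because a plain alternation can backtrack from the \r\n branch to the single-character branch, this version is expected to accept all nine inputs from the test above, including "\r\n".

```java
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of the JDK or of the original report.
final class LinebreakExpansion {
    // Manual transcription of the documented linebreak definition,
    // wrapped in a non-capturing group and repeated exactly twice.
    static final Pattern TWO_LINEBREAKS = Pattern.compile(
            "(?:\\r\\n|[\\n\\x0B\\x0C\\r\\x85\\x{2028}\\x{2029}]){2}");

    // Returns true if the whole input consists of exactly two linebreak sequences.
    static boolean isTwoLinebreaks(String s) {
        return TWO_LINEBREAKS.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Same nine inputs as in the reproducer above.
        String[] inputs = {
            "\r\r", "\r\n", "\n\r", "\n\n", "\r\n\r",
            "\r\r\n", "\r\n\n", "\n\r\n", "\r\n\r\n"
        };
        for (String s : inputs) {
            System.out.println(isTwoLinebreaks(s));
        }
    }
}
```

Running both programs side by side on an affected JDK shows the discrepancy directly: the hand-expanded pattern and "\\R{2}" should agree on every input, and they differ only on "\r\n".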

            igerasim Ivan Gerasimov