JDK-8281560

Matcher.hitEnd returns unexpected results in presence of CANON_EQ flag.


    • b01
    • 11
    • b13
    • generic
    • generic
    • Verified

      A DESCRIPTION OF THE PROBLEM :
      When matching the string `a1a1` with two regexes: `(a+|1+)` and `([a]+|[1]+)`, the results returned by `hitEnd` for the 3rd match differ. This is unexpected because the two regexes should be completely equivalent. The situation only occurs if the `CANON_EQ` flag is provided. If it is not provided, the results are again consistent.

      The results were consistent in Java 1.8. Testing on JDK 11, 17 and 19 shows that the results are inconsistent there.

      REGRESSION : Last worked in version 8

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      Compile the two patterns (`(a+|1+)` and `([a]+|[1]+)`) with the `CANON_EQ` option, run a matcher against input `a1a1` and compare the results returned by `hitEnd` after each match.
      For the 3rd match, `hitEnd` returns false for the first regex but true for the second.

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      Matching (a+|1+) with a1a1
      a
      hitEnd = false
      1
      hitEnd = false
      a
      hitEnd = false
      1
      hitEnd = true
      Matching ([a]+|[1]+) with a1a1
      a
      hitEnd = false
      1
      hitEnd = false
      a
      hitEnd = false
      1
      hitEnd = true

      ACTUAL -
      Matching (a+|1+) with a1a1
      a
      hitEnd = false
      1
      hitEnd = false
      a
      hitEnd = false
      1
      hitEnd = true
      Matching ([a]+|[1]+) with a1a1
      a
      hitEnd = false
      1
      hitEnd = false
      a
      hitEnd = true
      1
      hitEnd = true


      ---------- BEGIN SOURCE ----------
      import java.util.regex.Pattern;
      import java.util.regex.Matcher;

      public class regex {
          // Prints each match of regex in input, followed by hitEnd() for that match.
          public static void domatch(String regex, String input) {
              // CANON_EQ is the flag that triggers the inconsistency.
              Pattern pat = Pattern.compile(regex, Pattern.CANON_EQ);
              Matcher matcher = pat.matcher(input);

              System.out.println("Matching " + regex + " with " + input);

              while (matcher.find()) {
                  System.out.println(matcher.group());
                  System.out.println("hitEnd = " + matcher.hitEnd());
              }
          }

          public static void main(String[] args) {
              // The two patterns should be equivalent.
              domatch("(a+|1+)", "a1a1");
              domatch("([a]+|[1]+)", "a1a1");
          }
      }

      ---------- END SOURCE ----------
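
      The description states that the discrepancy appears only when `CANON_EQ` is set. A minimal sketch of one way to check that claim is to collect the `hitEnd` values for both patterns with and without the flag and compare them. The class name `HitEndComparison` and the helper `hitEnds` are hypothetical names chosen for this illustration and are not part of the original report.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration; not part of the original report.
class HitEndComparison {
    // Collects the hitEnd() value reported after each successful find().
    static List<Boolean> hitEnds(String regex, String input, int flags) {
        Matcher matcher = Pattern.compile(regex, flags).matcher(input);
        List<Boolean> results = new ArrayList<>();
        while (matcher.find()) {
            results.add(matcher.hitEnd());
        }
        return results;
    }

    public static void main(String[] args) {
        String input = "a1a1";
        // Run once with no flags and once with CANON_EQ.
        int[] flagSets = {0, Pattern.CANON_EQ};
        for (int flags : flagSets) {
            List<Boolean> literal = hitEnds("(a+|1+)", input, flags);
            List<Boolean> charClass = hitEnds("([a]+|[1]+)", input, flags);
            String label = (flags == 0) ? "no flags" : "CANON_EQ";
            String verdict = literal.equals(charClass) ? "consistent" : "INCONSISTENT";
            System.out.println(label + ": " + literal + " vs " + charClass + " (" + verdict + ")");
        }
    }
}
```

      Per the description, the run without flags should report the two lists as consistent. The `CANON_EQ` run is expected to differ only on JDK builds affected by this issue.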

      FREQUENCY : always


            igraves Ian Graves
            webbuggrp Webbug Group