JDK-8027747: Regex: odd behavior of capturing group under possessive quantifier


    • Type: Bug
    • Resolution: Unresolved
    • Priority: P4
    • Fix Version/s: None
    • Affects Version/s: 7u9
    • Component/s: core-libs

      FULL PRODUCT VERSION :
      java version "1.7.0_09"
      Java(TM) SE Runtime Environment (build 1.7.0_09-b05)
      Java HotSpot(TM) 64-Bit Server VM (build 23.5-b02, mixed mode)

      ADDITIONAL OS VERSION INFORMATION :
      Mac OS X 10.7.5

      A DESCRIPTION OF THE PROBLEM :
      When matching against a regular expression that has a capturing group inside a possessive quantifier, the group can report captured input even when that input is not part of the final match.
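One way to see the stale capture is to print each group's offsets rather than its text. The sketch below relies only on `Matcher.start(int)` and `Matcher.end(int)`, which return -1 for a group that did not participate in the match; the class `GroupSpans` and its helpers are hypothetical names for illustration. Since the only `b` in `abcd` is at index 1, a build affected by this report would be expected to show group 2 with a span inside group 1's span, while a correct result shows group 2 as not matched. Whether a particular JDK still behaves this way is not established by the report, which covers 7u9 only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupSpans {
    // Formats every group as "g=[start,end)", or "g=null" when the group did not
    // participate (start(g) returns -1 in that case).
    static String describe(Matcher m) {
        StringBuilder sb = new StringBuilder();
        for (int g = 0; g <= m.groupCount(); g++) {
            if (g > 0) sb.append(' ');
            int s = m.start(g);
            sb.append(g).append('=');
            sb.append(s < 0 ? "null" : "[" + s + "," + m.end(g) + ")");
        }
        return sb.toString();
    }

    // Hypothetical check for this report's pattern shape: groups 1 and 2 are
    // sequential, so a participating group 2 must start at or after group 1 ends.
    static boolean group2OverlapsGroup1(Matcher m) {
        return m.start(2) >= 0 && m.start(2) < m.end(1);
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("([abc]+?)(b)?+(d)").matcher("abcd");
        if (m.matches()) {
            System.out.println(describe(m));
            System.out.println("group 2 overlaps group 1: " + group2OverlapsGroup1(m));
        }
    }
}
```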

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      See source code below.

      (1) compile pattern "([abc]+?)(b)?+(d)"
      (2) match against "abcd"
      (3) check the value of matcher.group(2)

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      For this regex, with Matcher m, I would expect m.group(2) to be null. More generally, since the three groups do not overlap, I'd expect m.group(1) + m.group(2) + m.group(3) to equal the input string whenever m.group(2) is not null.
      ACTUAL -
      Instead, m.group(2) is "b", even though m.group(1) is "abc" and m.group(3) is "d", so the same "b" is reported by both group 1 and group 2.
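The expectation above can be written as a check: for this pattern the groups are sequential and disjoint, so the participating groups, concatenated in order, should reproduce the whole match. This is a minimal sketch; `groupsTileMatch` is a hypothetical helper and is only meaningful for patterns whose top-level groups are sequential and together cover the entire match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupInvariant {
    // Concatenates groups 1..n (skipping groups that did not participate) and
    // compares the result with group 0. Assumes sequential, non-nested groups
    // that cover the whole match, as in ([abc]+?)(b)?+(d).
    static boolean groupsTileMatch(Matcher m) {
        StringBuilder sb = new StringBuilder();
        for (int g = 1; g <= m.groupCount(); g++) {
            String s = m.group(g);
            if (s != null) sb.append(s); // a null group contributes nothing
        }
        return sb.toString().equals(m.group());
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("([abc]+?)(b)?+(d)").matcher("abcd");
        if (m.matches()) {
            System.out.println("groups tile the match: " + groupsTileMatch(m));
        }
    }
}
```

With the reported values, "abc" + "b" + "d" gives "abcbd", which differs from "abcd", so the check fails on an affected build.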

      REPRODUCIBILITY :
      This bug can always be reproduced.

      ---------- BEGIN SOURCE ----------
      import java.util.regex.*;

      public class TestRegex {
          public static void main(String[] args) {
              // Lazy group 1, possessive optional group 2, then group 3.
              Pattern pattern = Pattern.compile("([abc]+?)(b)?+(d)");
              Matcher m = pattern.matcher("abcd");
              if (m.matches()) {
                  System.out.println(m.group(0));
                  // Expected "abc|null|d"; reported on 7u9 as "abc|b|d".
                  System.out.println(m.group(1) + "|" + m.group(2) + "|" + m.group(3));
              } else {
                  System.out.println("does not match");
              }
          }
      }

      ---------- END SOURCE ----------

      CUSTOMER SUBMITTED WORKAROUND :
      Don't use possessive quantifiers around capturing groups.
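Applied to this report, the workaround amounts to dropping the `+` from `(b)?+`. For this particular pattern the change should not alter which inputs match: giving up an optional `b` would require `(d)` to match that `b`, which can never succeed, so the possessive form never blocks a useful backtrack. The sketch below compares group 2 under both forms; the class `WorkaroundDemo` and its `group2` helper are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WorkaroundDemo {
    // Pattern from the report, and the same pattern with a plain greedy '?'.
    static final Pattern POSSESSIVE = Pattern.compile("([abc]+?)(b)?+(d)");
    static final Pattern GREEDY = Pattern.compile("([abc]+?)(b)?(d)");

    // Returns group 2 of a full match, or null when the group did not participate.
    // Throws IllegalStateException if the input does not match at all.
    static String group2(Pattern p, String input) {
        Matcher m = p.matcher(input);
        if (!m.matches()) {
            throw new IllegalStateException("no match: " + input);
        }
        return m.group(2);
    }

    public static void main(String[] args) {
        for (String input : new String[] {"abcd", "abd", "cd"}) {
            System.out.println(input + ": possessive group(2)=" + group2(POSSESSIVE, input)
                    + ", greedy group(2)=" + group2(GREEDY, input));
        }
    }
}
```

The greedy form places the capturing group under ordinary backtracking, where a failed attempt discards the group's capture, rather than under the possessive quantifier this report is about.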

            Xueming Shen (sherman)
            Ivan Gerasimov (igerasim)
            Votes: 1
            Watchers: 2
