JDK-6371680

Documentation of regular expressions for non-capturing groups is not correct.


    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: P5
    • Fix Version/s: None
    • Affects Version/s: 5.0, 6u21
    • Component/s: core-libs

      FULL PRODUCT VERSION :
      java version "1.5.0_05"
      Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_05-b05)
      Java HotSpot(TM) Client VM (build 1.5.0_05-b05, mixed mode, sharing)

      ADDITIONAL OS VERSION INFORMATION :
      Microsoft Windows XP - Home Edition - Version 2002 - Service Pack 2

      A DESCRIPTION OF THE PROBLEM :
      The API documentation for the class java.util.regex.Pattern states that "Groups beginning with (? are pure, non-capturing groups", which I believe is incorrect: it should say "(?:" rather than "(?".
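The Pattern documentation lists several special constructs that begin with "(?", among them `(?:X)`, `(?=X)`, `(?!X)` and `(?>X)`. One possible reading of the quoted sentence is that it describes this family as a whole rather than saying "(?" alone opens a group. The sketch below tests that reading by counting capturing groups with `Matcher.groupCount()`; the class and helper names are illustrative only. It deliberately leaves out the named-group syntax `(?<name>X)` added in Java 7, which also begins with "(?" but does capture.

```java
import java.util.regex.Pattern;

public class NonCapturingGroups {
    // Returns the number of capturing groups the compiled regex defines.
    static int groupCount(String regex) {
        // groupCount() lives on Matcher, so build one against an empty input.
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        String[] regexes = {
            "(a)b",    // ordinary capturing group
            "(?:a)b",  // pure non-capturing group
            "a(?=b)",  // positive lookahead
            "a(?!c)",  // negative lookahead
            "(?>a)b",  // independent (atomic) group
        };
        for (String regex : regexes) {
            System.out.println(regex + " -> " + groupCount(regex) + " capturing group(s)");
        }
    }
}
```

Only the first pattern should report a capturing group; every "(?"-prefixed construct in the list reports zero.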

      I got the following error when I used "(?" in place of "(" to open a group:
      java.util.regex.PatternSyntaxException: Unknown inline modifier near index 144
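The wording "Unknown inline modifier" suggests that the parser read the character after "(?" as an inline flag, the syntax behind constructs such as `(?i)` (case-insensitive matching); the documented flag letters include `i`, `d`, `m`, `s`, `u` and `x`, and `a` is not one of them. The sketch below checks which of a few "(?"-prefixed patterns compile; the `compiles` helper is hypothetical and exists only for this illustration.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class InlineModifiers {
    // Returns true if the regex compiles; otherwise prints the reason and returns false.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            System.out.println("rejected " + regex + ": " + e.getDescription());
            return false;
        }
    }

    public static void main(String[] args) {
        // (?i) turns on case-insensitive matching for the rest of the pattern.
        System.out.println("(?i)abc compiles: " + compiles("(?i)abc"));
        // (?i:X) applies the flag only inside the non-capturing group.
        System.out.println("(?i:a)bc compiles: " + compiles("(?i:a)bc"));
        // 'a' is not a recognized flag letter, so this one is rejected.
        System.out.println("(?a)bc compiles: " + compiles("(?a)bc"));
    }
}
```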

      I looked up the Perl documentation, which states that non-capturing groups begin with "(?:". Since Java's regular expression syntax closely follows Perl's, I tried "(?:" and it worked perfectly.
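As in Perl, `(?:X)` groups a subpattern (for alternation or a quantifier) without adding a numbered group, so later groups keep lower numbers. The sketch below shows the effect on group numbering; the `capture` helper and the sample input are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupNumbering {
    // Returns the text of group 'index' if 'regex' matches all of 'input', else null.
    static String capture(String regex, String input, int index) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(index) : null;
    }

    public static void main(String[] args) {
        String input = "Dr. Smith";
        // Capturing the title makes the surname group 2.
        System.out.println(capture("(Mr|Ms|Dr)\\. (\\w+)", input, 2));  // prints Smith
        // (?:...) still groups the alternation, but the surname is now group 1.
        System.out.println(capture("(?:Mr|Ms|Dr)\\. (\\w+)", input, 1)); // prints Smith
    }
}
```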

      So, in conclusion, either (a) the documentation is incorrect or (b) the implementation is incorrect.

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      Using Java version 1.5.0_05, create a simple class as follows:

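      import java.util.regex.*;
      public class Main {
          public static void main(String[] args) {
              // Capturing group: prints true
              System.out.println(Pattern.matches("(a).*", "abc"));
              // Perl-style non-capturing group "(?:": prints true
              System.out.println(Pattern.matches("(?:a).*", "abc"));
              // "(?" followed by 'a': throws PatternSyntaxException
              System.out.println(Pattern.matches("(?a).*", "abc"));
          }
      }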

      Execute the main method of this class.

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      I expected to see the second statement return false: if "(?" alone began a non-capturing group, the ":" in "(?:a)" would be part of the group's content, and there is no colon in the string "abc".

      ACTUAL -
      The first two statements print true, but the third throws a java.util.regex.PatternSyntaxException.
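The expected result rests on reading "(?:a)" as a non-capturing group whose content is the literal ":a". In the syntax Java actually accepts, that group is written "(?::a)": the first colon belongs to the "(?:" opener and the second is an ordinary literal. The sketch below (class and method names are hypothetical) shows that such a pattern behaves the way the reporter expected the second statement to behave.

```java
import java.util.regex.Pattern;

public class ColonInsideGroup {
    // "(?:" opens the non-capturing group; the second ':' is a literal character.
    static final String REGEX = "(?::a).*";

    static boolean matches(String input) {
        return Pattern.matches(REGEX, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("abc"));   // prints false: no leading colon
        System.out.println(matches(":abc"));  // prints true: input starts with ":a"
    }
}
```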

      ERROR MESSAGES/STACK TRACES THAT OCCUR :
      Exception in thread "main" java.util.regex.PatternSyntaxException: Unknown inline modifier near index 2
      (?a).*
      at java.util.regex.Pattern.error(Pattern.java:1650)
      at java.util.regex.Pattern.group0(Pattern.java:2446)
      at java.util.regex.Pattern.sequence(Pattern.java:1715)
      at java.util.regex.Pattern.expr(Pattern.java:1687)
      at java.util.regex.Pattern.compile(Pattern.java:1397)
      at java.util.regex.Pattern.<init>(Pattern.java:1124)
      at java.util.regex.Pattern.compile(Pattern.java:817)
      at java.util.regex.Pattern.matches(Pattern.java:919)
      at whatif.Main.main(Main.java:32)
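Rather than reading a stack trace, the same failure can be inspected through the accessors on `PatternSyntaxException`. The sketch below is illustrative (the `describeError` helper is hypothetical); it makes no assumption about the exact index a given JDK reports, since the Javadoc describes `getIndex()` as approximate, or -1 when unknown.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class ReportSyntaxError {
    // Returns a one-line summary of why 'regex' fails to compile, or null if it compiles.
    static String describeError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            // getDescription() is the bare reason; getIndex() is the approximate
            // offset into getPattern(), or -1 if the position is unknown.
            return e.getDescription() + " at index " + e.getIndex() + " in " + e.getPattern();
        }
    }

    public static void main(String[] args) {
        System.out.println(describeError("(?:a).*"));  // compiles, so prints null
        System.out.println(describeError("(?a).*"));
    }
}
```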

      REPRODUCIBILITY :
      This bug can always be reproduced.


      CUSTOMER SUBMITTED WORKAROUND :
      Use "(?:" instead of the documented "(?" for non-capturing groups.

            Assignee: Xueming Shen (sherman)
            Reporter: Nelson Dcosta (ndcosta, Inactive)