JDK-8262279

Regex intersection character class applies to whole enclosing character class


      A DESCRIPTION OF THE PROBLEM :
      The documentation of java.util.regex.Pattern suggests that a nested intersection character class applies to an "operand":
      > The intersection operator denotes a class that contains every character that is in both of its operand classes.

      However, that does not appear to be true: the intersection applies to the whole enclosing character class, not to the "operand" immediately in front of it.

      For example, going by the documentation, the pattern "[a-z&&[b-e]A-Z]" (respectively "[A-Za-z&&[b-e]]") should match:
      - A-Z
      - OR a-z INTERSECTING b-e

      Therefore 'A', for example, should match. However, because the intersection appears to apply to the enclosing character class as a whole, none of the characters defined by `A-Z` match, since the intersection with `[b-e]` does not cover them:
      ```
      for (char c = 'A'; c <= 'z'; c++) {
          System.out.println(Character.toString(c) + ": " + Character.toString(c).matches("[A-Za-z&&[b-e]]"));
      }
      ```

      If that is actually the intended behavior, the documentation should make clearer that the "operand" is the whole enclosing character class, because the current documentation (and its examples) make it look as if the intersection applies only to the immediately preceding characters.
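
For comparison, here is a minimal sketch of how the "A-Z OR (a-z INTERSECTING b-e)" reading can be spelled out with an explicit nested class, assuming the observed whole-class behavior is intended. The class name `IntersectionDemo` and the helper `matches` are made up for illustration; only `java.util.regex.Pattern.matches` is a JDK API. The pattern "[a-z&&[b-e]A-Z]" is left out on purpose, since its parsing is not covered here.

```java
import java.util.regex.Pattern;

public class IntersectionDemo {
    // Hypothetical helper: true if the single character c matches the given character class.
    static boolean matches(String charClass, char c) {
        return Pattern.matches(charClass, String.valueOf(c));
    }

    public static void main(String[] args) {
        // Pattern from this report: && intersects everything to its left with [b-e].
        String reported = "[A-Za-z&&[b-e]]";
        // Explicit nesting: union of A-Z with the nested class (a-z intersected with b-e).
        String nested = "[A-Z[a-z&&[b-e]]]";

        for (char c : new char[] {'A', 'a', 'c'}) {
            System.out.println(c + ": reported=" + matches(reported, c)
                    + ", nested=" + matches(nested, c));
        }
    }
}
```

With the behavior described above, 'A' is rejected by the reported pattern but accepted by the nested one, 'a' is rejected by both, and 'c' is accepted by both.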

      Possibly related to JDK-8037397


            igraves Ian Graves
            webbuggrp Webbug Group