JDK-4466590

java.util.regex.Pattern: errors and omissions in docs

    • Type: Bug
    • Resolution: Fixed
    • Priority: P3
    • 1.4.0
    • 1.4.0
    • core-libs
    • beta2
    • generic
    • generic
    • Verified



      Name: bsC130419 Date: 06/05/2001


      java version "1.4.0-beta"
      Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.0-beta-b65)
      Java HotSpot(TM) Client VM (build 1.4.0-beta-b65, mixed mode)


      The class-description JavaDoc for the Pattern class suffers
      from several errors and omissions which make it difficult for
      developers to use the new regex classes. They make it even
      more difficult to identify bugs, since we can't know for sure
      what some of the regex constructs are supposed to do. In
      particular, there is no description of the conditional
      construct, '(?(X)Y|Z)', and no mention at all of the first-
      occurrence-of construct, 'X!'. If you post this submission
      on the BugParade, I hope you will also post the specs for
      those two constructs, so that we can get on with squashing
      bugs.

      Here is a list of problems that I have found with the
      JavaDoc, in the order that they appear:

      Section "Summary of regular-expression constructs"

      o The "Characters" subsection doesn't list the vertical-tab
        character, '\v', which is supported. (This character also
        needs to be added to the descriptions of the "\s" and
        "{space}" constructs.)

      o In subsection "Boundary matchers", the descriptions of "^"
        and "$" should state that their default behavior is the
        same as "\A" and "\Z", respectively, and that they also
        match line boundaries when multiline mode is set.

      o In subsection "Back references", the following entry
        needs to be added:

          \Rnn Whatever the nn'th capturing group matched

      o "Special constructs" should include entries for (?(X)Y),
        (?(X)Y|Z), and X!. Especially that last one; you've
        promoted the '!' to a first-class metacharacter, but didn't
        tell anyone. We at least need to know that we have to
        quote it with a backslash to match a literal '!'.
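
      The points above about "^"/"$", the vertical tab, and quoting
      '!' can be checked against whatever runtime is at hand. The
      following is a minimal sketch, not part of the original
      report: the class name SummaryConstructsDemo and the helper
      found() are made up for illustration, and the results on
      1.4.0-beta may differ from those on later releases.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name and helper are illustrative only.
class SummaryConstructsDemo {
    // True if the regex is found anywhere in the input under the given flags.
    static boolean found(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        String text = "one\ntwo\nthree";
        String verticalTab = String.valueOf((char) 0x0B);

        // Default mode: does "^two$" match a line in the middle of the input?
        System.out.println("default ^two$   : " + found("^two$", 0, text));
        // Multiline mode: "^" and "$" are also tried at line boundaries.
        System.out.println("multiline ^two$ : " + found("^two$", Pattern.MULTILINE, text));
        // Is the vertical-tab character (U+000B) treated as whitespace by \s?
        System.out.println("\\s vs. VT       : " + found("\\s", 0, verticalTab));
        // Does a backslash-quoted '!' match a literal '!'?
        System.out.println("a\\! vs. \"a!\"    : " + found("a\\!", 0, "a!"));
    }
}
```

      Quoting with a backslash is the conservative choice here: it
      matches a literal '!' whether or not the runtime treats '!' as
      a metacharacter.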


      Section "Comparison to Perl 5"

      o Under "Perl constructs not supported by this class:", I
        think the first item should read, "The embedded-code
        constructs (?{code}) and (??{code})". The conditionals
        _are_ supported, though they work differently than in Perl
        (more about that later).

      o "Constructs supported by this class but not by Perl:"
        should include an entry for the X! construct, along with
        a complete description of what it does. I'm about 99%
        sure that I've found a bug in this construct, but I would
        like to see some kind of spec before I report it.
        
      o Under "Notable differences from Perl:", the description of
        Pattern's backreferences is misleading: it seems to be
        saying that "\1234" will be interpreted as a reference to
        capturing group #1,234. Actually, it comes out as a
        reference to group #1 followed by the literal sequence
        "234", no matter how many capturing groups there are. If
        you want to refer back to groups 10 through 99, you have to
        use the "\Rnn" form--and again, any digits after the second
        one will simply be matched literally.
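
      The "\1234" reading described above can be checked with a
      pattern that has exactly one capturing group. This is a
      minimal sketch with a made-up class name (BackrefDemo); it
      exercises only the single-group case and says nothing about
      how patterns with ten or more groups are parsed, which may
      vary between builds.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name and helper are illustrative only.
class BackrefDemo {
    // True if the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // One capturing group followed by the sequence \1234.
        String regex = "(a)\\1234";

        // If \1234 means "group 1, then the literal 234", this input matches.
        System.out.println("aa234 : " + matches(regex, "aa234"));
        // Without the trailing digits the input is too short to match.
        System.out.println("aa    : " + matches(regex, "aa"));
    }
}
```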


      And finally, about those conditional constructs. In Perl, the
      conditional--the "(X)" in (?(X)Y|Z)--can only be a zero-width
      assertion. It may be a lookahead or lookbehind construct, or
      X may be a number, representing a query as to whether group #X
      matched or not. The Pattern class, on the other hand, allows
      (X) to be any valid subexpression, and it doesn't seem to
      support numbered "backassertions". If these differences are
      intentional, then an entry to that effect needs to be added to
      "Notable differences"; otherwise, another bug report is in
      order.
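
      Since support for the conditional constructs is the open
      question, one way to see what a given runtime does is to
      probe whether Pattern.compile accepts them at all. The sketch
      below is illustrative only: ConstructProbe and compiles() are
      made-up names, and the output depends on the JDK build, so no
      particular result is claimed for the conditional forms.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical probe class; the name and helper are illustrative only.
class ConstructProbe {
    // True if this runtime's Pattern compiles the regex without a syntax error.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Conditional forms discussed in the report, including a numbered condition.
        String[] candidates = { "(?(a)b)", "(?(a)b|c)", "(a)?(?(1)b|c)" };
        for (String regex : candidates) {
            System.out.println(regex + " -> " + (compiles(regex) ? "accepted" : "rejected"));
        }
    }
}
```

      Acceptance by the compiler only shows that the syntax is
      recognized; what the construct matches still has to be tested
      separately.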
      (Review ID: 125889)
      ======================================================================

            mmcclosksunw Michael Mccloskey (Inactive)
            bstrathesunw Bill Strathearn (Inactive)