JDK-8337139

more search/extract/replace APIs for java.util.regex


    • Type: Enhancement
    • Resolution: Unresolved
    • Priority: P4
    • Component: core-libs

      Add APIs for streaming over successive Matcher results in context. Context must include spans that do NOT match the pattern, as well as spans that DO match. There should be flexible ways to view the work of a Matcher as a parse of some text, where the matches are parsed tokens, but the unmatched text is also regularly available.

      Currently, the only APIs for extracting pattern match results in context are the various split methods. But these have strange corner-case behaviors inherited from Perl that will trip up the unwary. Also, they do not stream, so there is no ability to stop processing if you find what you want early.
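
      As a small illustration of the corner cases mentioned above, the following sketch exercises the existing Pattern::split with a comma separator. The class name SplitCornerCases and the helper show are only for this example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class SplitCornerCases {
    // Illustrative helper: render the pieces so empty strings are visible.
    static String show(String[] parts) {
        return Arrays.toString(parts);
    }

    public static void main(String[] args) {
        Pattern comma = Pattern.compile(",");

        // With the default limit (0), trailing empty strings are silently dropped.
        System.out.println(show(comma.split("a,b,,")));      // [a, b]

        // A negative limit keeps them.
        System.out.println(show(comma.split("a,b,,", -1)));  // [a, b, , ]

        // A leading empty string, by contrast, is kept.
        System.out.println(show(comma.split(",a")));         // [, a]

        // Input consisting only of the separator yields an empty array,
        // but empty input yields one (empty) element.
        System.out.println(comma.split(",").length);         // 0
        System.out.println(comma.split("").length);          // 1
    }
}
```

      In each case the caller has to know the limit rules in advance to recover the exact layout of the input.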

      There is a streaming API on Matcher but it does not include a way to process non-matching spans between matches. To do this in a modern way requires a variation of MatchResult which points to the unmatched text BETWEEN matches. (I propose MatchResult::isNegative to represent this without excessive type ceremony.)
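
      To make the idea concrete, here is a rough sketch of interleaving matched and unmatched spans using only the existing Matcher::find loop. The record Span and its negative flag are hypothetical stand-ins for the proposed negative MatchResult, not the API from the work in progress; the sketch also builds a list eagerly rather than streaming, and it skips empty gaps, which is only one possible choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class InterleavedSpans {
    /** Hypothetical: a span of input, flagged negative when it lies BETWEEN matches. */
    record Span(boolean negative, int start, int end, String text) {}

    /** Returns matched and unmatched spans of the input, in order. */
    static List<Span> spans(Pattern pattern, CharSequence input) {
        List<Span> out = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        int prevEnd = 0;  // end of the previous match, or 0 before the first one
        while (m.find()) {
            if (m.start() > prevEnd) {
                // Non-empty unmatched text before this match.
                String gap = input.subSequence(prevEnd, m.start()).toString();
                out.add(new Span(true, prevEnd, m.start(), gap));
            }
            out.add(new Span(false, m.start(), m.end(), m.group()));
            prevEnd = m.end();
        }
        if (prevEnd < input.length()) {
            // Unmatched tail after the last match.
            String tail = input.subSequence(prevEnd, input.length()).toString();
            out.add(new Span(true, prevEnd, input.length(), tail));
        }
        return out;
    }

    public static void main(String[] args) {
        for (Span s : spans(Pattern.compile("\\d+"), "ab12cd345")) {
            System.out.println(s);
        }
    }
}
```

      For the input "ab12cd345" and the pattern \d+, this yields four spans: "ab" (negative), "12", "cd" (negative), and "345".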

      The use of replacement strings (like $0 and ${namedgroup}) is a powerful notation for processing matches when you find them. Unfortunately, the syntax processing is hardwired into a few methods of Matcher, and it is impossible to take an arbitrary MatchResult and expand its replacement to obtain the isolated replacement string. The Matcher API only allows you to append a replacement AND the preceding unmatched text to a string buffer. These capabilities should be teased apart so they can be more widely used. For example, there should be a Matcher::replacements method which allows you to get a stream of replacement expansions for all the matches. This, combined with negative match results (intervening non-matched spans), gives a very powerful way to parse and transform whole spans of text.
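
      The coupling described above can be worked around today, somewhat awkwardly. The sketch below obtains isolated replacement expansions by calling the existing Matcher::appendReplacement into a scratch buffer and discarding the unmatched prefix it insists on appending. The class and method names are made up for the example, and the list-returning method only approximates the proposed stream-returning Matcher::replacements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IsolatedReplacements {
    /** Expands the replacement string once per match, without the surrounding text. */
    static List<String> replacements(Pattern pattern, CharSequence input, String replacement) {
        List<String> out = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        StringBuilder scratch = new StringBuilder();
        int appendPos = 0;  // mirrors the matcher's internal append position
        while (m.find()) {
            scratch.setLength(0);
            // Appends input[appendPos, m.start()) followed by the expanded replacement.
            m.appendReplacement(scratch, replacement);
            int gapLength = m.start() - appendPos;
            out.add(scratch.substring(gapLength));  // keep only the expansion
            appendPos = m.end();
        }
        return out;
    }

    public static void main(String[] args) {
        Pattern keyValue = Pattern.compile("(?<key>\\w+)=(?<val>\\w+)");
        // Uses both a named group reference and a numbered one.
        System.out.println(replacements(keyValue, "a=1, bb=22", "${val}:$1"));  // [1:a, 22:bb]
    }
}
```

      The bookkeeping around appendPos exists only to undo work the API performs unconditionally, which is the kind of entanglement this request is about.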

      Also, possibly as a separate RFE, the regular-expression methods of String are kind of a dead end. You can use them easily, but if you want the performance of precompiled patterns, and/or you want something that requires manipulating a matcher (or a stream of results; see above), you have to perform a very heavy refactoring of an expression into a block of imperative code with an input buffer (Matcher) and an output buffer (StringBuilder). If we do better here, with gentler refactorings, we can encourage String users to incrementally ascend the on-ramp to expertise with the regex APIs.
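
      The refactoring steps in question look roughly like this with the current APIs. The class RefactoringSteps and its methods are invented for illustration; the third method is the imperative input-buffer/output-buffer form described above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RefactoringSteps {
    // Step 0: the String convenience method; the pattern is compiled on every call.
    static String maskDigits(String s) {
        return s.replaceAll("\\d+", "#");
    }

    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Step 1: precompiled pattern; still a single expression.
    static String maskDigitsPrecompiled(String s) {
        return DIGITS.matcher(s).replaceAll("#");
    }

    // Step 2: per-match logic, written as an explicit loop over a Matcher
    // (input buffer) and a StringBuilder (output buffer).
    static String doubleNumbers(String s) {
        Matcher m = DIGITS.matcher(s);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // Assumes each digit run fits in a long.
            long doubled = 2 * Long.parseLong(m.group());
            // Digits contain no '$' or '\', so no quoting is needed here.
            m.appendReplacement(out, Long.toString(doubled));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(maskDigits("a1b22"));                   // a#b#
        System.out.println(maskDigitsPrecompiled("a1b22"));        // a#b#
        System.out.println(doubleNumbers("3 apples, 10 pears"));   // 6 apples, 20 pears
    }
}
```

      Since Java 9, Matcher::replaceAll also accepts a Function<MatchResult,String>, which covers this particular per-match case as an expression; anything that needs the unmatched text or an early exit still ends up in the loop form.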

      Here is a work in progress illustrating these points:

      https://github.com/rose00/jdk/tree/match-results

      Specdiff:

      https://cr.openjdk.org/~jrose/jdk/8337139-record-matches/overview-summary.html

      Names and number of convenience methods are provisional, of course.

            Assignee: Unassigned
            Reporter: John Rose (jrose)