Details
-
Enhancement
-
Resolution: Fixed
-
P4
-
None
-
None
-
b24
-
generic
-
generic
Description
The nested class ImmutableMatchResult was added to Matcher by JDK-8071479 in order to support consistent read semantics, so multiple MatchResult instances could be collected by the caller and subsequent pattern matching with the same matcher would not affect the results. The implementation technique is essentially to copy the necessary state out of the originating Matcher object.
Unfortunately this includes the entire input CharSequence (the "text" field) against which the matching operation was applied. The implementation of Matcher.toMatchResult() simply calls text.toString() to make a copy of the input CharSequence. This is fine if the input CharSequence is a String, since String.toString() is a no-op that returns the same instance. But if the input is a CharBuffer or a StringBuilder, then toString() will make a copy of the entire input. (Note that Scanner applies regex matching against a CharBuffer.)
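The difference in toString() behavior can be observed with a small check. This is only an illustrative sketch: the class and method names are made up for this example, and comparing two results by identity is just an indicator that a fresh String is allocated on each call.

```java
import java.nio.CharBuffer;

// Hypothetical demo class, not part of the JDK.
final class ToStringCopyDemo {
    /**
     * Returns true if two successive toString() calls yield distinct objects,
     * which indicates that each call builds a new String.
     */
    public static boolean allocatesOnToString(CharSequence cs) {
        return cs.toString() != cs.toString();
    }

    public static void main(String[] args) {
        String s = "key=value";
        StringBuilder sb = new StringBuilder(s);
        // A heap-backed buffer, similar in kind to what CharBuffer.allocate() returns.
        CharBuffer cb = CharBuffer.wrap(s.toCharArray());

        System.out.println("String:        " + allocatesOnToString(s));
        System.out.println("StringBuilder: " + allocatesOnToString(sb));
        System.out.println("CharBuffer:    " + allocatesOnToString(cb));
    }
}
```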
Copying the entire input is potentially wasteful, as it's probably quite common for a match to cover only a small portion of the input. The only reason MatchResult needs the input text is to report the entire matched string and the substrings captured by its groups. In principle, ImmutableMatchResult could copy only the subsequence of the input that corresponds to the complete match.
The offset values would still need to be relative to the original input, not the copy, so the offset of the full match (the "first" field) would have to be subtracted from the values supplied to text.subSequence() when extracting the match text.
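The offset adjustment described above can be sketched as follows. This is not the JDK's ImmutableMatchResult: CompactMatchSnapshot and its members are hypothetical names, and because Matcher does not expose its input, the sketch assumes the caller passes the same CharSequence that the matcher was created with. It also does not handle a group that lies outside the full match (for example, one captured inside a lookahead), which is the case named in the title of the related JDK-8312976.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; names and structure are assumptions.
final class CompactMatchSnapshot {
    private final int first;     // start of the full match in the original input
    private final int[] groups;  // start/end pairs, relative to the original input
    private final String text;   // copy of the matched region only

    private CompactMatchSnapshot(int first, int[] groups, String text) {
        this.first = first;
        this.groups = groups;
        this.text = text;
    }

    /** Snapshots the current match; input must be the matcher's own input. */
    public static CompactMatchSnapshot of(Matcher m, CharSequence input) {
        int n = m.groupCount() + 1;          // group 0 is the full match
        int[] groups = new int[2 * n];
        for (int i = 0; i < n; i++) {
            groups[2 * i] = m.start(i);      // -1 if the group did not participate
            groups[2 * i + 1] = m.end(i);
        }
        // Copy only the matched region instead of the whole input.
        String text = input.subSequence(m.start(), m.end()).toString();
        return new CompactMatchSnapshot(m.start(), groups, text);
    }

    // Reported offsets stay relative to the original input.
    public int start()          { return groups[0]; }
    public int end()            { return groups[1]; }
    public int start(int group) { return groups[2 * group]; }
    public int end(int group)   { return groups[2 * group + 1]; }
    public String group()       { return text; }

    public String group(int group) {
        int s = groups[2 * group];
        if (s == -1) {
            return null;                     // group did not participate
        }
        // Subtract the full-match offset to index into the smaller copy.
        return text.substring(s - first, groups[2 * group + 1] - first);
    }

    public static void main(String[] args) {
        StringBuilder input = new StringBuilder("xxxx key=value yyyy");
        Matcher m = Pattern.compile("(\\w+)=(\\w+)").matcher(input);
        if (m.find()) {
            CompactMatchSnapshot snap = of(m, input);
            input.setLength(0);              // later changes do not affect the snapshot
            System.out.println(snap.group(2) + " at " + snap.start(2));
        }
    }
}
```

The snapshot keeps a String whose length is that of the match rather than that of the input, while start() and end() still report positions in the original input.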
Attachments
Issue Links
- blocks
-
JDK-6988771 (scanner) java.util.Scanner does not always report correct match location information
- Open
- relates to
-
JDK-8312976 MatchResult produces StringIndexOutOfBoundsException for groups outside match
- Closed