Type: Bug
Resolution: Fixed
Priority: P2
Fix Version: 9
Resolved In Build: b168
Verification: Verified
Issue | Fix Version | Assignee | Priority | Status | Resolution | Resolved In Build |
---|---|---|---|---|---|---|
JDK-8179827 | 10 | Stuart Marks | P2 | Resolved | Fixed | b07 |
Scanner.findAll() can return an infinite stream if the regex can match zero characters. For example, given the string "aaabbaaaabbb", one might naively try to split it into groups of a's and b's using the following:
new Scanner("aaabbaaaabbb").findAll("a*|b*")
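This produces one match result for "aaa" followed by an infinite stream of empty match results. The problem is that the provided regex can successfully match zero characters: at the first "b", the a* alternative matches the empty string, and because the scanner does not advance after a zero-length match (see JDK-8178116), the same empty match is found again and again. The findAll() spec states that the stream terminates only when a match is *unsuccessful*. Thus the result is an infinite stream of successful, zero-length matches.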
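For comparison, the sketch below runs the same pattern through java.util.regex.Matcher, which (unlike the scanner behavior described above) starts its next search one character later after an empty match, so the loop ends. It is only meant to show what "a*|b*" actually matches; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical comparison class, not part of the JDK.
final class ZeroLengthMatches {
    // Collects every match that Matcher.find() reports for regex in input.
    static List<String> allMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> found = new ArrayList<>();
        // After an empty match, Matcher.find() bumps the search start by one
        // character, so this loop terminates even for "a*|b*".
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // The a* alternative succeeds with zero length wherever the text has a 'b',
        // so the b* alternative never gets a chance to match the b runs.
        System.out.println(allMatches("a*|b*", "aaabbaaaabbb"));
    }
}
```

On "aaabbaaaabbb" this yields "aaa", two empty matches, "aaaa", and four more empty matches; the runs of b's are never matched, so even a terminating stream would not give the intended groups.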
This isn't incorrect, but it's probably not what the programmer wanted, and it's somewhat surprising. (A better regex for this case would be "a+|b+".)
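A minimal sketch of the better pattern in use, assuming Java 9 or later (where Scanner.findAll() was introduced). With "a+|b+" every match consumes at least one character, so the stream ends when the input is exhausted. The helper class and method names are hypothetical.

```java
import java.util.List;
import java.util.Scanner;
import java.util.regex.MatchResult;
import java.util.stream.Collectors;

// Hypothetical helper class for illustration.
final class RunSplitter {
    // Splits the input into maximal runs of a's and b's.
    static List<String> runs(String input) {
        // try-with-resources closes the scanner after the stream has been consumed
        try (Scanner sc = new Scanner(input)) {
            return sc.findAll("a+|b+")          // every match is non-empty
                     .map(MatchResult::group)   // keep only the matched text
                     .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        System.out.println(runs("aaabbaaaabbb")); // [aaa, bb, aaaa, bbb]
    }
}
```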
At the very least, this is worth a note in the documentation. If somebody were writing a loop using findWithinHorizon(), they would probably notice the zero-length matches quickly and adjust either the loop logic or the pattern. Because the loop is hidden inside the stream, the problem is less obvious with findAll().
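A sketch of the kind of findWithinHorizon() loop described above, with the loop logic adjusted to stop at the first zero-length match. Given JDK-8178116 (the scanner does not advance after matching zero characters), the loop would otherwise keep returning the same empty string. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper class for illustration.
final class HorizonLoop {
    // Collects successive matches of regex, stopping at the end of input or at
    // the first zero-length match (which would otherwise repeat forever).
    static List<String> collect(String input, String regex) {
        List<String> found = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            String s;
            // A horizon of 0 means the search is not bounded.
            while ((s = sc.findWithinHorizon(regex, 0)) != null) {
                if (s.isEmpty()) {
                    break; // zero-length match: stop instead of spinning
                }
                found.add(s);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(collect("aaabbaaaabbb", "a*|b*")); // [aaa]
        System.out.println(collect("aaabbaaaabbb", "a+|b+")); // [aaa, bb, aaaa, bbb]
    }
}
```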
An alternative would be to terminate the stream when a zero-length match occurs. This seems somewhat unwise, as it would special-case zero-length matches, and the caller might actually want them (though this is hard to imagine). An infinite stream can easily be terminated using constructs like limit(), so getting an infinite stream isn't necessarily an error.
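Two caller-side ways to keep a possibly infinite findAll() stream under control, sketched assuming Java 9 or later (Stream.takeWhile() was also added in Java 9). limit(n) caps the number of results; takeWhile() stops at the first zero-length match, which is the "terminate on zero-length match" alternative done by the caller instead of by Scanner. Simply dropping empty matches with filter() would not help, because the pipeline would keep pulling empty matches forever. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Scanner;
import java.util.regex.MatchResult;
import java.util.stream.Collectors;

// Hypothetical helper class for illustration.
final class BoundedFindAll {
    // Caps the stream at maxResults matches, whatever the pattern does.
    static List<String> firstN(String input, String regex, long maxResults) {
        try (Scanner sc = new Scanner(input)) {
            return sc.findAll(regex)
                     .limit(maxResults)         // short-circuits an infinite stream
                     .map(MatchResult::group)
                     .collect(Collectors.toList());
        }
    }

    // Stops at the first zero-length match (end == start).
    static List<String> untilEmpty(String input, String regex) {
        try (Scanner sc = new Scanner(input)) {
            return sc.findAll(regex)
                     .takeWhile(m -> m.end() > m.start())
                     .map(MatchResult::group)
                     .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        System.out.println(firstN("aaabbaaaabbb", "a*|b*", 5));
        System.out.println(untilEmpty("aaabbaaaabbb", "a*|b*")); // [aaa]
    }
}
```

The exact elements returned by firstN() after "aaa" depend on how the running JDK handles zero-length matches, so only the upper bound on their number is guaranteed.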
- backported by: JDK-8179827 Scanner.findAll() can return infinite stream if regex matches zero chars (Resolved)
- relates to: JDK-8178116 (scanner) scanner.findWithinHorizon doesn't advance after matching zero characters (Open)