JDK-8150488

Scanner.findAll() can return infinite stream if regex matches zero chars



    • b168
    • Verified



        JDK-8072722 added Scanner.findAll(), which produces a stream of match results. This mostly works fine, but if the programmer isn't careful, it can result in an infinite stream of empty match results. This usually isn't what's desired.

        For example, given a string "aaabbaaaabbb" one might naively try to split this into groups of a's and b's using the following:

            new Scanner("aaabbaaaabbb").findAll("a*|b*")

        This produces one match result for "aaa" followed by an infinite stream of empty match results. The problem is that the provided regex can successfully match zero characters. The spec for findAll() specifies that the stream is terminated when a match is *unsuccessful*. Thus the result is an infinite stream of successful, zero-length matches.

        This isn't incorrect, but it's probably not what the programmer wanted, and it's somewhat surprising. (A better regex for this case would be "a+|b+".)
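        As a minimal sketch of the suggested fix, the following uses "a+|b+", which cannot match zero characters, so the stream ends when the input is exhausted. The class and method names (GroupRuns, runs) are made up for illustration, and Scanner.findAll() requires Java 9 or later.

```java
import java.util.List;
import java.util.Scanner;
import java.util.regex.MatchResult;
import java.util.stream.Collectors;

// Hypothetical helper class, for illustration only.
class GroupRuns {
    // Splits the input into runs of 'a' and runs of 'b'.
    // "a+|b+" needs at least one character per match, so every
    // successful match consumes input and the stream is finite.
    static List<String> runs(String input) {
        try (Scanner sc = new Scanner(input)) {
            return sc.findAll("a+|b+")
                     .map(MatchResult::group)
                     .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        System.out.println(runs("aaabbaaaabbb")); // prints [aaa, bb, aaaa, bbb]
    }
}
```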

        At the very least, this is worth a note in the documentation. If somebody were writing a loop using findWithinHorizon(), they would probably notice quickly that they were getting zero-length matches, and they'd adjust either the loop logic or the pattern. Since the loop is embedded within the stream, this is less obvious for findAll().
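        To illustrate why the problem is easier to spot in a hand-written loop, here is a rough sketch that calls findWithinHorizon() repeatedly with the same input. The helper name firstMatches and the maxCalls cap are assumptions added so the demonstration terminates; they are not part of any JDK API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demonstration class, for illustration only.
class HorizonLoop {
    // Calls findWithinHorizon() at most maxCalls times and records each result.
    // A horizon of 0 means the search is not bounded.
    static List<String> firstMatches(String input, String regex, int maxCalls) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            for (int i = 0; i < maxCalls; i++) {
                String m = sc.findWithinHorizon(regex, 0);
                if (m == null) {
                    break; // no further match
                }
                out.add(m); // a zero-length match shows up as ""
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // After "aaa" the scanner stops advancing: every later result is "".
        System.out.println(firstMatches("aaabbaaaabbb", "a*|b*", 4));
        // With "a+|b+" the loop ends by itself after four matches.
        System.out.println(firstMatches("aaabbaaaabbb", "a+|b+", 10));
    }
}
```

        Someone stepping through such a loop sees the repeated empty strings immediately, which is the feedback that the stream form hides.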

        An alternative would be to terminate the stream if a zero-length match occurs. This seems somewhat unwise, as it's rather a special case. The caller might want zero-length matches (though this is hard to imagine). An infinite stream can easily be terminated using constructs like limit(), so getting an infinite stream isn't necessarily an error.
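        A caller who does want to keep the zero-length-matching pattern can bound the stream explicitly. The sketch below shows the limit() approach; the names BoundedFindAll and firstN are hypothetical. The elements after the first may differ between JDK builds, so no output is shown.

```java
import java.util.List;
import java.util.Scanner;
import java.util.regex.MatchResult;
import java.util.stream.Collectors;

// Hypothetical helper class, for illustration only.
class BoundedFindAll {
    // Collects at most n match results. limit() is a short-circuiting
    // operation, so the pipeline terminates even if findAll() would
    // otherwise keep producing zero-length matches.
    static List<String> firstN(String input, String regex, long n) {
        try (Scanner sc = new Scanner(input)) {
            return sc.findAll(regex)
                     .limit(n)
                     .map(MatchResult::group)
                     .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        System.out.println(firstN("aaabbaaaabbb", "a*|b*", 3));
    }
}
```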





                smarks Stuart Marks