JDK-8178116

(scanner) scanner.findWithinHorizon doesn't advance after matching zero characters

    Details

    • Type: Bug
    • Status: Open
    • Priority: P4
    • Resolution: Unresolved
    • Affects Version/s: 6, 7, 8
    • Fix Version/s: tbd
    • Component/s: core-libs
    • Labels:
      None

      Description

      Matcher.find() advances the search position after a zero-length match: if the previous match was empty, the next search starts one character later. That way, even if a regex repeatedly matches zero characters, progress is still made through the input, and any enclosing loop eventually terminates.
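      To make the advancing rule concrete, the following sketch re-implements it by hand with Matcher.find(int), which resets the matcher and searches from a given index. It assumes the rule is "after an empty match, start the next search one character later", which reproduces the Matcher output shown for "a*" on "abaab". The class and method names are hypothetical; this is an illustration, not the JDK's own code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BumpAlongDemo {
    // Hypothetical helper: collects every match of p in input, applying the
    // "advance after an empty match" rule explicitly via Matcher.find(int).
    static List<String> findAllManually(Pattern p, String input) {
        List<String> results = new ArrayList<>();
        Matcher m = p.matcher(input);
        int from = 0;
        // find(int) throws IndexOutOfBoundsException if from > input.length(),
        // so stop once the search position has moved past the end.
        while (from <= input.length() && m.find(from)) {
            results.add(m.group());
            // After an empty match, resume one character later so the same
            // empty match is not reported again; otherwise resume at end().
            from = (m.end() == m.start()) ? m.end() + 1 : m.end();
        }
        return results;
    }

    public static void main(String[] args) {
        // prints [a, , aa, , ]
        System.out.println(findAllManually(Pattern.compile("a*"), "abaab"));
    }
}
```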

      Scanner.findWithinHorizon doesn't have this behavior. A simple loop that calls it with a regex that can match zero characters returns the same empty match every time, without advancing through the input, resulting in an infinite loop.

      Examples:

          static void matcherFind() {
              Pattern p = Pattern.compile("a*");
              Matcher m = p.matcher("abaab");
              while (m.find()) {
                  System.out.print("<" + m.group() + "> ");
              }
              System.out.println();
          }

      This results in the following output:

          <a> <> <aa> <> <>

      after which the method returns.

          static void scannerFind() {
              Scanner sc = new Scanner("abaab");
              Pattern p = Pattern.compile("a*");
              String s;
              while ((s = sc.findWithinHorizon(p, 0)) != null) {
                  System.out.print("<" + s + "> ");
              }
              System.out.println();
          }

      This results in the following output:

          <a> <> <> <> <> <> <> ...

      which never terminates.
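      If the empty matches themselves are wanted, one possible client-side guard (not part of the original report and not a Scanner API) is to consume one character after each zero-length match, so the next search cannot start at the same position. The sketch below assumes that consuming that character with findWithinHorizon and the pattern "(?s)." at horizon 1 is acceptable for the caller; the class, method, and constant names are hypothetical. For "abaab" and "a*" it collects the same five matches that the Matcher loop prints.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

class GuardedScannerFind {
    // "(?s)." matches any single character, including line terminators.
    private static final Pattern ANY_CHAR = Pattern.compile("(?s).");

    // Hypothetical workaround, not part of java.util.Scanner: after a
    // zero-length match, consume one character so the scanner moves on.
    static List<String> findAllGuarded(Scanner sc, Pattern p) {
        List<String> results = new ArrayList<>();
        String s;
        while ((s = sc.findWithinHorizon(p, 0)) != null) {
            results.add(s);
            if (s.isEmpty()) {
                // Horizon 1 limits the search to the next character.
                // null means the input is exhausted, so stop here instead
                // of matching the same empty string forever.
                if (sc.findWithinHorizon(ANY_CHAR, 1) == null) {
                    break;
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        try (Scanner sc = new Scanner("abaab")) {
            System.out.println(findAllGuarded(sc, Pattern.compile("a*")));
        }
    }
}
```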

      Because Scanner.findAll() is built on the same machinery as findWithinHorizon (specifically, findPatternInBuffer), it produces an infinite stream of empty matches if the regex matches zero characters.
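      Assuming Java 9 or later (where Scanner.findAll() exists), the unbounded stream can be observed safely by capping it with Stream.limit(), which short-circuits a sequential stream. The helper name firstMatches is hypothetical. On a JDK with the behavior described in this report, the first element is "a" and every element after it is empty.

```java
import java.util.List;
import java.util.Scanner;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class FindAllDemo {
    // Hypothetical helper: returns at most n results from Scanner.findAll.
    // limit(n) stops pulling from the stream after n elements, so this
    // returns even if findAll on its own would never terminate.
    static List<String> firstMatches(String input, Pattern p, int n) {
        try (Scanner sc = new Scanner(input)) {
            return sc.findAll(p)
                     .limit(n)
                     .map(MatchResult::group)
                     .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        System.out.println(firstMatches("abaab", Pattern.compile("a*"), 5));
    }
}
```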

      The workaround is to use a regex that always matches at least one character. In the example above, using "a+" instead of "a*" avoids the infinite loop. It reports no empty matches, so its output differs from that of Matcher.find() with "a*", but that will probably be sufficient for cases like the examples above.
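      A minimal sketch of that workaround applied to the scannerFind example: every match of "a+" consumes at least one character, so the scanner's position strictly increases and the loop ends. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

class PlusWorkaround {
    // Same loop as scannerFind, but collecting results into a list.
    static List<String> scannerFind(String input, Pattern p) {
        List<String> results = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            String s;
            // With a pattern that never matches the empty string, each
            // successful call advances the scanner, so this terminates.
            while ((s = sc.findWithinHorizon(p, 0)) != null) {
                results.add(s);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // prints [a, aa]
        System.out.println(scannerFind("abaab", Pattern.compile("a+")));
    }
}
```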

              People

              Assignee:
              Stuart Marks (smarks)
              Reporter:
              Stuart Marks (smarks)
              Votes:
              0
              Watchers:
              2
