JDK-8176407

(scanner) Scanner delimits incorrectly when delimiter spans a buffer boundary


      FULL PRODUCT VERSION :
      $ java -version
      java version "1.8.0_31"
      Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
      Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)


      ADDITIONAL OS VERSION INFORMATION :
      $ uname -a
      Linux myhost 3.17.3-200.fc20.x86_64 #1 SMP Fri Nov 14 19:45:42 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

      A DESCRIPTION OF THE PROBLEM :
      When using a java.util.Scanner to read text (from a String or a file), the scanner doesn't apply delimiters correctly if a delimiter falls across the internal CharBuffer (buf) boundary.

      This appears to occur only when the delimiter is a disjunctive regex (an alternation) in which one alternative contains another.

      For example, if the delimiters are "," and "#,#" (without quotes), and the scanner source contains

          ...ddd#,#eee...

      with the internal Scanner CharBuffer spanning up to and including the first #, then the scanner will return "...ddd#" as one token, followed by "#eee..." as the next token.
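      To make the scenario concrete, the sketch below applies the same delimiter pattern to the full text and to two truncated views of it. It is not a trace of Scanner's internals, only an illustration of how the regex behaves when it can see just part of a "#,#" delimiter; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TruncatedDelimiterDemo {
    // Same delimiter pattern as in the report: "," or "#,#".
    static final Pattern DELIMS = Pattern.compile("(,)|(#,#)");

    /** Describes the first delimiter match in text as "match@index", or "none". */
    public static String firstDelimiter(String text) {
        Matcher m = DELIMS.matcher(text);
        return m.find() ? m.group() + "@" + m.start() : "none";
    }

    public static void main(String[] args) {
        // Full text: the longer alternative wins because it starts earlier.
        System.out.println(firstDelimiter("ddd#,#eee")); // prints "#,#@3"
        // View cut off after the first '#': no delimiter is visible yet.
        System.out.println(firstDelimiter("ddd#"));      // prints "none"
        // View cut off after "#,": only the shorter alternative matches.
        System.out.println(firstDelimiter("ddd#,"));     // prints ",@4"
    }
}
```

      A matcher that commits to the "," match in the last case, without waiting for more input, would produce exactly the "...ddd#" and "#eee..." tokens described above.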

      See included example source, which sets up the above scenario, based on the scanner BUFFER_SIZE of 1024.

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      Execute the program provided in the "Source code" section below.

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      No output is expected, since no delimiter text should appear in a token returned by scanner.next().
      ACTUAL -
      Delimiter # found in: dddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd#
      Delimiter # found in: #eeeeeeeeeeee


      REPRODUCIBILITY :
      This bug can be reproduced always.

      ---------- BEGIN SOURCE ----------
      import java.util.Scanner;

      public class ScannerBug
      {
          public static void main(String[] args)
          {
              Scanner scanner = new Scanner("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa,bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb#,#cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc,dddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd#,#eeeeeeeeeeee");
              scanner.useDelimiter("(,)|(#,#)"); // delimit on "," and "#,#"

              while(scanner.hasNext()){
                  String next = scanner.next();
                  if(next.contains("#")){
                      System.out.println("Delimiter # found in: " + next);
                  }
              }
          }
      }
      ---------- END SOURCE ----------
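      The reproducer above relies on a long hand-counted literal. A more compact variant, sketched here, builds the input programmatically and sweeps the start of "#,#" across offsets near the 1024-character buffer size mentioned in the description. The class name, helper names, field layout, and sweep range are assumptions made for illustration; whether a given offset triggers the problem depends on the JDK build, so no particular output is claimed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerBoundaryProbe {
    // Assumption taken from the report: Scanner's internal buffer holds 1024 chars.
    static final int ASSUMED_BUFFER_SIZE = 1024;

    /**
     * Builds "aaa...,ddd...#,#eeeeeeeeeeee" so that "#,#" starts at index
     * delimiterStart (assumed to be at least 1). The leading field ensures
     * one token has already been consumed before the boundary is reached.
     */
    public static String buildInput(int delimiterStart) {
        StringBuilder sb = new StringBuilder();
        int firstFieldLength = delimiterStart / 2;
        for (int i = 0; i < firstFieldLength; i++) {
            sb.append('a');
        }
        sb.append(',');
        while (sb.length() < delimiterStart) {
            sb.append('d');
        }
        sb.append("#,#").append("eeeeeeeeeeee");
        return sb.toString();
    }

    /** Returns every token that still contains '#', i.e. part of a delimiter. */
    public static List<String> leakedTokens(String input) {
        List<String> leaked = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter("(,)|(#,#)"); // same delimiters as the report
            while (scanner.hasNext()) {
                String token = scanner.next();
                if (token.contains("#")) {
                    leaked.add(token);
                }
            }
        }
        return leaked;
    }

    public static void main(String[] args) {
        // Sweep a few offsets on either side of the assumed boundary.
        for (int start = ASSUMED_BUFFER_SIZE - 4; start <= ASSUMED_BUFFER_SIZE + 1; start++) {
            List<String> leaked = leakedTokens(buildInput(start));
            if (!leaked.isEmpty()) {
                System.out.println("offset " + start + ": " + leaked.size() + " token(s) contain '#'");
            }
        }
    }
}
```

      If the program prints nothing, no swept offset produced a token containing "#" on the JDK in use.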

      CUSTOMER SUBMITTED WORKAROUND :
      Don't use the Scanner class - use an alternative.
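      One possible alternative, assuming the whole input fits in memory as a String, is to split it with java.util.regex.Pattern directly, so no internal buffer boundary is involved. This is only a sketch of that idea, not the reporter's actual workaround; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class SplitWorkaround {
    // Same delimiter pattern as in the report: "," or "#,#".
    private static final Pattern DELIMS = Pattern.compile("(,)|(#,#)");

    /** Splits the complete input at once; the regex always sees the full text. */
    public static List<String> tokens(String input) {
        return Arrays.asList(DELIMS.split(input));
    }

    public static void main(String[] args) {
        System.out.println(tokens("aaa,bbb#,#ccc,ddd#,#eee")); // prints [aaa, bbb, ccc, ddd, eee]
    }
}
```

      Pattern.split drops trailing empty strings, so results for inputs with empty fields may differ from what Scanner would return.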

            rgiulietti Raffaello Giulietti
            smarks Stuart Marks