JDK-8176371

(scanner) Scanner fails when string length equals buffer size and the last characters are the delimiter


      FULL PRODUCT VERSION :


      ADDITIONAL OS VERSION INFORMATION :
      Microsoft Windows 8.x

      A DESCRIPTION OF THE PROBLEM :
      I found unexpected behaviour in the java.util.Scanner class while splitting a String into tokens separated by the delimiter ";" using a Scanner.

      Given a string of the form "<any_char>[*1022]" + ";[*n]", I expect Scanner to return n tokens. However, when n=3, Scanner fails: it "sees" only 2 tokens instead of 3. I think this is related to the internal char buffer size of the Scanner class (1024 characters). I have seen the issue only when the last characters are exactly the delimiter set on the Scanner.

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      Generate a string composed of 2 parts:
      1- 1022 arbitrary characters (the delimiter may appear among them)
      2- a trailing run of 3 characters, each equal to the delimiter (in my case ";;;")

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      Given a string "a[*1022]" + ";[*n]", Scanner should return n tokens:

      a[x1022]; -> 1 token

      a[x1022];; -> 2 tokens

      a[x1022];;; -> 3 tokens

      a[x1022];;;; -> 4 tokens
      ACTUAL -
      a[x1022]; -> 1 token: correct

      a[x1022];; -> 2 tokens: correct

      a[x1022];;; -> 2 tokens: wrong (I expect 3 tokens)

      a[x1022];;;; -> 4 tokens: correct

      REPRODUCIBILITY :
      This bug can always be reproduced.

      ---------- BEGIN SOURCE ----------
      A simple example is attached:

      import java.util.Scanner;

      // Class name is arbitrary; the original report omitted the enclosing class.
      public class ScannerDelimiterTest {

          public static void main(String[] args) {

              // generate test string: (1022x "a") + (3x ";")
              StringBuilder builder = new StringBuilder();
              for (int i = 0; i < 1022; i++) {
                  builder.append('a');
              }
              builder.append(";;;");
              String testLine = builder.toString();

              // set up the Scanner with ";" as the delimiter
              String delimiter = ";";
              Scanner lineScanner = new Scanner(testLine);
              lineScanner.useDelimiter(delimiter);
              int p = 0;

              // tokenization: print each token the Scanner reports
              while (lineScanner.hasNext()) {
                  p++;
                  String currentToken = lineScanner.next();
                  System.out.println("token" + p + ": '" + currentToken + "'");
              }
              lineScanner.close();
          }
      }
      ---------- END SOURCE ----------
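
The expected-versus-actual table can also be checked with a parameterized variant of the reproduction. The following is a minimal sketch, not part of the original report: the class name `ScannerTokenCount` and the helper `countTokens` are hypothetical. It builds the prefix of "a" characters followed by n ";" characters and counts what Scanner reports. The counts printed for the 1022-character prefix depend on the Scanner implementation in the JDK being run, so no expected output is given for them.

```java
import java.util.Scanner;

// Hypothetical helper class for illustration; not part of the original report.
final class ScannerTokenCount {

    // Counts the tokens Scanner reports for a string made of
    // prefixLength 'a' characters followed by delimiterCount ';' characters.
    static int countTokens(int prefixLength, int delimiterCount) {
        StringBuilder sb = new StringBuilder(prefixLength + delimiterCount);
        for (int i = 0; i < prefixLength; i++) {
            sb.append('a');
        }
        for (int i = 0; i < delimiterCount; i++) {
            sb.append(';');
        }
        int count = 0;
        try (Scanner scanner = new Scanner(sb.toString())) {
            scanner.useDelimiter(";");
            while (scanner.hasNext()) {
                scanner.next();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Mirrors the four cases listed in the report (n = 1..4).
        for (int n = 1; n <= 4; n++) {
            System.out.println("prefix=1022, n=" + n + " -> "
                    + countTokens(1022, n) + " token(s)");
        }
    }
}
```

Running the same helper with a short prefix, for example `countTokens(5, 3)`, gives a baseline that stays well inside the buffer and can be compared with the 1022-character case.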

      CUSTOMER SUBMITTED WORKAROUND :
      Use the String.split method instead of Scanner.
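
One caveat with the workaround: `String.split(regex)` discards trailing empty strings, while `split(regex, -1)` keeps them and adds one more empty string after a trailing delimiter. The sketch below is an assumption about how the workaround might be applied, not the reporter's code. The class `SplitWorkaround` and helper `tokens` are hypothetical names. The helper only approximates the token counts expected in this report, for inputs that do not start with the delimiter.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration; not part of the original report.
final class SplitWorkaround {

    // Splits on a literal delimiter and keeps empty tokens, dropping only the
    // single empty string that split(..., -1) yields after a trailing delimiter.
    static String[] tokens(String line, String delimiter) {
        String[] parts = line.split(Pattern.quote(delimiter), -1);
        if (parts[parts.length - 1].isEmpty()) {
            return Arrays.copyOf(parts, parts.length - 1);
        }
        return parts;
    }

    public static void main(String[] args) {
        String line = "aa;;;";
        System.out.println(line.split(";").length);     // prints 1: trailing empty strings dropped
        System.out.println(line.split(";", -1).length); // prints 4: all empty strings kept
        System.out.println(tokens(line, ";").length);   // prints 3: matches the count expected above
    }
}
```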

            sherman Xueming Shen
            webbuggrp Webbug Group