-
Bug
-
Resolution: Unresolved
-
P3
-
8, 9
-
generic
-
generic
FULL PRODUCT VERSION :
ADDITIONAL OS VERSION INFORMATION :
Microsoft Windows 8.x
A DESCRIPTION OF THE PROBLEM :
I've found strange behaviour in the java.util.Scanner class. I tried to split a String into a set of tokens separated by the delimiter ";" using a Scanner.
If I take a string of the form "<any_char>[*1022]" + ";[*n]", I expect Scanner to return n tokens. However, when n=3, Scanner fails: it "sees" only 2 tokens instead of 3. I think this is related to the internal char buffer size of the Scanner class (1024 characters), and I have found the issue only when the last characters are exactly the delimiter set for the Scanner.
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Generate a string composed of 2 parts:
1- 1022 arbitrary characters (these may include the delimiter)
2- a trailing run of 3 characters, each exactly the delimiter set for the Scanner (in my case ";;;")
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
For a string of the form "a[*1022]" + ";[*n]" I expect n tokens. However, when n=3 the Scanner class fails: it "sees" only 2 tokens instead of 3. I think this is related to the internal char buffer size of the Scanner class.
a[x1022]; -> 1 token
a[x1022];; -> 2 tokens
a[x1022];;; -> 3 tokens
a[x1022];;;; -> 4 tokens
ACTUAL -
a[x1022]; -> 1 token: correct
a[x1022];; -> 2 tokens: correct
a[x1022];;; -> 2 tokens: wrong (I expect 3 tokens)
a[x1022];;;; -> 4 tokens: correct
REPRODUCIBILITY :
This bug can always be reproduced.
---------- BEGIN SOURCE ----------
// A simple example that reproduces the problem:
import java.util.Scanner;

public class ScannerDelimiterTest {
    public static void main(String[] args) {
        // Generate the test string: 1022 x "a" followed by 3 x ";"
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < 1022; i++) {
            builder.append('a');
        }
        builder.append(";;;");
        String testLine = builder.toString();

        // Set up the Scanner with ";" as the delimiter
        String delimiter = ";";
        Scanner lineScanner = new Scanner(testLine);
        lineScanner.useDelimiter(delimiter);

        // Tokenize and print each token with its index
        int p = 0;
        while (lineScanner.hasNext()) {
            p++;
            String currentToken = lineScanner.next();
            System.out.println("token" + p + ": '" + currentToken + "'");
        }
        lineScanner.close();
    }
}
---------- END SOURCE ----------
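The program above exercises only the n=3 case. To check every row of the expected/actual table in one run, a variant along the following lines may help. It is a minimal sketch: the class name ScannerTokenCount and the helpers buildInput and countTokens are hypothetical names introduced here, not part of the original report, and the counts it prints depend on the JDK build it runs on.

```java
import java.util.Scanner;

public class ScannerTokenCount {
    // Hypothetical helper: prefixLength copies of 'a' followed by n copies of ';'.
    static String buildInput(int prefixLength, int n) {
        StringBuilder builder = new StringBuilder(prefixLength + n);
        for (int i = 0; i < prefixLength; i++) {
            builder.append('a');
        }
        for (int i = 0; i < n; i++) {
            builder.append(';');
        }
        return builder.toString();
    }

    // Hypothetical helper: counts the tokens a Scanner reports with ";" as the delimiter.
    static int countTokens(String input) {
        int count = 0;
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter(";");
            while (scanner.hasNext()) {
                scanner.next();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // One line per row of the table in the report (n = 1..4).
        for (int n = 1; n <= 4; n++) {
            int tokens = countTokens(buildInput(1022, n));
            System.out.println("a[x1022] + " + n + " x ';' -> " + tokens + " token(s), expected " + n);
        }
    }
}
```

Changing the first argument of buildInput (for example to 1021 or 1023) is one way to probe whether the miscount follows the suspected 1024-character buffer boundary.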
CUSTOMER SUBMITTED WORKAROUND :
Use the String.split method instead of Scanner.
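The report does not say how split was called, so the following is only a sketch of one possible form of the workaround. Note that split(";") with the default limit discards trailing empty strings and would return a single element for the test string, while split(";", -1) keeps them and returns one element more than the expected token count. The class SplitWorkaround and the helper tokens are hypothetical names; the helper drops the final empty element so that the result matches the counts listed under EXPECTED, assuming the input does not start with a delimiter.

```java
import java.util.Arrays;
import java.util.List;

public class SplitWorkaround {
    // Hypothetical helper: splits on the delimiter regex, keeps empty tokens,
    // and drops the single empty string that follows a trailing delimiter.
    static List<String> tokens(String line, String delimiterRegex) {
        String[] parts = line.split(delimiterRegex, -1);
        int end = parts.length;
        if (end > 0 && parts[end - 1].isEmpty()) {
            end--;
        }
        return Arrays.asList(parts).subList(0, end);
    }

    public static void main(String[] args) {
        // Same test string as in the report: 1022 x "a" followed by ";;;".
        char[] prefix = new char[1022];
        Arrays.fill(prefix, 'a');
        String testLine = new String(prefix) + ";;;";

        List<String> result = tokens(testLine, ";");
        System.out.println("token count: " + result.size());
    }
}
```

Because split takes a regular expression, a delimiter containing regex metacharacters would need quoting with java.util.regex.Pattern.quote before being passed to this helper.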
- relates to
-
JDK-8176407 (scanner) Scanner delimits incorrectly when delimiter spans a buffer boundary
- Open