Type: Bug
Resolution: Unresolved
Priority: P4
Affects Version/s: 8, 9
FULL PRODUCT VERSION :
$ java -version
java version "1.8.0_31"
Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)
ADDITIONAL OS VERSION INFORMATION :
$ uname -a
Linux myhost 3.17.3-200.fc20.x86_64 #1 SMP Fri Nov 14 19:45:42 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
A DESCRIPTION OF THE PROBLEM :
When a java.util.Scanner reads text (from a String or a file), it applies delimiters incorrectly if a delimiter straddles the boundary of its internal CharBuffer (buf).
This appears to occur only when the delimiter is a regex alternation in which one alternative contains another.
For example, if the delimiters are "," and "#,#" (without quotes), and the scanner source contains
...ddd#,#eee...
with the first fill of the internal Scanner CharBuffer ending at, and including, the first #, then the scanner returns "...ddd#" as one token, followed by "#eee..." as the next token.
See the included example source, which sets up this scenario based on the Scanner BUFFER_SIZE of 1024.
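The boundary condition described above can also be set up programmatically rather than with a hand-counted literal. The following is a minimal sketch, not the reporter's program: the class name BoundaryInput and its helper methods are hypothetical, and the buffer size of 1024 is taken from this report rather than from any public API. Whether this shorter input actually triggers the misbehavior depends on the JDK build in use, so no output is claimed here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class BoundaryInput {
    // Assumption from the report: Scanner's internal buffer holds 1024 chars.
    static final int ASSUMED_BUFFER_SIZE = 1024;

    // Builds "ddd...d#,#eee" so that the first '#' is the last character
    // that fits into a buffer of the given size (index bufferSize - 1).
    static String build(int bufferSize) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bufferSize - 1; i++) {
            sb.append('d');
        }
        sb.append("#,#").append("eee");
        return sb.toString();
    }

    // Tokenizes with the same delimiter regex as the report.
    static List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter("(,)|(#,#)");
            while (scanner.hasNext()) {
                out.add(scanner.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String input = build(ASSUMED_BUFFER_SIZE);
        System.out.println("index of first '#': " + input.indexOf('#'));
        for (String t : tokens(input)) {
            // On an affected JDK a token may contain '#'; on a fixed one it should not.
            System.out.println(t.length() + " chars, contains '#': " + t.contains("#"));
        }
    }
}
```

Passing a different value to build() shifts the delimiter relative to the assumed boundary, which may help confirm that the position of the first # is what matters.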
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Execute the program provided in the "Source code" section below.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
No output is expected, because no delimiter character should appear in a token returned by scanner.next().
ACTUAL -
Delimiter # found in: dddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd#
Delimiter # found in: #eeeeeeeeeeee
REPRODUCIBILITY :
This bug can be reproduced always.
---------- BEGIN SOURCE ----------
import java.util.Scanner;

public class ScannerBug
{
    public static void main(String[] args)
    {
        // Input arranged so that the second "#,#" falls across the 1024-char buffer boundary
Scanner scanner = new Scanner("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa,bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb#,#cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc,dddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd#,#eeeeeeeeeeee");
        scanner.useDelimiter("(,)|(#,#)"); // delimit on "," and "#,#"
        while (scanner.hasNext()) {
            String next = scanner.next();
            if (next.contains("#")) {
                System.out.println("Delimiter # found in: " + next);
            }
        }
    }
}
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
Don't use the Scanner class - use an alternative.
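One possible alternative, sketched here under the assumption that the whole input fits in memory, is to split with java.util.regex.Pattern using the same delimiter regex. Because the complete text is matched at once, no internal buffer boundary is involved. The class name SplitWorkaround is hypothetical and not part of the original report.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class SplitWorkaround {
    // Same delimiter regex as in the report: "," or "#,#".
    private static final Pattern DELIMS = Pattern.compile("(,)|(#,#)");

    // Splits the complete input in one pass over the in-memory text.
    static List<String> tokens(CharSequence input) {
        return Arrays.asList(DELIMS.split(input));
    }

    public static void main(String[] args) {
        // prints [aaa, bbb, ccc, ddd, eee]
        System.out.println(tokens("aaa,bbb#,#ccc,ddd#,#eee"));
    }
}
```

Pattern.split does not behave exactly like Scanner in edge cases (for example, it drops trailing empty strings), so this is a substitute only where those differences do not matter.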
- clones JDK-8072582 Scanner delimits incorrectly when delimiter spans a buffer boundary (Resolved)
- relates to JDK-8176371 (scanner) Scanner fails when string length equals buffer size and latest characters are the delimiter (Open)