FULL PRODUCT VERSION :
java version "1.6.0_21"
Java(TM) SE Runtime Environment (build 1.6.0_21-b07)
Java HotSpot(TM) Client VM (build 17.0-b17, mixed mode, sharing)
ADDITIONAL OS VERSION INFORMATION :
Microsoft Windows [Version 6.1.7600]
A DESCRIPTION OF THE PROBLEM :
When using java.util.Scanner, after about 1024 characters are scanned, the match location (reported by match().start() and match().end()) resets, and from then on is incorrect.
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
See source code for runnable test case.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Should not throw. When scanning from start to finish, a match should never be reported at a position before the previous match.
ACTUAL -
throws Exception: Start went from 1018 to 0
match.start() returns 0, match.end() returns 9. This is not the correct location of the match.
REPRODUCIBILITY :
This bug can be reproduced always.
---------- BEGIN SOURCE ----------
String text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. In id lectus neque. Quisque dictum, eros vitae euismod lobortis, libero erat euismod mauris, vel consectetur lacus felis a elit. Nam leo tellus, sodales ornare ornare vehicula, aliquet eu ante. Sed a congue diam. Donec a hendrerit dui. Nulla nec metus at eros convallis pulvinar nec in neque. In vitae risus elit, egestas pretium leo. Etiam urna arcu, venenatis id faucibus ut, volutpat id est. Donec sit amet mollis sem. Ut mauris mi, ultricies tincidunt pharetra sed, rutrum sit amet lectus. Nulla in ante nec nisi venenatis commodo. Phasellus hendrerit dolor a neque adipiscing in porta tortor consectetur. Pellentesque volutpat gravida tristique. Etiam ultricies, orci eget ornare consectetur, lacus nulla sagittis dui, et rhoncus urna turpis eu tortor. Vivamus non nisl sit amet nunc lobortis sodales. Praesent rutrum scelerisque tortor et scelerisque. Quisque lectus neque, varius at blandit sit amet, ullamcorper sit amet urna. Nunc lectus arcu, feugiat sed vulputate et, consequat eu nulla. Quisque purus elit, malesuada sed pellentesque at, iaculis tempus tortor.";
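// Requires java.util.Scanner and java.util.regex.MatchResult; run inside a method declared "throws Exception".
Scanner s = new Scanner(text);
int previousStart = 0;
int previousEnd = 0;
while (s.hasNext()) {
    String word = s.next();
    // Offsets of the token that next() just returned.
    MatchResult match = s.match();
    int start = match.start();
    // Offsets should never decrease while scanning forward.
    if (start < previousStart)
        throw new Exception("Start went from " + previousStart + " to " + start);
    previousStart = start;
    int end = match.end();
    if (end < previousEnd)
        throw new Exception("End went from " + previousEnd + " to " + end);
    previousEnd = end;
}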
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
I'm not sure if this can be worked around completely. You can keep track of start and end manually and, if start becomes zero, use the previous end value to determine the correct start value. However, that requires knowing the exact length of the specific delimiter that was just skipped over, which doesn't seem to be exposed.
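When the whole input is already available as a String, as in the test case, one possible way to sidestep the problem is to skip Scanner and read the offsets from java.util.regex.Matcher applied directly to the text. The following is only a sketch under stated assumptions: the class name TokenOffsets and the helper tokenSpans are hypothetical, and the pattern \S+ only approximates Scanner's default whitespace delimiter (the two may disagree on some Unicode whitespace characters). It does not help when the Scanner reads from a stream.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenOffsets {
    // Hypothetical helper: returns {start, end} offsets, relative to the
    // whole input string, for each run of non-whitespace characters.
    // Assumption: \S+ stands in for Scanner's default delimiter handling.
    static List<int[]> tokenSpans(String text) {
        List<int[]> spans = new ArrayList<int[]>();
        Matcher m = Pattern.compile("\\S+").matcher(text);
        while (m.find()) {
            // Matcher offsets always refer to the full input sequence.
            spans.add(new int[] { m.start(), m.end() });
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "Lorem ipsum  dolor";
        for (int[] span : tokenSpans(text)) {
            System.out.println(span[0] + "-" + span[1] + " "
                    + text.substring(span[0], span[1]));
        }
    }
}
```

Because the Matcher works on the complete string rather than on an internal buffer, the reported offsets stay relative to the start of the input regardless of its length.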
IS BLOCKED BY :
JDK-8132995 Matcher$ImmutableMatchResult should be optimized to reduce space usage (Resolved)