Type: Bug
Resolution: Fixed
Priority: P4
Fix Version/s: 1.2.0
Affects Version/s: 1.1.4
Component/s: core-libs
Labels:
- webbug

Subcomponent:
java.io
Resolved In Build:
1.2beta4
CPU:

generic
OS:

generic
Verification:
Not verified

Name: dgC58589 Date: 01/26/98

java.io.StreamTokenizer should be able to parse
"/" as a word constituent and strip C and/or C++
comments simultaneously.

My application parses ASCII files containing
market data with "/"-delimited dates; I can think
of other uses. The documentation for the
StreamTokenizer class is inadequate and gives no
hint that this won't work.

"/" should be allowed to be a word constituent, so date
strings like "1/16/98" get parsed as words. Otherwise, if "/" is an
ordinary character (and " " is white space), there's no way to tell the
difference between "1/1" and "1 / 1", "1/ 1", or "1 /1".
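
The ambiguity can be reproduced with a short sketch against the public StreamTokenizer API. The class name SlashDemo and its helper methods are made up for illustration; the second helper assumes that digits are also registered as word constituents (after resetSyntax()), since with the default number parsing a date would be split at the first "/".

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the JDK.
final class SlashDemo {
    // '/' as an ordinary character; default number parsing stays on.
    static List<String> ordinarySlash(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.ordinaryChar('/'); // by default '/' is a comment character
        return collect(st);
    }

    // Digits and '/' as word constituents, so a date is a single word.
    static List<String> wordSlash(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.resetSyntax();            // every character becomes ordinary
        st.whitespaceChars(0, ' ');
        st.wordChars('0', '9');
        st.wordChars('/', '/');
        return collect(st);
    }

    // Drain the tokenizer into a readable list of token descriptions.
    static List<String> collect(StreamTokenizer st) {
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                switch (st.ttype) {
                    case StreamTokenizer.TT_NUMBER:
                        out.add("NUM(" + (int) st.nval + ")");
                        break;
                    case StreamTokenizer.TT_WORD:
                        out.add("WORD(" + st.sval + ")");
                        break;
                    default:
                        out.add("CHAR(" + (char) st.ttype + ")");
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

With these helpers, ordinarySlash("1/1") and ordinarySlash("1 / 1") both yield NUM(1), CHAR(/), NUM(1), while wordSlash("1/1") yields the single token WORD(1/1) and wordSlash("1 / 1") yields three separate words.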

The clause in the main loop of the tokenizer that begins
"if ((ctype & CT_ALPHA) != 0)", which parses words, appears before the
one that begins "if c == '/' && (slashSlashCommentsP...", so if "/" is
set to be a word constituent, there's no way the tokenizer can possibly
parse C or C++ comments. Anyway, I've already got a fix for the source
code. I can send it to you if you're interested.
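
The ordering problem described above can be illustrated with a deliberately simplified model. This is not the JDK source: the class ClauseOrder, the method branchFor, and its boolean parameters are hypothetical stand-ins for the character-type table and the comment flags.

```java
// Hypothetical, simplified model of the clause order described in the report.
final class ClauseOrder {
    // Returns the name of the first clause that would claim character c.
    static String branchFor(char c, boolean slashIsWord, boolean slashSlashComments) {
        // Stand-in for the "(ctype & CT_ALPHA) != 0" test.
        boolean isAlpha = Character.isLetter(c) || (c == '/' && slashIsWord);
        if (isAlpha) {
            return "word";      // checked first, so it wins for '/'
        }
        if (c == '/' && slashSlashComments) {
            return "comment";   // only reached when '/' is not a word character
        }
        return "ordinary";
    }
}
```

In this model branchFor('/', true, true) returns "word" and never reaches the comment clause, whereas branchFor('/', false, true) returns "comment".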

Fix diff against the JDK 1.1.5 FCS source code:

567,578c567
<
< // +++ Modified segment begins here:
< //
< if (specialSlash(c)) {
< return nextToken();
< }
< buf[0] = (char) c;
< c = peekc;
< int i = 1;
< //
< // +++ Modified segment ends here.
<
---
> int i = 0;
667,678d655
<
< // +++ Modified code segment begins here:
< //
< if (specialSlash(c)) {
< return nextToken();
< }
< //
< // +++ Modified code segment ends here.
< return ttype = c;
< }
<
< private boolean specialSlash(int c) throws java.io.IOException {
696,700c673,674
< if (c < 0) {
< String s =
< "reached eof while parsing C-style comment";
< throw new RuntimeException(s);
< }
---
> if (c < 0)
> return ttype = TT_EOF;
704c678
< return true;
---
> return nextToken();
708c682
< return true;
---
> return nextToken();
711c685
< return false;
---
> return ttype = '/';
713,715d686
< } else {
< peekc = read();
< return false;
716a688,689
> peekc = read();
> return ttype = c;

(Review ID: 23516)
======================================================================

mircea.oancea@canada 1998-02-25

More information from the client received on Fri, 20 Feb 1998 18:41:39

We've been parsing lots of files with my modified version of
StreamTokenizer, and we found a bug. My "fix" resulted in a failure to
parse one-character words. (The character after the first word
constituent was always treated as a word constituent.) The attached
version adds one more line and changes a "do" to a "while". It could
still use more testing (especially since I've only tested the features I
need).
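
The difference between the "do" and the "while" form can be sketched on a plain string. This is a simplified model rather than the patched tokenizer itself: OneCharWord, isWordChar, and at are hypothetical names, and the word-constituent test is an assumption chosen for the example.

```java
// Hypothetical model of the word loop after the first character is buffered.
final class OneCharWord {
    // Assumed word-constituent test: letters, digits, and '/'.
    static boolean isWordChar(int c) {
        return c >= 0 && (Character.isLetterOrDigit(c) || c == '/');
    }

    // Character at pos, or -1 past the end (mimics read() at end of stream).
    static int at(String in, int pos) {
        return pos < in.length() ? in.charAt(pos) : -1;
    }

    // "do" form: the character after the first one is appended before it is tested.
    static String readWordDoWhile(String in) {
        StringBuilder buf = new StringBuilder().append(in.charAt(0));
        int pos = 1;
        int c = at(in, pos);
        do {
            buf.append((char) c);
            c = at(in, ++pos);
        } while (isWordChar(c));
        return buf.toString();
    }

    // "while" form: every character is tested before it is appended.
    static String readWordWhile(String in) {
        StringBuilder buf = new StringBuilder().append(in.charAt(0));
        int pos = 1;
        int c = at(in, pos);
        while (isWordChar(c)) {
            buf.append((char) c);
            c = at(in, ++pos);
        }
        return buf.toString();
    }
}
```

For the input "x y", readWordDoWhile returns "x y" (the space is swallowed into the word), while readWordWhile returns "x". For "1/16/98 z" both return "1/16/98", which is consistent with the bug showing up only for one-character words.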

The following diff is against the patched version, that is, the original
source with the above patch applied.

574d573
< c = peekc;
576,577c575,576
< //
< // +++ Modified segment ends here.
---
> c = peekc;
> ctype = c < 0 ? CT_WHITESPACE : c < 256 ? ct[c] : CT_ALPHA;
579c578
< do {
---
> while ((ctype & (CT_ALPHA | CT_DIGIT)) != 0) {
588c587,589
< } while ((ctype & (CT_ALPHA | CT_DIGIT)) != 0);
---
> }
> //
> // +++ Modified segment ends here.

Assignee: Zhenghua Li (Inactive)

Reporter: David Graham-cumming (Inactive)

Created: 1998-01-26 12:11

Updated: 1999-01-14 17:29

Resolved: 1999-01-14 17:29

Imported: 15/Sep/12 9:47 PM

Indexed: 17/Jul/12 6:19 PM
