-
Bug
-
Resolution: Duplicate
-
P4
-
None
-
8
-
x86
-
windows_7
FULL PRODUCT VERSION :
java version 1.8.0_05
Java(TM) SE Runtime Environment (build 1.8.0_05-b13)
Java HotSpot(TM) Client VM (build 25.5-b02, mixed mode, sharing)
ADDITIONAL OS VERSION INFORMATION :
Windows 7 Professional 32 bit Version 6.1.7600
A DESCRIPTION OF THE PROBLEM :
When javax.xml.parsers.SAXParser parses a very large XML file, the handler's characters(char[] ch, int start, int length) callback sometimes receives the wrong buffer contents.
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
To reproduce this issue, download a zip file from http://www.ac.auone-net.jp/~lovelyfl/BigXML.zip .
The archive contains one XML file and the Java source that reports the error.
The source code is exactly the same as reported below.
Run the provided Java code as follows.
java workbench.sax.SaxParserTest BigXML.xml
You will get:
>A subject of multiple lines follows:
>
>55416 55416 55416 55416 55416 55416 55416 55416 5541 55276 55276 55276 55276 55276 55276 55276
>End of the subject
This is an error because every subject element in the BigXML.xml file is a single line.
If you uncomment the following line in the source code, you can make more sense of what is happening.
//System.out.println(text);
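If the archive is not available, a stand-in input can be generated. The following is only a sketch: the real layout of BigXML.xml is not shown in this report, so the root element name, the record count, and the class name BigXmlGenerator are assumptions. It writes many single-line subject elements filled with repeated numbers, similar to the output quoted above. It has not been confirmed that a file generated this way triggers the problem.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the original report.
public class BigXmlGenerator {

    /**
     * Writes `records` subject elements, one per line. Each subject holds
     * `repeats` copies of its record number separated by single spaces,
     * so no subject text ever contains a newline.
     */
    public static void generate(Path out, int records, int repeats) throws IOException {
        try (BufferedWriter w = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            w.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
            w.write("<root>\n"); // assumed root element name
            for (int i = 0; i < records; i++) {
                w.write("<subject>");
                for (int j = 0; j < repeats; j++) {
                    if (j > 0) {
                        w.write(' ');
                    }
                    w.write(Integer.toString(i));
                }
                w.write("</subject>\n");
            }
            w.write("</root>\n");
        }
    }

    public static void main(String[] args) throws IOException {
        // Output file name and sizes are arbitrary choices for illustration.
        Path out = Paths.get(args.length > 0 ? args[0] : "BigXML-generated.xml");
        generate(out, 100000, 16);
    }
}
```

The generated file can be passed to SaxParserTest in place of BigXML.xml. Any "multiple lines" message would then indicate the same symptom, since the generator never writes a newline inside a subject.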
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
No message should be reported. This is a quiet program when everything is OK.
ACTUAL -
A subject of multiple lines follows:
55416 55416 55416 55416 55416 55416 55416 55416 5541 55276 55276 55276 55276 55276 55276 55276
End of the subject
A subject of multiple lines follows:
4 64584 64584 64584 6458
64585 4
End of the subject
REPRODUCIBILITY :
This bug can always be reproduced.
---------- BEGIN SOURCE ----------
package workbench.sax;

import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxParserTest {
    static class SaxParserHandler extends DefaultHandler {
        StringBuilder textBuilder = null;

        @Override
        public void startElement(String uri,
                                 String localName,
                                 String qName,
                                 Attributes attributes) throws SAXException {
            if(qName.equals("subject")) {
                textBuilder = new StringBuilder();
            }
        }

        @Override
        public void endElement(String uri,
                               String localName,
                               String qName)
                throws SAXException {
            if(qName.equals("subject")) {
                String text = textBuilder.toString();
                if(text.indexOf('\n') >= 0) {
                    // A multiple-line subject, which doesn't exist in BigXML.xml.
                    System.out.println("A subject of multiple lines follows:");
                    System.out.println(text);
                    System.out.println("End of the subject");
                } else {
                    // This is normal, a single line subject
                    //System.out.println(text);
                }
            }
            textBuilder = null;
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            // ch references wrong buffer sometimes, not often.
            if(textBuilder != null) {
                textBuilder.append(ch, start, length);
            }
        }
    }

    public static void main(String[] args) {
        try {
            if(args.length == 1) {
                File xmlFile = new File(args[0]);
                SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
                saxParser.parse(xmlFile, new SaxParserHandler());
            } else {
                System.out.println("Usage: java SaxParserTest BigXML.xml");
            }
        } catch(Throwable e) {
            e.printStackTrace(System.err);
        }
    }
}
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
No good solution.
In my case, I split the XML file into small chunks with my own parser and fed each chunk to SAXParser.
So this is no longer a headache for me.
I'm reporting this for the Java community.
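The reporter's own splitting parser is not included in this report. As a rough illustration of the second half of the workaround only, the sketch below parses one small, already-split chunk with SAXParser and collects its subject texts. The class name ChunkedSubjectReader, the method subjectsOf, and the items wrapper element in the usage example are hypothetical; how the large file is cut into well-formed chunks is left open.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not the reporter's actual workaround code.
public class ChunkedSubjectReader {

    /** Parses one small, well-formed XML chunk and returns the text of each subject element. */
    public static List<String> subjectsOf(String chunk)
            throws ParserConfigurationException, SAXException, IOException {
        final List<String> subjects = new ArrayList<>();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(chunk)), new DefaultHandler() {
            private StringBuilder text; // non-null only while inside a subject element

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if (qName.equals("subject")) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // characters() may be called several times per element, so accumulate.
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("subject")) {
                    subjects.add(text.toString());
                    text = null;
                }
            }
        });
        return subjects;
    }

    public static void main(String[] args) throws Exception {
        // Example chunk; the wrapper element name is an assumption.
        String chunk = "<items><subject>1 1 1</subject><subject>2 2</subject></items>";
        for (String subject : subjectsOf(chunk)) {
            System.out.println(subject);
        }
    }
}
```

Each chunk is parsed from its own short input, which is the idea the workaround relies on: the parser never has to work through one very large document in a single pass.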
- duplicates
-
JDK-8027359 XML parser returns incorrect parsing results
-
- Resolved
-