Type: Bug | Resolution: Fixed | Priority: P2 | 6 | 1.4 | CPU: x86 | OS: linux | Verified
Issue | Fix Version | Assignee | Priority | Status | Resolution | Resolved In Build |
---|---|---|---|---|---|---|
JDK-2173433 | 7 | Joe Wang | P2 | Closed | Fixed | m05 |
JDK-2173432 | 6u14 | Abhijit Saha | P2 | Resolved | Fixed | b03 |
FULL PRODUCT VERSION :
java version "1.6.0"
Java(TM) SE Runtime Environment (build 1.6.0-b105)
Java HotSpot(TM) Server VM (build 1.6.0-b105, mixed mode)
A DESCRIPTION OF THE PROBLEM :
When parsing huge XML files (> 200 MB) with SAX, Java 6 runs out of memory because the whole input file is stored in memory. Java 1.5 and the current Xerces version (2.9.0) work fine.
I assume that there is a bug in XMLDocumentScannerImpl. It has a flag, fReadingDTD, indicating that the DTD is currently being read. While this flag is true, refresh(int) appends characters to a buffer. It seems that the end of the DTD is not recognized, so the flag is never cleared and the whole XML file is added to the buffer.
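To make the suspected failure mode concrete, here is a deliberately simplified toy model. It is not Xerces code: ToyScanner, startDtd, endDtd and simulate are hypothetical names, and the real XMLDocumentScannerImpl is far more involved. The model only shows why a "reading DTD" flag that is never cleared turns a small DTD buffer into a copy of the entire document.

```java
/**
 * Hypothetical toy model (not Xerces code) of the suspected bug:
 * every refreshed chunk is copied into a side buffer while a
 * "reading DTD" flag is set. If the flag is not cleared when the
 * DTD ends, the buffer ends up holding the whole document.
 */
public class ToyScanner {
    private boolean readingDtd;
    private final StringBuilder dtdBuffer = new StringBuilder();
    private final boolean clearFlagAtDtdEnd;

    public ToyScanner(boolean clearFlagAtDtdEnd) {
        this.clearFlagAtDtdEnd = clearFlagAtDtdEnd;
    }

    /** Called when the scanner starts reading the DTD. */
    public void startDtd() {
        readingDtd = true;
    }

    /** Called when the scanner reaches the end of the DTD. */
    public void endDtd() {
        if (clearFlagAtDtdEnd) {
            readingDtd = false; // expected behaviour: stop buffering
        }
        // otherwise the flag stays set and buffering continues (the suspected bug)
    }

    /** Rough analogue of refresh(int): a new chunk of input was loaded. */
    public void refresh(String chunk) {
        if (readingDtd) {
            dtdBuffer.append(chunk);
        }
    }

    public int bufferedChars() {
        return dtdBuffer.length();
    }

    /** Feeds one DTD chunk, then n document chunks; returns the buffered size. */
    public static int simulate(boolean clearFlag, int documentChunks) {
        ToyScanner scanner = new ToyScanner(clearFlag);
        scanner.startDtd();
        scanner.refresh("<!ELEMENT config (config*,entry*)*>");
        scanner.endDtd();
        for (int i = 0; i < documentChunks; i++) {
            scanner.refresh("<config key=\"" + i + "\"/>\n");
        }
        return scanner.bufferedChars();
    }

    public static void main(String[] args) {
        System.out.println("flag cleared:       " + simulate(true, 1000));
        System.out.println("flag never cleared: " + simulate(false, 1000));
    }
}
```

With the flag cleared, the buffered size stays constant no matter how many document chunks follow; with the flag left set, it grows linearly with the input, which matches the reported symptom.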
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run the program in the source section below. It writes a large XML file to the temporary directory (e.g. /var/tmp) and then parses it with the standard SAXParser, configured with at least an EntityResolver that resolves the system ID. The OutOfMemoryError is thrown during parsing.
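The reproduction relies on an EntityResolver that serves the DTD from memory. The resolver in the test case returns the DTD for every entity; the sketch below is a slightly stricter variant that only answers for the expected system ID and returns null otherwise, which the SAX contract defines as "let the parser open the system ID itself". InMemoryDtdResolver is a hypothetical name, and matching on a suffix is an assumption made because the parser may pass an expanded, absolute system ID. This variant is not needed to trigger the bug; it only illustrates the EntityResolver contract.

```java
import java.io.StringReader;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/**
 * Hypothetical sketch: serves an in-memory DTD for one known system ID
 * and leaves every other entity to the parser's default handling.
 */
public class InMemoryDtdResolver implements EntityResolver {
    private final String systemIdSuffix;
    private final String dtd;

    public InMemoryDtdResolver(String systemIdSuffix, String dtd) {
        this.systemIdSuffix = systemIdSuffix;
        this.dtd = dtd;
    }

    // Narrower than the interface: this implementation throws no checked exceptions.
    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (systemId != null && systemId.endsWith(systemIdSuffix)) {
            InputSource source = new InputSource(new StringReader(dtd));
            source.setSystemId(systemId); // keeps a location for error messages
            return source;
        }
        // Returning null asks the parser to open the system ID itself.
        return null;
    }
}
```

In the test case it could replace RESOLVER, e.g. `reader.setEntityResolver(new InMemoryDtdResolver("XMLConfig.dtd", DTD));`.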
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
The file should be parsed without an OutOfMemoryError.
ACTUAL -
The parse fails with java.lang.OutOfMemoryError: Java heap space.
ERROR MESSAGES/STACK TRACES THAT OCCUR :
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at com.sun.org.apache.xerces.internal.util.XMLStringBuffer.append(XMLStringBuffer.java:205)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.refresh(XMLDocumentScannerImpl.java:1493)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.invokeListeners(XMLEntityScanner.java:2070)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.scanLiteral(XMLEntityScanner.java:1063)
at com.sun.org.apache.xerces.internal.impl.XMLScanner.scanAttributeValue(XMLScanner.java:974)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanAttribute(XMLDocumentFragmentScannerImpl.java:1537)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanStartElement(XMLDocumentFragmentScannerImpl.java:1314)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2740)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:645)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:508)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:807)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:107)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
at webbugstestcases.jaxp.sax.inc920008.SAXParserTest.main(SAXParserTest.java:71)
Java Result: 1
---------- BEGIN SOURCE ----------
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

public class SAXParserTest {
    // DTD served from memory by RESOLVER
    private static final String DTD =
            "<!ELEMENT config (config*,entry*)*>\n"
            + "<!ATTLIST config key CDATA #REQUIRED>\n"
            + "<!ELEMENT entry (#PCDATA)>\n"
            + "<!ATTLIST entry key CDATA #REQUIRED type CDATA"
            + " #REQUIRED value CDATA #REQUIRED isnull CDATA #IMPLIED >";

    // Resolves every external entity (here: the DOCTYPE system ID) to the in-memory DTD
    private static final EntityResolver RESOLVER = new EntityResolver() {
        public InputSource resolveEntity(String publicId, String systemId)
                throws SAXException, IOException {
            return new InputSource(new StringReader(DTD));
        }
    };

    public static void main(String[] args) throws
            ParserConfigurationException,
            SAXException, FileNotFoundException, IOException {
        // create a huge XML file: 1,000,000 <config> children under the root
        File test = File.createTempFile("test", ".xml");
        test.deleteOnExit();
        BufferedWriter out = new BufferedWriter(new FileWriter(test));
        out.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        out.write("<!DOCTYPE config SYSTEM"
                + " \"org/knime/core/node/config/XMLConfig.dtd\">\n");
        out.write("<config key=\"root\">\n");
        for (int i = 0; i < 1000000; i++) {
            out.write("<config key=\"" + i + "\">");
            out.write("<entry key=\"datacell\" type=\"xstring\""
                    + " value=\"org.knime.core.data.def.IntCell\"/>\n");
            out.write("</config>\n");
        }
        out.write("</config>");
        out.close();

        // try to parse it with a validating SAX parser
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        SAXParser parser = factory.newSAXParser();
        XMLReader reader = parser.getXMLReader();
        reader.setEntityResolver(RESOLVER);
        InputStream in = new FileInputStream(test);
        try {
            // java.lang.OutOfMemoryError: Java heap space, even with 256MB heap
            reader.parse(new InputSource(in));
        } finally {
            in.close();
        }
    }
}
---------- END SOURCE ----------
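A scaled-down variant of the test can be handy for checking that a validating parse completes and reports every element, without writing a temporary file. SmallRepro, buildDocument and countEntries are hypothetical names; the DTD and document shape are copied from the test case above, and the resolver is supplied through DefaultHandler, which implements EntityResolver. Raising n and running with a small heap (for example -Xmx64m) is one way to probe whether memory use grows with input size; this sketch itself does not measure memory.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SmallRepro {
    static final String DTD =
            "<!ELEMENT config (config*,entry*)*>\n"
            + "<!ATTLIST config key CDATA #REQUIRED>\n"
            + "<!ELEMENT entry (#PCDATA)>\n"
            + "<!ATTLIST entry key CDATA #REQUIRED type CDATA #REQUIRED"
            + " value CDATA #REQUIRED isnull CDATA #IMPLIED >";

    /** Builds the same document shape as the bug's test case, with n entries. */
    static String buildDocument(int n) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<!DOCTYPE config SYSTEM \"org/knime/core/node/config/XMLConfig.dtd\">\n");
        sb.append("<config key=\"root\">\n");
        for (int i = 0; i < n; i++) {
            sb.append("<config key=\"").append(i).append("\">");
            sb.append("<entry key=\"datacell\" type=\"xstring\"")
              .append(" value=\"org.knime.core.data.def.IntCell\"/>\n");
            sb.append("</config>\n");
        }
        sb.append("</config>");
        return sb.toString();
    }

    /** Parses the document with a validating SAX parser and counts <entry> elements. */
    static int countEntries(int n)
            throws ParserConfigurationException, SAXException, IOException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        SAXParser parser = factory.newSAXParser();
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                // serve the DTD from memory, like RESOLVER in the test case
                return new InputSource(new StringReader(DTD));
            }

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                if ("entry".equals(qName)) {
                    count[0]++;
                }
            }
        };
        parser.parse(new InputSource(new StringReader(buildDocument(n))), handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println("entries seen: " + countEntries(1000));
    }
}
```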
REPRODUCIBILITY :
This bug can always be reproduced.
Backported by:
- JDK-2173432 SAX parser throws OutOfMemoryError (Resolved)
- JDK-2173433 SAX parser throws OutOfMemoryError (Closed)
Relates to:
- JDK-7110101 SAX parser runs out of memory on jdk6 (windows-amd64) (Closed)