JDK-6536111

SAX parser throws OutOfMemoryError


Details

    • 1.4
    • x86
    • linux
    • Verified

      Description

        FULL PRODUCT VERSION :
        java version "1.6.0"
        Java(TM) SE Runtime Environment (build 1.6.0-b105)
        Java HotSpot(TM) Server VM (build 1.6.0-b105, mixed mode)


        A DESCRIPTION OF THE PROBLEM :
        When parsing huge XML files (> 200MB) with SAX, Java 6 runs out of memory because the whole input file is stored in memory. Java 1.5 and the current Xerces version 2.9.0 work fine.
        I assume that there is a bug in XMLDocumentScannerImpl. It has a flag, fReadingDTD, indicating that the DTD is currently being read. While this flag is true, refresh(int) appends characters to a buffer. It seems that the end of the DTD is not recognized, so the whole XML file is added to the buffer.
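
        The suspected mechanism can be illustrated with a toy model. This is only a sketch under the assumption stated above (a flag that is set when DTD reading starts and is never cleared); it is not the Xerces code, and the names DtdBufferModel, clearFlagAtDtdEnd and simulate are hypothetical.

```java
/**
 * Toy model of the suspected behaviour, NOT the real XMLDocumentScannerImpl.
 * All names here are hypothetical and exist only for illustration.
 */
public class DtdBufferModel {
    private final StringBuilder dtdBuffer = new StringBuilder();
    private final boolean clearFlagAtDtdEnd;
    private boolean readingDtd;

    public DtdBufferModel(boolean clearFlagAtDtdEnd) {
        this.clearFlagAtDtdEnd = clearFlagAtDtdEnd;
    }

    public void startDtd() {
        readingDtd = true;
    }

    public void endDtd() {
        // In the suspected faulty variant the flag is never reset.
        if (clearFlagAtDtdEnd) {
            readingDtd = false;
        }
    }

    /** Called for every chunk the scanner reads; buffers it only while the flag is set. */
    public void refresh(char[] chunk, int length) {
        if (readingDtd) {
            dtdBuffer.append(chunk, 0, length);
        }
    }

    public int buffered() {
        return dtdBuffer.length();
    }

    /** Feeds dtdChunks chunks of DTD and contentChunks chunks of content; returns buffered chars. */
    public static int simulate(boolean clearFlag, int dtdChunks, int contentChunks, int chunkSize) {
        DtdBufferModel model = new DtdBufferModel(clearFlag);
        char[] chunk = new char[chunkSize];
        model.startDtd();
        for (int i = 0; i < dtdChunks; i++) {
            model.refresh(chunk, chunkSize);
        }
        model.endDtd();
        for (int i = 0; i < contentChunks; i++) {
            model.refresh(chunk, chunkSize);
        }
        return model.buffered();
    }

    public static void main(String[] args) {
        System.out.println("flag cleared:     " + simulate(true, 1, 1000, 8192));
        System.out.println("flag never reset: " + simulate(false, 1, 1000, 8192));
    }
}
```

        In the variant where the flag is reset, the buffer size depends only on the DTD. In the other variant it grows with the size of the document, which would match the observed heap exhaustion.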

        STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
        Run the code below, which creates a large XML file in the temporary directory (e.g. /var/tmp) and then parses it; the OutOfMemoryError occurs during parsing.

        The file must be parsed with the standard SAXParser, using at least an EntityResolver that resolves the SystemId.
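
        Before running the full-size test, the parser setup can be checked on a small in-memory document. The following is a minimal sketch, assuming the same DTD and document shape as the reproducer in this report; the class ResolverSmokeTest, the handler CountingHandler and the method parseSmall are hypothetical names introduced only for this example.

```java
import java.io.StringReader;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Small-scale check that the resolver is consulted and the document is parsed. */
public class ResolverSmokeTest {
    static final String DTD =
            "<!ELEMENT config (config*,entry*)*>\n"
            + "<!ATTLIST config key CDATA #REQUIRED>\n"
            + "<!ELEMENT entry (#PCDATA)>\n"
            + "<!ATTLIST entry key CDATA #REQUIRED type CDATA "
            + "#REQUIRED value CDATA #REQUIRED isnull CDATA #IMPLIED >";

    /** Counts start tags and resolver invocations. */
    static final class CountingHandler extends DefaultHandler {
        int elements;
        int resolverCalls;

        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            // Serve the DTD from memory, whatever system identifier was requested.
            resolverCalls++;
            return new InputSource(new StringReader(DTD));
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elements++;
        }
    }

    /** Parses a document with n nested config/entry pairs; returns {elements, resolverCalls}. */
    public static int[] parseSmall(int n) throws Exception {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\"?>\n");
        xml.append("<!DOCTYPE config SYSTEM \"org/knime/core/node/config/XMLConfig.dtd\">\n");
        xml.append("<config key=\"root\">\n");
        for (int i = 0; i < n; i++) {
            xml.append("<config key=\"").append(i).append("\">");
            xml.append("<entry key=\"datacell\" type=\"xstring\" ");
            xml.append("value=\"org.knime.core.data.def.IntCell\"/>\n");
            xml.append("</config>\n");
        }
        xml.append("</config>");

        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        SAXParser parser = factory.newSAXParser();

        CountingHandler handler = new CountingHandler();
        parser.parse(new InputSource(new StringReader(xml.toString())), handler);
        return new int[] {handler.elements, handler.resolverCalls};
    }

    public static void main(String[] args) throws Exception {
        int[] result = parseSmall(3);
        System.out.println("elements=" + result[0] + ", resolver calls=" + result[1]);
    }
}
```

        If the resolver count stays at zero, the DTD is not being loaded through the EntityResolver and the large-file run is unlikely to exercise the code path described above.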

        EXPECTED VERSUS ACTUAL BEHAVIOR :
        EXPECTED -
        Parsing should complete without an OutOfMemoryError.
        ACTUAL -
        An OutOfMemoryError is thrown.

        ERROR MESSAGES/STACK TRACES THAT OCCUR :
        Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
                at com.sun.org.apache.xerces.internal.util.XMLStringBuffer.append(XMLStringBuffer.java:205)
                at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.refresh(XMLDocumentScannerImpl.java:1493)
                at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.invokeListeners(XMLEntityScanner.java:2070)
                at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.scanLiteral(XMLEntityScanner.java:1063)
                at com.sun.org.apache.xerces.internal.impl.XMLScanner.scanAttributeValue(XMLScanner.java:974)
                at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanAttribute(XMLDocumentFragmentScannerImpl.java:1537)
                at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanStartElement(XMLDocumentFragmentScannerImpl.java:1314)
                at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2740)
                at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:645)
                at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:508)
                at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:807)
                at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
                at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:107)
                at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205)
                at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
                at webbugstestcases.jaxp.sax.inc920008.SAXParserTest.main(SAXParserTest.java:71)
        Java Result: 1


        ---------- BEGIN SOURCE ----------
        import java.io.BufferedWriter;
        import java.io.File;
        import java.io.FileInputStream;
        import java.io.FileNotFoundException;
        import java.io.FileWriter;
        import java.io.IOException;
        import java.io.StringReader;

        import javax.xml.parsers.ParserConfigurationException;
        import javax.xml.parsers.SAXParser;
        import javax.xml.parsers.SAXParserFactory;

        import org.xml.sax.EntityResolver;
        import org.xml.sax.InputSource;
        import org.xml.sax.SAXException;
        import org.xml.sax.XMLReader;

        public class SAXParserTest {
            private static final String DTD =
                    "<!ELEMENT config (config*,entry*)*>\n"
                            + "<!ATTLIST config key CDATA #REQUIRED>\n"
                            + "<!ELEMENT entry (#PCDATA)>\n"
                            + "<!ATTLIST entry key CDATA #REQUIRED type CDATA "
                            + "#REQUIRED value CDATA #REQUIRED isnull CDATA #IMPLIED >";

            private static final EntityResolver RESOLVER = new EntityResolver() {
                public InputSource resolveEntity(String publicId, String systemId)
                        throws SAXException, IOException {
                    InputSource is = new InputSource(new StringReader(DTD));
                    return is;
                }
            };

            public static void main(String[] args) throws ParserConfigurationException,
                    SAXException, FileNotFoundException, IOException {
                // create a huge XML file
                File test = File.createTempFile("test", "xml");
                test.deleteOnExit();
                BufferedWriter out = new BufferedWriter(new FileWriter(test));
                out.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
                out.write("<!DOCTYPE config SYSTEM "
                        + "\"org/knime/core/node/config/XMLConfig.dtd\">\n");
                out.write("<config key=\"root\">\n");
                for (int i = 0; i < 1000000; i++) {
                    out.write("<config key=\"" + i + "\">");
                    out.write("<entry key=\"datacell\" type=\"xstring\" "
                            + "value=\"org.knime.core.data.def.IntCell\"/>\n");
                    out.write("</config>\n");
                }
                out.write("</config>");
                out.close();
               
                // try to parse it
                SAXParserFactory factory = SAXParserFactory.newInstance();
                factory.setValidating(true);
                SAXParser parser = factory.newSAXParser();

                XMLReader reader = parser.getXMLReader();
                reader.setEntityResolver(RESOLVER);

                // java.lang.OutOfMemoryError: Java heap space, even with 256MB heap
                reader.parse(new InputSource(new FileInputStream(test)));
            }
        }
        ---------- END SOURCE ----------


        REPRODUCIBILITY :
        This bug can be reproduced always.

              People

                joehw Joe Wang
                ryeung Roger Yeung (Inactive)