JDK-8012541

Some XML 1.1 documents are not correctly handled by the DocumentBuilder API

    • Type: Bug
    • Resolution: Duplicate
    • Priority: P3
    • None
    • 6u43, 7u17
    • xml
    • Possibly generic.

      SYNOPSIS
      --------
      Some XML 1.1 documents are not correctly handled by the DocumentBuilder API

      OPERATING SYSTEM
      ----------------
      Windows 7 Professional x64

      FULL JDK VERSION(S)
      -------------------
      Reproduced on both:

      java version "1.6.0_43"
      Java(TM) SE Runtime Environment (build 1.6.0_43-b01)
      Java HotSpot(TM) 64-Bit Server VM (build 20.14-b01, mixed mode)

      and

      java version "1.7.0_17"
      Java(TM) SE Runtime Environment (build 1.7.0_17-b02)
      Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode)
      Please note it also occurs on x86 VMs.

      PROBLEM DESCRIPTION
      -------------------
      When some XML documents that start with an XML 1.1 declaration are parsed with the DocumentBuilder API (javax.xml.parsers.DocumentBuilder), no exception is thrown, but the resulting Document Object Model is corrupted: one or more nodes do not contain the right content.

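      In the attached example, the text content of some <test> nodes comes back as a fragment of markup (for instance 't>24') instead of the expected four-digit number.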

      REPRODUCTION INSTRUCTIONS
      -------------------------
      Run the attached DocumentBuilderCheck class. Two examples run in succession, and each prints at least one error message to the console.

      In the first example, we generate an XML document with a simple structure (<?xml version="1.1" encoding="UTF-8"?><main_tag><test>0000</test><test>0001</test>[...]<test>2499</test></main_tag>) into a file, then parse it and analyze the resulting Document object: we try to parse the content of each "<test>" node as an integer. We then dump the Document object back to another XML file.

      The second example is the same as the first, except that the generated XML document is kept in memory as a String and parsed from there, without being written to a file first.

      Both examples show errors in the Document object. With JDK 1.7.0_17, the console output is:

      example #1 - ERROR: content 't>24' found at index 1926 cannnot be recognized as a valid number [For input string: "t>24"]
      example #2 - ERROR: content 't>14' found at index 964 cannnot be recognized as a valid number [For input string: "t>14"]
      example #2 - ERROR: content 't>46' found at index 1446 cannnot be recognized as a valid number [For input string: "t>46"]
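
      The check behind these messages can be condensed into a helper that returns the indices of the corrupted nodes instead of printing them. The following is only a sketch: the class CorruptedNodeFinder and its two methods are hypothetical names that are not part of the attached test case, and the document shape is assumed to be the one described above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the attached DocumentBuilderCheck class.
public class CorruptedNodeFinder {

    /** Builds <main_tag><test>0000</test>...</main_tag>; xmlVersion is "1.0" or "1.1". */
    public static String buildDocument(String xmlVersion, int total) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"").append(xmlVersion).append("\" encoding=\"UTF-8\"?>");
        sb.append("<main_tag>");
        for (int i = 0; i < total; i++) {
            sb.append("<test>").append(String.format("%04d", i)).append("</test>");
        }
        sb.append("</main_tag>");
        return sb.toString();
    }

    /** Returns the indices of child nodes whose text is not the number equal to their index. */
    public static List<Integer> findCorruptedIndices(String xml) {
        Document doc;
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            DocumentBuilder db = dbf.newDocumentBuilder();
            doc = db.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Parse failures are a different problem from the silent corruption reported here.
            throw new IllegalStateException("document could not be parsed", e);
        }

        List<Integer> corrupted = new ArrayList<Integer>();
        NodeList nodes = doc.getDocumentElement().getChildNodes();
        for (int k = 0; k < nodes.getLength(); k++) {
            String content = nodes.item(k).getTextContent();
            try {
                if (Integer.parseInt(content) != k) {
                    corrupted.add(k); // a number, but not the expected one
                }
            } catch (NumberFormatException ex) {
                corrupted.add(k);     // not a number at all, e.g. "t>24"
            }
        }
        return corrupted;
    }

    public static void main(String[] args) {
        System.out.println("XML 1.0: " + findCorruptedIndices(buildDocument("1.0", 2500)));
        System.out.println("XML 1.1: " + findCorruptedIndices(buildDocument("1.1", 2500)));
    }
}
```

      With a parser that is not affected, both lists should be empty. On the JDK builds listed above, the XML 1.1 list would be expected to contain the indices reported in the console output.<test>t>24</test><test>0002</test></main_tag>");
if (!bad.equals(java.util.Arrays.asList(1))) throw new AssertionError("expected [1], got " + bad);
if (!CorruptedNodeFinder.buildDocument("1.1", 1).equals("<?xml version=\"1.1\" encoding=\"UTF-8\"?><main_tag><test>0000</test></main_tag>")) throw new AssertionError("unexpected document");
</test>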

      WORKAROUND
      ----------
      Generate XML 1.0 documents when possible (some locales require XML 1.1, however), or rely on a third-party library such as a recent Xerces implementation.
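
      A minimal sketch of both workarounds follows. It assumes that Apache Xerces-J (xercesImpl.jar) is on the classpath and that its factory class is org.apache.xerces.jaxp.DocumentBuilderFactoryImpl. The class ParserWorkaround and its methods are hypothetical names used for illustration only. If the requested factory cannot be loaded, the sketch falls back to the JDK default parser, which on the affected builds would still show the problem.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.FactoryConfigurationError;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper illustrating the two workarounds; not part of the attached test case.
public class ParserWorkaround {

    // Assumption: factory class provided by Apache Xerces-J, present on the classpath.
    public static final String XERCES_FACTORY =
            "org.apache.xerces.jaxp.DocumentBuilderFactoryImpl";

    /** Workaround 1: declare XML 1.0 unless the content really requires XML 1.1. */
    public static String xmlDeclaration(boolean requireXml11) {
        String version = requireXml11 ? "1.1" : "1.0";
        return "<?xml version=\"" + version + "\" encoding=\"UTF-8\"?>";
    }

    /** Workaround 2: request a specific parser implementation by class name. */
    public static DocumentBuilderFactory newFactory(String preferredClassName) {
        try {
            // A null class loader means the thread's context class loader is used.
            return DocumentBuilderFactory.newInstance(preferredClassName, null);
        } catch (FactoryConfigurationError e) {
            // Requested implementation not available: fall back to the JDK default.
            return DocumentBuilderFactory.newInstance();
        }
    }

    /** Parses an in-memory document with the given factory. */
    public static Document parse(DocumentBuilderFactory dbf, String xml) {
        try {
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("document could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        DocumentBuilderFactory dbf = newFactory(XERCES_FACTORY);
        // Shows which implementation was actually selected.
        System.out.println("Parser factory in use: " + dbf.getClass().getName());
        Document doc = parse(dbf, xmlDeclaration(false) + "<main_tag><test>0000</test></main_tag>");
        System.out.println("Root element: " + doc.getDocumentElement().getNodeName());
    }
}
```

      Printing the factory class name makes it visible whether the third-party parser was picked up or the fallback was used.</main_tag>");
if (!"main_tag".equals(d.getDocumentElement().getNodeName())) throw new AssertionError("unexpected root element");
if (!ParserWorkaround.xmlDeclaration(false).equals("<?xml version=\"1.0\" encoding=\"UTF-8\"?>")) throw new AssertionError("unexpected 1.0 declaration");
if (!ParserWorkaround.xmlDeclaration(true).equals("<?xml version=\"1.1\" encoding=\"UTF-8\"?>")) throw new AssertionError("unexpected 1.1 declaration");
</test>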

      TESTCASE
      --------
      import java.io.File;
      import java.io.PrintWriter;
      import java.io.StringReader;

      import javax.xml.parsers.DocumentBuilder;
      import javax.xml.parsers.DocumentBuilderFactory;
      import javax.xml.transform.OutputKeys;
      import javax.xml.transform.Transformer;
      import javax.xml.transform.TransformerFactory;
      import javax.xml.transform.dom.DOMSource;
      import javax.xml.transform.stream.StreamResult;

      import org.w3c.dom.Document;
      import org.w3c.dom.NodeList;
      import org.xml.sax.InputSource;

      public class DocumentBuilderCheck {
          public static void main(String[] args) throws Exception {
              // Example 1
              // generating a simple XML document directly into a file
              String filename = "Example1_DocumentToParse.xml";
              generateXmlFile(filename, 2500);

              // parsing the document using DocumentBuilder
              Document doc2 = readXmlFile(filename);

              // analyzing the resulting document
              analyzeDocumentValidity("example #1", doc2);

              // dumping the parsed document to file
              String filename2 = "Example1_DocumentParsed.xml";
              writeDocument(doc2, filename2);


              // Example 2
              // generating a simple XML document as a string
              String xmlDoc = generateXMLDocument(2500);

              // parsing the document using DocumentBuilder
              Document doc = readXmlDocument(xmlDoc);

              // analyzing the resulting document
              analyzeDocumentValidity("example #2", doc);

              // dumping the parsed document to file
              writeDocument(doc, "Example2_DocumentParsed.xml");
          }

          private static void analyzeDocumentValidity(String testName, Document doc) {
              // analyzing the content of the parsed structure,
              // checking that it matches the original document
              NodeList nodes = doc.getDocumentElement().getChildNodes();
              for (int k=0;k<nodes.getLength();k++) {
                  String nodeContent = nodes.item(k).getTextContent();

                  // checking node content ("<test>" tag)
                  try {
                      // if parsing is incorrect, either we get an exception here (the content has been
                      // corrupted and is no longer a number), or the number does not match its index
                      int nb = Integer.parseInt(nodeContent);
                      if ( nb != k ) {
                          System.out.println(testName + " - ERROR : number at index "+k+" is not the expected one ("+nb+" instead of "+k+")");
                      }
                  } catch (NumberFormatException ex) {
                      System.out.println(testName + " - ERROR: content '"+nodeContent+"' found at index "+k+" cannnot be recognized as a valid number ["+ex.getMessage()+"]");
                  }
              }
          }

          private static void writeDocument(Document document, String filename) throws Exception {
              StreamResult streamResult = new StreamResult(filename);
              TransformerFactory transformerFactory = TransformerFactory.newInstance();
              Transformer transformer = transformerFactory.newTransformer();
              transformer.setOutputProperty(OutputKeys.INDENT, "yes");
              transformer.setOutputProperty(OutputKeys.METHOD, "xml");
              transformer.transform(new DOMSource(document), streamResult);
          }

          private static Document readXmlFile(String filename) throws Exception {
              DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
              dbf.setNamespaceAware(true);
              DocumentBuilder db = dbf.newDocumentBuilder();
              Document doc = db.parse(new File(filename));
              return doc;
          }

          private static Document readXmlDocument(String document) throws Exception {
              DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
              dbf.setNamespaceAware(true);
              DocumentBuilder db = dbf.newDocumentBuilder();
              InputSource is = new InputSource();
              is.setCharacterStream(new StringReader(document));
              Document doc = db.parse(is);
              return doc;
          }

          private static void generateXmlFile(String filename, int total)
          throws Exception {
              File f = new File(filename);

              // write with the encoding named in the XML declaration rather than the platform default
              PrintWriter pw = new PrintWriter(f, "UTF-8");
              pw.write("<?xml version=\"1.1\" encoding=\"UTF-8\"?>");
              pw.write("<main_tag>");
              for (int i = 0; i < total; i++) {
                  pw.write("<test>" + String.format("%04d", i) + "</test>");
              }
              pw.write("</main_tag>");
              pw.close();
          }

          private static String generateXMLDocument(int total){
              StringBuilder sb = new StringBuilder();
              sb.append("<?xml version=\"1.1\" encoding=\"UTF-8\"?>");
              sb.append("<main_tag>");
              for (int i = 0; i < total; i++) {
                  sb.append("<test>" + String.format("%04d", i) + "</test>");
              }
              sb.append("</main_tag>");
              return sb.toString();
          }
      }

            joehw Joe Wang
            dkorbel David Korbel (Inactive)