JDK-8329295

SAX/DOM parsing with jdk.xml.dtd.support=ignore fails for minimal DTD

    • Type: Bug
    • Resolution: Unresolved
    • Priority: P4
    • None
    • 22
    • xml

      A DESCRIPTION OF THE PROBLEM :
      The jdk.xml.dtd.support property, newly introduced in JDK 22, supports the value "ignore". With that value, XML input that contains a DTD should parse successfully, with the DTD ignored.

      The minimal grammatical DTD is one that has only the root element name, no externalID, and no internal subset. For example: <!DOCTYPE a>

      A minimal grammatical XML document can therefore be: <!DOCTYPE a><a/>

      Under the "ignore" setting for jdk.xml.dtd.support, that minimal document cannot be parsed with a SAXParser/XMLReader or with a DOM DocumentBuilder. Both fail with a NullPointerException: Cannot invoke "com.sun.org.apache.xerces.internal.impl.dtd.DTDGrammar.isImmutable()" because the return value of "com.sun.org.apache.xerces.internal.impl.dtd.DTDGrammarBucket.getActiveGrammar()" is null

      In contrast, the javax.xml.stream (StAX) parser, with jdk.xml.dtd.support set to "ignore", successfully parses the same input.

      The SAX/DOM parser will successfully parse the input if the DTD is given either a dummy externalID (such as <!DOCTYPE a SYSTEM 'foo'>) or an empty internal subset (<!DOCTYPE a []>). The NPE is thrown only for the truly minimal DTD that has neither externalID nor internal subset.
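The three DOCTYPE forms described above can be compared side by side with a small standalone program. This is only a sketch: the class name DoctypeVariants and the helper tryParse are made up for illustration, and the outcome printed for each variant depends on the JDK build in use (on releases that do not recognize jdk.xml.dtd.support, setting the attribute itself is expected to fail, which the helper reports like any other failure).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class DoctypeVariants {

    // Hypothetical helper: parses xml with a DOM DocumentBuilder.
    // If dtdSupport is non-null, it is applied as the jdk.xml.dtd.support
    // factory attribute first. Returns "ok: <root name>" on success or
    // "failed: <exception class>" on any exception, including the NPE
    // described in this report.
    static String tryParse(String xml, String dtdSupport) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newDefaultInstance();
            if (dtdSupport != null) {
                dbf.setAttribute("jdk.xml.dtd.support", dtdSupport);
            }
            var doc = dbf.newDocumentBuilder()
                         .parse(new InputSource(new StringReader(xml)));
            return "ok: " + doc.getDocumentElement().getTagName();
        } catch (Exception e) {
            return "failed: " + e.getClass().getName();
        }
    }

    public static void main(String[] args) {
        String[] variants = {
            "<!DOCTYPE a><a/>",              // truly minimal: reported to fail
            "<!DOCTYPE a SYSTEM 'foo'><a/>", // dummy externalID: reported to work
            "<!DOCTYPE a []><a/>"            // empty internal subset: reported to work
        };
        for (String xml : variants) {
            System.out.println(xml + " -> " + tryParse(xml, "ignore"));
        }
    }
}
```

Passing null as the second argument leaves the property at its default, which gives a baseline for the same input without "ignore".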

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      | Welcome to JShell -- Version 22
      | For an introduction type: /help intro
      // test using SAX API
      String minimal = "<!DOCTYPE a><a/>";
      var spf = javax.xml.parsers.SAXParserFactory.newDefaultInstance();
      var sp = spf.newSAXParser();
      sp.setProperty("jdk.xml.dtd.support", "ignore");
      var is = new org.xml.sax.InputSource(new StringReader(minimal));
      sp.getXMLReader().parse(is);

      // test using DOM API
      var dbf = javax.xml.parsers.DocumentBuilderFactory.newDefaultInstance();
      dbf.setAttribute("jdk.xml.dtd.support", "ignore")
      is = new org.xml.sax.InputSource(new StringReader(minimal));
      dbf.newDocumentBuilder().parse(is)

      // just for contrast, test using StAX API
      var xif = javax.xml.stream.XMLInputFactory.newDefaultFactory();
      xif.setProperty("jdk.xml.dtd.support", "ignore");
      var xer = xif.createXMLEventReader(new StringReader(minimal));
      xer.forEachRemaining(o -> System.out.println(o.toString()));

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      All three tests should succeed. (The SAX API test should consume the input silently, the DOM test should return a Document node as expected, and the StAX test should output the expected XMLEvent sequence.)
      ACTUAL -
      // SAX API test (not ok):
      | Exception java.lang.NullPointerException: Cannot invoke "com.sun.org.apache.xerces.internal.impl.dtd.DTDGrammar.isImmutable()" because the return value of "com.sun.org.apache.xerces.internal.impl.dtd.DTDGrammarBucket.getActiveGrammar()" is null
      | at XMLDTDProcessor.startDTD (XMLDTDProcessor.java:637)
      | at XMLDTDScannerImpl.setInputSource (XMLDTDScannerImpl.java:247)
      | at XMLDocumentScannerImpl$PrologDriver.next (XMLDocumentScannerImpl.java:1001)
      | at XMLDocumentScannerImpl.next (XMLDocumentScannerImpl.java:635)
      | at XMLDocumentFragmentScannerImpl.scanDocument (XMLDocumentFragmentScannerImpl.java:551)
      | at XML11Configuration.parse (XML11Configuration.java:890)
      | at XML11Configuration.parse (XML11Configuration.java:826)
      | at XMLParser.parse (XMLParser.java:134)
      | at AbstractSAXParser.parse (AbstractSAXParser.java:1225)
      | at SAXParserImpl$JAXPSAXParser.parse (SAXParserImpl.java:643)

      // DOM API test (not ok):
      | Exception java.lang.NullPointerException: Cannot invoke "com.sun.org.apache.xerces.internal.impl.dtd.DTDGrammar.isImmutable()" because the return value of "com.sun.org.apache.xerces.internal.impl.dtd.DTDGrammarBucket.getActiveGrammar()" is null
      | at XMLDTDProcessor.startDTD (XMLDTDProcessor.java:637)
      | at XMLDTDScannerImpl.setInputSource (XMLDTDScannerImpl.java:247)
      | at XMLDocumentScannerImpl$PrologDriver.next (XMLDocumentScannerImpl.java:1001)
      | at XMLDocumentScannerImpl.next (XMLDocumentScannerImpl.java:635)
      | at XMLDocumentFragmentScannerImpl.scanDocument (XMLDocumentFragmentScannerImpl.java:551)
      | at XML11Configuration.parse (XML11Configuration.java:890)
      | at XML11Configuration.parse (XML11Configuration.java:826)
      | at XMLParser.parse (XMLParser.java:134)
      | at DOMParser.parse (DOMParser.java:247)
      | at DocumentBuilderImpl.parse (DocumentBuilderImpl.java:342)

      // StAX API test (ok):
      <?xml version="null" encoding='null'?>
      <!DOCTYPE a>
      <a>
      </a>
      ENDDOCUMENT


      CUSTOMER SUBMITTED WORKAROUND :
      Avoid completely minimal DTDs (be sure to include at least an empty internal subset or a dummy externalID) when using the SAX or DOM APIs in "ignore" mode, or avoid the SAX and DOM APIs and use StAX, which does not exhibit the issue.
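When the input is available as a string, one way to apply the first workaround is to add an empty internal subset to a bare DOCTYPE before handing the text to SAX or DOM. The following is a minimal sketch, not a general solution: the class MinimalDoctypePatch and the method addEmptyInternalSubset are hypothetical names, and the regular expression does no real XML parsing, so it would also rewrite DOCTYPE-like text that happens to appear inside a comment or CDATA section.

```java
import java.util.regex.Pattern;

public class MinimalDoctypePatch {

    // Matches a declaration that has only a root element name,
    // e.g. <!DOCTYPE a>, with no externalID and no internal subset.
    private static final Pattern MINIMAL_DOCTYPE =
            Pattern.compile("<!DOCTYPE\\s+([^\\s\\[>]+)\\s*>");

    // Hypothetical helper: rewrites the first bare DOCTYPE to carry an
    // empty internal subset; any other input is returned unchanged.
    static String addEmptyInternalSubset(String xml) {
        return MINIMAL_DOCTYPE.matcher(xml).replaceFirst("<!DOCTYPE $1 []>");
    }

    public static void main(String[] args) {
        System.out.println(addEmptyInternalSubset("<!DOCTYPE a><a/>"));
        System.out.println(addEmptyInternalSubset("<!DOCTYPE a SYSTEM 'foo'><a/>"));
    }
}
```

Declarations that already have an externalID or an internal subset do not match the pattern and pass through untouched, so the helper can be applied to every input unconditionally.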

      FREQUENCY : always


            joehw Joe Wang
            webbuggrp Webbug Group