Bug
Resolution: Unresolved
P4
None
22
x86_64
linux
A DESCRIPTION OF THE PROBLEM :
The jdk.xml.dtd.support property, newly introduced in JDK 22, supports the value "ignore". With that value, XML input that contains a DTD should parse successfully, with the DTD ignored.
The minimal grammatical DTD is one that has only the root element name, no externalID, and no internal subset. For example: <!DOCTYPE a>
A minimal grammatical XML document can therefore be: <!DOCTYPE a><a/>
Under the "ignore" setting for jdk.xml.dtd.support, that minimal document cannot be parsed using a SAXParser/XMLReader or using a DOM DocumentBuilder. It fails with NPE: Cannot invoke "com.sun.org.apache.xerces.internal.impl.dtd.DTDGrammar.isImmutable()" because the return value of "com.sun.org.apache.xerces.internal.impl.dtd.DTDGrammarBucket.getActiveGrammar()" is null
In contrast, the javax.xml.stream (StAX) parser, with jdk.xml.dtd.support set to "ignore", successfully parses the same input.
The SAX/DOM parser will successfully parse the input if the DTD is given either a dummy externalID (such as <!DOCTYPE a SYSTEM 'foo'>) or an empty internal subset (<!DOCTYPE a []>). The NPE is thrown only for the truly minimal DTD that has neither externalID nor internal subset.
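The three DOCTYPE forms discussed above can be compared side by side with a small probe. This is only a sketch: the class name DtdIgnoreProbe and the method tryParse are hypothetical names introduced for illustration, and the result for each form depends on the JDK in use (on releases that do not recognize jdk.xml.dtd.support, setProperty itself is expected to throw, and that is reported as well).

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of the JDK: reports how the SAX parser
// reacts to a document when jdk.xml.dtd.support is set to "ignore".
public class DtdIgnoreProbe {

    // Returns "ok" if parsing completes, otherwise the simple name of the
    // exception that was thrown (including exceptions from setProperty).
    public static String tryParse(String xml) {
        try {
            var parser = SAXParserFactory.newDefaultInstance().newSAXParser();
            parser.setProperty("jdk.xml.dtd.support", "ignore");
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return "ok";
        } catch (Exception e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        String[] docs = {
            "<!DOCTYPE a><a/>",              // minimal DTD (the failing case in this report)
            "<!DOCTYPE a SYSTEM 'foo'><a/>", // dummy externalID
            "<!DOCTYPE a []><a/>"            // empty internal subset
        };
        for (String doc : docs) {
            System.out.println(doc + " -> " + tryParse(doc));
        }
    }
}
```

On an affected JDK 22 build, the report implies the first line should show NullPointerException and the other two should show ok.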
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
| Welcome to JShell -- Version 22
| For an introduction type: /help intro
// test using SAX API
String minimal = "<!DOCTYPE a><a/>";
var spf = javax.xml.parsers.SAXParserFactory.newDefaultInstance();
var sp = spf.newSAXParser();
sp.setProperty("jdk.xml.dtd.support", "ignore");
var is = new org.xml.sax.InputSource(new StringReader(minimal));
sp.getXMLReader().parse(is);
// test using DOM API
var dbf = javax.xml.parsers.DocumentBuilderFactory.newDefaultInstance();
dbf.setAttribute("jdk.xml.dtd.support", "ignore");
is = new org.xml.sax.InputSource(new StringReader(minimal));
dbf.newDocumentBuilder().parse(is);
// just for contrast, test using StAX API
var xif = javax.xml.stream.XMLInputFactory.newDefaultFactory();
xif.setProperty("jdk.xml.dtd.support", "ignore");
var xer = xif.createXMLEventReader(new StringReader(minimal));
xer.forEachRemaining(o -> System.out.println(o.toString()));
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
All three tests should succeed. (The SAX API test should consume the input silently, the DOM test should return a Document node as expected, and the StAX test should output the expected XMLEvent sequence.)
ACTUAL -
// SAX API test (not ok):
| Exception java.lang.NullPointerException: Cannot invoke "com.sun.org.apache.xerces.internal.impl.dtd.DTDGrammar.isImmutable()" because the return value of "com.sun.org.apache.xerces.internal.impl.dtd.DTDGrammarBucket.getActiveGrammar()" is null
| at XMLDTDProcessor.startDTD (XMLDTDProcessor.java:637)
| at XMLDTDScannerImpl.setInputSource (XMLDTDScannerImpl.java:247)
| at XMLDocumentScannerImpl$PrologDriver.next (XMLDocumentScannerImpl.java:1001)
| at XMLDocumentScannerImpl.next (XMLDocumentScannerImpl.java:635)
| at XMLDocumentFragmentScannerImpl.scanDocument (XMLDocumentFragmentScannerImpl.java:551)
| at XML11Configuration.parse (XML11Configuration.java:890)
| at XML11Configuration.parse (XML11Configuration.java:826)
| at XMLParser.parse (XMLParser.java:134)
| at AbstractSAXParser.parse (AbstractSAXParser.java:1225)
| at SAXParserImpl$JAXPSAXParser.parse (SAXParserImpl.java:643)
// DOM API test (not ok):
| Exception java.lang.NullPointerException: Cannot invoke "com.sun.org.apache.xerces.internal.impl.dtd.DTDGrammar.isImmutable()" because the return value of "com.sun.org.apache.xerces.internal.impl.dtd.DTDGrammarBucket.getActiveGrammar()" is null
| at XMLDTDProcessor.startDTD (XMLDTDProcessor.java:637)
| at XMLDTDScannerImpl.setInputSource (XMLDTDScannerImpl.java:247)
| at XMLDocumentScannerImpl$PrologDriver.next (XMLDocumentScannerImpl.java:1001)
| at XMLDocumentScannerImpl.next (XMLDocumentScannerImpl.java:635)
| at XMLDocumentFragmentScannerImpl.scanDocument (XMLDocumentFragmentScannerImpl.java:551)
| at XML11Configuration.parse (XML11Configuration.java:890)
| at XML11Configuration.parse (XML11Configuration.java:826)
| at XMLParser.parse (XMLParser.java:134)
| at DOMParser.parse (DOMParser.java:247)
| at DocumentBuilderImpl.parse (DocumentBuilderImpl.java:342)
// StAX API test (ok):
<?xml version="null" encoding='null'?>
<!DOCTYPE a>
<a>
</a>
ENDDOCUMENT
CUSTOMER SUBMITTED WORKAROUND :
Avoid completely minimal DTDs (be sure to include at least an empty internal subset or a dummy externalID) when using the SAX or DOM APIs in "ignore" mode, or avoid the SAX and DOM APIs and use StAX, which does not exhibit the issue.
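When the input text is under the caller's control, the first workaround could be applied mechanically before handing the document to the SAX or DOM parser. The sketch below is an assumption-laden illustration, not a recommended fix: DoctypePadder and padMinimalDoctype are hypothetical names, and the rewrite is a naive regular-expression match on the raw text, so it would also touch a DOCTYPE-like string that happens to appear earlier in a comment or processing instruction.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the JDK: adds an empty internal subset
// to a DOCTYPE declaration that has only a root element name.
public class DoctypePadder {

    // Matches "<!DOCTYPE name>" with no externalID and no internal subset.
    private static final Pattern MINIMAL_DOCTYPE =
            Pattern.compile("<!DOCTYPE\\s+([^\\s>\\[]+)\\s*>");

    // Rewrites the first minimal DOCTYPE to "<!DOCTYPE name []>";
    // any other input is returned unchanged.
    public static String padMinimalDoctype(String xml) {
        return MINIMAL_DOCTYPE.matcher(xml).replaceFirst("<!DOCTYPE $1 []>");
    }

    public static void main(String[] args) {
        // prints <!DOCTYPE a []><a/>
        System.out.println(padMinimalDoctype("<!DOCTYPE a><a/>"));
        // prints <!DOCTYPE a SYSTEM 'foo'><a/> (already has an externalID)
        System.out.println(padMinimalDoctype("<!DOCTYPE a SYSTEM 'foo'><a/>"));
    }
}
```

The padded string can then be wrapped in a StringReader and parsed as in the reproduction steps.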
FREQUENCY : always