JDK-8305253

Broken org.xml.sax.ext.EntityResolver2.getExternalSubset() behavior


Details

    Description

      ADDITIONAL SYSTEM INFORMATION :
      Tested on Windows but the OS is likely irrelevant. Reproduced with Java 8, 11, 17, and current 21 early access.

      A DESCRIPTION OF THE PROBLEM :
      The EntityResolver2.getExternalSubset() [1] apidoc states:

      > Allows applications to provide an external subset for documents that
      > don't explicitly define one. Documents with DOCTYPE declarations that
      > omit an external subset can thus augment the declarations available
      > for validation, entity processing, and attribute processing
      > (normalization, defaulting, and reporting types including ID).
      >
      > This method can also be used with documents that have no DOCTYPE
      > declaration. When the root element is encountered, but no DOCTYPE
      > declaration has been seen, this method is invoked.

      However, this does not appear to hold for documents that have no DOCTYPE declaration: getExternalSubset() is never invoked for them. Additionally, for documents whose DOCTYPE declaration specifies an internal subset, the supplied external subset appears to be ignored.
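
      The three document shapes described above can be probed without any files on disk. The following is only a minimal sketch, not part of the submitted test case: the class name ExternalSubsetProbe and its describe() helper are hypothetical, and the documents are shortened in-memory variants of the attached XML files. It reports, per document, which root element name (if any) the parser passed to getExternalSubset(), or the parse failure. No expected output is given because it depends on the JDK build in use.

```java
import java.io.StringReader;

import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical helper, not part of the attached test sources.
final class ExternalSubsetProbe {

    /** Records the getExternalSubset() request and the character data seen. */
    static class Probe extends DefaultHandler2 {
        String requestedFor; // root element name passed by the parser, or null if never asked
        final StringBuilder text = new StringBuilder();

        @Override
        public InputSource getExternalSubset(String name, String baseURI) {
            requestedFor = name;
            // Same single-entity external subset as in the attached tests
            return new InputSource(new StringReader("<!ENTITY nbsp '&#xA0;'>"));
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /** Parses the given XML text and summarizes what the probe observed. */
    static String describe(String label, String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            Probe probe = new Probe();
            reader.setEntityResolver(probe);
            reader.setContentHandler(probe);
            reader.setErrorHandler(probe); // DefaultHandler rethrows fatal errors
            reader.parse(new InputSource(new StringReader(xml)));
            return label + ": externalSubsetRequestedFor=" + probe.requestedFor
                    + ", text=\"" + probe.text + "\"";
        } catch (Exception e) {
            return label + ": failed: " + e;
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("doctype, no subsets",
                "<!DOCTYPE html><html>Foo &nbsp; Bar</html>"));
        System.out.println(describe("no doctype",
                "<html>Foo &nbsp; Bar</html>"));
        System.out.println(describe("doctype, internal subset",
                "<!DOCTYPE html [<!ENTITY trade '&#x2122;'>]><html>Foo &nbsp; Bar&trade;</html>"));
    }
}
```

      A line reading "externalSubsetRequestedFor=null" or "failed: ..." for the second or third document would correspond to the two cases reported here.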

      I've found an existing report:

      * https://bugs.openjdk.org/browse/JDK-6524460 "EntityResolver2 no longer used in Java6"

      It describes the first case: getExternalSubset() is never invoked for documents that have no DOCTYPE declaration. I've also identified a conflict with the internal subset when one is defined. FWIW, both of these cases work as expected with the original Xerces2 implementation.

      [1]: https://docs.oracle.com/en/java/javase/11/docs/api/java.xml/org/xml/sax/ext/EntityResolver2.html#getExternalSubset(java.lang.String,java.lang.String)
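
      When diagnosing whether EntityResolver2 callbacks are honored at all, it may help to first rule out the standard SAX feature flag "http://xml.org/sax/features/use-entity-resolver2" being disabled. The sketch below is an illustration only: the class name EntityResolver2FeatureCheck and the featureState() helper are hypothetical, and a parser that does not recognize the flag is reported as "unavailable" rather than treated as an error.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

// Hypothetical helper, not part of the attached test sources.
final class EntityResolver2FeatureCheck {

    // Standard SAX2 feature ID controlling use of EntityResolver2 methods
    static final String USE_ENTITY_RESOLVER2 =
            "http://xml.org/sax/features/use-entity-resolver2";

    /** Returns "true", "false", or an "unavailable (...)" note for the given reader. */
    static String featureState(XMLReader reader) {
        try {
            return String.valueOf(reader.getFeature(USE_ENTITY_RESOLVER2));
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            return "unavailable (" + e.getClass().getSimpleName() + ")";
        }
    }

    /** Describes the default JAXP SAX reader and its feature state. */
    static String describeDefaultReader() {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            return reader.getClass().getName()
                    + " use-entity-resolver2: " + featureState(reader);
        } catch (Exception e) {
            return "could not create reader: " + e;
        }
    }

    public static void main(String[] args) {
        System.out.println(describeDefaultReader());
    }
}
```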

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      > java SAXParserProvidedExternalSubsetTest
      > java DOMBuilderProvidedExternalSubsetTest


      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      Output (no errors):

      doctype-no-ext-subset.xml
      no-doctype-decl.xml
      doctype-int-subset.xml
      ACTUAL -
      > java SAXParserProvidedExternalSubsetTest
      doctype-no-ext-subset.xml
      no-doctype-decl.xml
      org.xml.sax.SAXParseException; systemId: file:/.../no-doctype-decl.xml; lineNumber: 4; columnNumber: 13; The entity "nbsp" was referenced, but not declared.
              at java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1243)
              at java.xml/com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:635)
              at SAXParserProvidedExternalSubsetTest.testExternalSubsetRequested(SAXParserProvidedExternalSubsetTest.java:47)
              at SAXParserProvidedExternalSubsetTest.testNoDoctype(SAXParserProvidedExternalSubsetTest.java:30)
              at SAXParserProvidedExternalSubsetTest.run(SAXParserProvidedExternalSubsetTest.java:86)
              at SAXParserProvidedExternalSubsetTest.main(SAXParserProvidedExternalSubsetTest.java:76)
      doctype-int-subset.xml
      java.lang.AssertionError: Expected text content: "Foo   Bar™", but was: "Foo Bar™"
              at SAXParserProvidedExternalSubsetTest.assertEquals(SAXParserProvidedExternalSubsetTest.java:66)
              at SAXParserProvidedExternalSubsetTest.testExternalSubsetRequested(SAXParserProvidedExternalSubsetTest.java:51)
              at SAXParserProvidedExternalSubsetTest.testDoctypeWithIntSubset(SAXParserProvidedExternalSubsetTest.java:34)
              at SAXParserProvidedExternalSubsetTest.run(SAXParserProvidedExternalSubsetTest.java:86)
              at SAXParserProvidedExternalSubsetTest.main(SAXParserProvidedExternalSubsetTest.java:77)

      > java DOMBuilderProvidedExternalSubsetTest
      doctype-no-ext-subset.xml
      no-doctype-decl.xml
      org.xml.sax.SAXParseException; systemId: file:/.../no-doctype-decl.xml; lineNumber: 4; columnNumber: 13; The entity "nbsp" was referenced, but not declared.
              at java.xml/com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:261)
              at java.xml/com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:339)
              at DOMBuilderProvidedExternalSubsetTest.testExternalSubsetRequested(DOMBuilderProvidedExternalSubsetTest.java:49)
              at DOMBuilderProvidedExternalSubsetTest.testNoDoctype(DOMBuilderProvidedExternalSubsetTest.java:33)
              at DOMBuilderProvidedExternalSubsetTest.run(DOMBuilderProvidedExternalSubsetTest.java:87)
              at DOMBuilderProvidedExternalSubsetTest.main(DOMBuilderProvidedExternalSubsetTest.java:77)
      doctype-int-subset.xml
      java.lang.AssertionError: Expected text content: "Foo   Bar™", but was: ""
              at DOMBuilderProvidedExternalSubsetTest.assertEquals(DOMBuilderProvidedExternalSubsetTest.java:67)
              at DOMBuilderProvidedExternalSubsetTest.testExternalSubsetRequested(DOMBuilderProvidedExternalSubsetTest.java:52)
              at DOMBuilderProvidedExternalSubsetTest.testDoctypeWithIntSubset(DOMBuilderProvidedExternalSubsetTest.java:37)
              at DOMBuilderProvidedExternalSubsetTest.run(DOMBuilderProvidedExternalSubsetTest.java:87)
              at DOMBuilderProvidedExternalSubsetTest.main(DOMBuilderProvidedExternalSubsetTest.java:78)


      ---------- BEGIN SOURCE ----------
      ---doctype-no-ext-subset.xml
      <!DOCTYPE html>
      <html>
      <body>
        Foo &nbsp; Bar&#x2122;
      </body>
      </html>
      ---doctype-no-ext-subset.xml--

      ---no-doctype-decl.xml
      <?xml version="1.0" encoding="UTF-8"?>
      <html xmlns="http://www.w3.org/1999/xhtml">
      <body>
        Foo &nbsp; Bar&#x2122;
      </body>
      </html>
      ---no-doctype-decl.xml--

      ---doctype-int-subset.xml
      <!DOCTYPE html [
        <!ENTITY trade "&#x2122;">
      ]>
      <html>
      <body>
        Foo &nbsp; Bar&trade;
      </body>
      </html>
      ---doctype-int-subset.xml--

      ---SAXParserProvidedExternalSubsetTest.java
      //package ;

      import java.io.FileNotFoundException;
      import java.io.IOException;
      import java.io.StringReader;
      import java.net.URL;
      import java.util.Objects;

      import javax.xml.parsers.SAXParserFactory;
      import org.xml.sax.InputSource;
      import org.xml.sax.SAXException;
      import org.xml.sax.SAXParseException;
      import org.xml.sax.XMLReader;
      import org.xml.sax.ext.DefaultHandler2;

      public class SAXParserProvidedExternalSubsetTest {

          private SAXParserFactory parserFactory;

          public void setUp() throws Exception {
              parserFactory = SAXParserFactory.newInstance();
              parserFactory.setNamespaceAware(true);
          }

          public void testDoctypeNoExternalSubset() throws Exception {
              testExternalSubsetRequested("doctype-no-ext-subset.xml");
          }

          public void testNoDoctype() throws Exception {
              testExternalSubsetRequested("no-doctype-decl.xml");
          }

          public void testDoctypeWithIntSubset() throws Exception {
              testExternalSubsetRequested("doctype-int-subset.xml");
          }

          private void testExternalSubsetRequested(String resource) throws Exception {
              System.out.println(resource);

              XMLReader xmlReader = parserFactory.newSAXParser().getXMLReader();
              TestHandler testHandler = new TestHandler();
              xmlReader.setErrorHandler(testHandler);
              xmlReader.setEntityResolver(testHandler);
              xmlReader.setContentHandler(testHandler);

              InputSource source = new InputSource(getResource(resource).toString());
              xmlReader.parse(source);

              assertEquals("requested external subset",
                      testHandler.requestedExternalSubset, "html");
              assertEquals("text content",
                      testHandler.textContent.toString().trim(), "Foo \u00A0 Bar\u2122");
          }

          private static URL getResource(String name) throws IOException {
              URL resource = SAXParserProvidedExternalSubsetTest.class.getResource(name);
              if (resource == null) {
                  throw new FileNotFoundException("Resource not found: " + name);
              }
              return resource;
          }

          private static void assertEquals(String subject, String actual, String expected)
                  throws AssertionError {
              if (!Objects.equals(actual, expected)) {
                  throw new AssertionError("Expected " + subject
                          + ": \"" + expected + "\", but was: \"" + actual + "\"");
              }
          }

          public static void main(String[] args) throws Exception {
              SAXParserProvidedExternalSubsetTest
                      suite = new SAXParserProvidedExternalSubsetTest();
              suite.setUp();
              // Non-short-circuit '&' so that every test case runs even after an earlier failure
              if (run(suite::testDoctypeNoExternalSubset)
                      & run(suite::testNoDoctype)
                      & run(suite::testDoctypeWithIntSubset)) {
                  // success
              } else {
                  System.exit(1);
              }
          }

          private static boolean run(ThrowingRunnable testCase) {
              try {
                  testCase.run();
                  return true;
              } catch (Throwable e) {
                  e.printStackTrace();
                  return false;
              }
          }


          @FunctionalInterface
          static interface ThrowingRunnable {
              void run() throws Throwable;
          }


          static class TestHandler extends DefaultHandler2 {

              String requestedExternalSubset;

              StringBuilder textContent = new StringBuilder();

              private static InputSource htmlEntities() {
                  return new InputSource(new StringReader("<!ENTITY nbsp '&#xA0;'>"));
              }

              @Override
              public InputSource getExternalSubset(String name, String baseURI)
                      throws SAXException, IOException {
                  requestedExternalSubset = name;
                  return "html".equals(name) ? htmlEntities() : null;
              }

              @Override
              public void characters(char[] ch, int start, int length)
                      throws SAXException {
                  textContent.append(ch, start, length);
              }

              @Override
              public void warning(SAXParseException e) throws SAXException {
                  System.err.append("[warning] ").println(e);
              }

              @Override
              public void error(SAXParseException e) throws SAXException {
                  System.err.append("[error] ").println(e);
              }

              @Override
              public void fatalError(SAXParseException e) throws SAXException {
                  // Parser should signal an exception in any case
              }

          } // class TestHandler


      } // class SAXParserProvidedExternalSubsetTest
      ---SAXParserProvidedExternalSubsetTest.java--

      ---DOMBuilderProvidedExternalSubsetTest.java
      //package ;

      import java.io.FileNotFoundException;
      import java.io.IOException;
      import java.io.StringReader;
      import java.net.URL;
      import java.util.Objects;

      import javax.xml.parsers.DocumentBuilder;
      import javax.xml.parsers.DocumentBuilderFactory;
      import org.w3c.dom.Document;
      import org.w3c.dom.Node;
      import org.xml.sax.InputSource;
      import org.xml.sax.SAXException;
      import org.xml.sax.SAXParseException;
      import org.xml.sax.ext.DefaultHandler2;

      public class DOMBuilderProvidedExternalSubsetTest {

          private DocumentBuilderFactory parserFactory;

          public void setUp() throws Exception {
              parserFactory = DocumentBuilderFactory.newInstance();
              parserFactory.setNamespaceAware(true);
              //parserFactory.setExpandEntityReferences(false);
          }

          public void testDoctypeNoExternalSubset() throws Exception {
              testExternalSubsetRequested("doctype-no-ext-subset.xml");
          }

          public void testNoDoctype() throws Exception {
              testExternalSubsetRequested("no-doctype-decl.xml");
          }

          public void testDoctypeWithIntSubset() throws Exception {
              testExternalSubsetRequested("doctype-int-subset.xml");
          }

          private void testExternalSubsetRequested(String resource) throws Exception {
              System.out.println(resource);

              DocumentBuilder domBuilder = parserFactory.newDocumentBuilder();
              TestHandler testHandler = new TestHandler();
              domBuilder.setErrorHandler(testHandler);
              domBuilder.setEntityResolver(testHandler);

              InputSource source = new InputSource(getResource(resource).toString());
              Document document = domBuilder.parse(source);

              Node body = document.getElementsByTagName("body").item(0);
              assertEquals("text content",
                      body.getTextContent().trim(), "Foo \u00A0 Bar\u2122");
          }

          private static URL getResource(String name) throws IOException {
              URL resource = DOMBuilderProvidedExternalSubsetTest.class.getResource(name);
              if (resource == null) {
                  throw new FileNotFoundException("Resource not found: " + name);
              }
              return resource;
          }

          private static void assertEquals(String subject, String actual, String expected)
                  throws AssertionError {
              if (!Objects.equals(actual, expected)) {
                  throw new AssertionError("Expected " + subject
                          + ": \"" + expected + "\", but was: \"" + actual + "\"");
              }
          }

          public static void main(String[] args) throws Exception {
              DOMBuilderProvidedExternalSubsetTest
                      suite = new DOMBuilderProvidedExternalSubsetTest();
              suite.setUp();
              // Non-short-circuit '&' so that every test case runs even after an earlier failure
              if (run(suite::testDoctypeNoExternalSubset)
                      & run(suite::testNoDoctype)
                      & run(suite::testDoctypeWithIntSubset)) {
                  // success
              } else {
                  System.exit(1);
              }
          }

          private static boolean run(ThrowingRunnable testCase) {
              try {
                  testCase.run();
                  return true;
              } catch (Throwable e) {
                  e.printStackTrace();
                  return false;
              }
          }


          @FunctionalInterface
          static interface ThrowingRunnable {
              void run() throws Throwable;
          }


          static class TestHandler extends DefaultHandler2 {

              private static InputSource htmlEntities() {
                  return new InputSource(new StringReader("<!ENTITY nbsp '&#xA0;'>"));
              }

              @Override
              public InputSource getExternalSubset(String name, String baseURI)
                      throws SAXException, IOException {
                  return "html".equals(name) ? htmlEntities() : null;
              }

              @Override
              public void warning(SAXParseException e) throws SAXException {
                  System.err.append("[warning] ").println(e);
              }

              @Override
              public void error(SAXParseException e) throws SAXException {
                  System.err.append("[error] ").println(e);
              }

              @Override
              public void fatalError(SAXParseException e) throws SAXException {
                  // Parser should signal an exception in any case
              }

          } // class TestHandler


      } // class DOMBuilderProvidedExternalSubsetTest
      ---DOMBuilderProvidedExternalSubsetTest.java--

      ---------- END SOURCE ----------

      CUSTOMER SUBMITTED WORKAROUND :
      Possibly add the original Xerces2 implementation to the runtime.
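
      One way the workaround above might be wired up, assuming the Apache Xerces2 jar (xercesImpl) has been added to the class path, is to request its JAXP factory by class name. This is a sketch only and has not been verified against this issue: the class name XercesFactoryLookup and the preferXerces() helper are hypothetical, and the code falls back to the JDK's built-in factory when Xerces2 is not present.

```java
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical helper, not part of the attached test sources.
final class XercesFactoryLookup {

    // JAXP SAX factory class shipped with Apache Xerces2 (xercesImpl.jar)
    static final String XERCES_SAX_FACTORY =
            "org.apache.xerces.jaxp.SAXParserFactoryImpl";

    /** Returns the Xerces2 factory if it can be loaded, else the platform default. */
    static SAXParserFactory preferXerces() {
        try {
            return SAXParserFactory.newInstance(
                    XERCES_SAX_FACTORY, XercesFactoryLookup.class.getClassLoader());
        } catch (FactoryConfigurationError e) {
            // Xerces2 is not on the class path: use the JDK's built-in parser
            return SAXParserFactory.newInstance();
        }
    }

    public static void main(String[] args) {
        SAXParserFactory factory = preferXerces();
        factory.setNamespaceAware(true);
        System.out.println("Using " + factory.getClass().getName());
    }
}
```

      Requesting the factory by name keeps the choice local to the calling code instead of changing the JAXP lookup for the whole JVM.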

      FREQUENCY : always



            People

              joehw Joe Wang
              webbuggrp Webbug Group