JDK-6341770

Xerces cannot handle relative entity includes with non-ASCII base URL


    • b62
    • x86
    • linux, windows_xp

      I have a Fedora Core 4 Linux system which uses UTF-8 as the system locale. Consequently Java normally has no problems using non-ASCII characters in filenames (and neither does any other major software).

      However, run this test case:

      ---%<---
      import java.io.File;
      import java.io.FileWriter;
      import java.io.PrintWriter;
      import javax.xml.parsers.SAXParserFactory;
      import org.xml.sax.Attributes;
      import org.xml.sax.SAXException;
      import org.xml.sax.helpers.DefaultHandler;
      public class Test {
          public static void main(String[] args) throws Exception {
              File dir = File.createTempFile("sko\u0159ice", null);
              dir.delete();
              dir.mkdir();
              File main = new File(dir, "main.xml");
              PrintWriter w = new PrintWriter(new FileWriter(main));
              w.println("<!DOCTYPE r [<!ENTITY aux SYSTEM \"aux.xml\">]>");
              w.println("<r>&aux;</r>");
              w.flush();
              w.close();
              File aux = new File(dir, "aux.xml");
              w = new PrintWriter(new FileWriter(aux));
              w.println("<x/>");
              w.flush();
              w.close();
              System.out.println("Parsing: " + main);
              SAXParserFactory.newInstance().newSAXParser().parse(main, new DefaultHandler() {
                  public void startElement(String uri, String localname, String qname, Attributes attr) throws SAXException {
                      System.out.println("encountered <" + qname + ">");
                  }
              });
              System.out.println("OK.");
          }
      }
      ---%<---

      It works on JDK 1.4.2 but fails on JDK 5.0 and later:

      ---%<---
      java version "1.4.2_09"
      Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_09-b05)
      Java HotSpot(TM) Client VM (build 1.4.2_09-b05, mixed mode)

      Parsing: /tmp/skořice17343.tmp/main.xml
      encountered <r>
      encountered <x>
      OK.
      ---%<---
      java version "1.5.0_05"
      Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_05-b05)
      Java HotSpot(TM) Client VM (build 1.5.0_05-b05, mixed mode, sharing)

      Parsing: /tmp/skořice42181.tmp/main.xml
      encountered <r>
      Exception in thread "main" java.net.MalformedURLException: no protocol: aux.xml
      at java.net.URL.<init>(URL.java:567)
      at java.net.URL.<init>(URL.java:464)
      at java.net.URL.<init>(URL.java:413)
      at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:968)
      at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(XMLEntityManager.java:905)
      at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(XMLEntityManager.java:843)
      at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEntityReference(XMLDocumentFragmentScannerImpl.java:1334)
      at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(XMLDocumentFragmentScannerImpl.java:1756)
      at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:368)
      at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:834)
      at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:764)
      at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:148)
      at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1242)
      at javax.xml.parsers.SAXParser.parse(SAXParser.java:375)
      at javax.xml.parsers.SAXParser.parse(SAXParser.java:311)
      at Test.main(Test.java:25)
      ---%<---
      java version "1.6.0-ea"
      Java(TM) 2 Runtime Environment, Standard Edition (build 1.6.0-ea-b57)
      Java HotSpot(TM) Client VM (build 1.6.0-ea-b57, mixed mode, sharing)

      Parsing: /tmp/skořice26384.tmp/main.xml
      encountered <r>
      Exception in thread "main" java.net.MalformedURLException: no protocol: aux.xml
      at java.net.URL.<init>(URL.java:567)
      at java.net.URL.<init>(URL.java:464)
      at java.net.URL.<init>(URL.java:413)
      at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:657)
      at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(XMLEntityManager.java:1319)
      at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(XMLEntityManager.java:1256)
      at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEntityReference(XMLDocumentFragmentScannerImpl.java:1896)
      at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:3019)
      at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:664)
      at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:524)
      at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:807)
      at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
      at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:107)
      at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205)
      at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
      at javax.xml.parsers.SAXParser.parse(SAXParser.java:376)
      at javax.xml.parsers.SAXParser.parse(SAXParser.java:312)
      at Test.main(Test.java:25)
      ---%<---

      Either (1) SAXParser.parse(File, ...) fails to encode non-ASCII filenames as UTF-8 octets in %xx syntax; or (2) it calls File.toURI, which is supposed to do that but does not, and Crimson simply did not check for this condition; or (3) the non-ASCII character in the URI is legal and Xerces is incorrectly rejecting it. I suspect a combination of #1 and #2: another bug has been filed somewhere reporting that JAXP does not call File.toURI, but even if it did, the result apparently would not escape non-ASCII characters, which it should if I read the RFC correctly.
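
      To make the escaping question concrete, here is a minimal sketch using only java.net.URI. It is not taken from the JAXP or Xerces sources. The class name UriEscapingSketch, its helper methods, and the path /tmp/skořice/main.xml are hypothetical and chosen for illustration. It shows the %xx form that hypotheses #1 and #2 refer to, and that a relative system ID such as aux.xml resolves cleanly against a base in that form.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriEscapingSketch {
    // Builds a file: URI from an absolute path (hypothetical helper).
    // Non-ASCII letters such as U+0159 are kept unescaped in the URI.
    static URI fileUri(String absolutePath) throws URISyntaxException {
        return new URI("file", null, absolutePath, null);
    }

    // Returns the same URI as a pure-ASCII string: each non-ASCII
    // character is written as %xx escapes of its UTF-8 octets.
    static String escapedSystemId(String absolutePath) throws URISyntaxException {
        return fileUri(absolutePath).toASCIIString();
    }

    // Resolves a relative system ID against an escaped base system ID.
    static String resolveAgainst(String escapedBase, String relative) {
        return URI.create(escapedBase).resolve(relative).toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical path; only ASCII text is printed below.
        String base = escapedSystemId("/tmp/sko\u0159ice/main.xml");
        System.out.println(base);
        // prints: file:/tmp/sko%C5%99ice/main.xml
        System.out.println(resolveAgainst(base, "aux.xml"));
        // prints: file:/tmp/sko%C5%99ice/aux.xml
    }
}
```

      A caller could try passing such an escaped string to the parser as the system ID of an org.xml.sax.InputSource instead of passing the File directly. Whether that avoids the failure on the affected builds is an assumption I have not verified.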

            sreddysunw Sunitha Reddy (Inactive)
            jglick Jesse Glick (Inactive)