- Bug
- Resolution: Fixed
- P3
- 5.0
- b62
- x86
- linux, windows_xp
I have a Fedora Core 4 Linux system which uses UTF-8 as the system locale. Consequently, Java normally has no problems using non-ASCII characters in filenames (and neither does any other major software).
However, run this test case:
---%<---
import java.io.File;
import java.io.FileWriter;
import java.io.PrintWriter;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class Test {
    public static void main(String[] args) throws Exception {
        File dir = File.createTempFile("sko\u0159ice", null);
        dir.delete();
        dir.mkdir();
        File main = new File(dir, "main.xml");
        PrintWriter w = new PrintWriter(new FileWriter(main));
        w.println("<!DOCTYPE r [<!ENTITY aux SYSTEM \"aux.xml\">]>");
        w.println("<r>&aux;</r>");
        w.flush();
        w.close();
        File aux = new File(dir, "aux.xml");
        w = new PrintWriter(new FileWriter(aux));
        w.println("<x/>");
        w.flush();
        w.close();
        System.out.println("Parsing: " + main);
        SAXParserFactory.newInstance().newSAXParser().parse(main, new DefaultHandler() {
            public void startElement(String uri, String localname, String qname, Attributes attr) throws SAXException {
                System.out.println("encountered <" + qname + ">");
            }
        });
        System.out.println("OK.");
    }
}
---%<---
On JDK 1.4.2 it works; on JDK 5.0+ it does not:
---%<---
java version "1.4.2_09"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_09-b05)
Java HotSpot(TM) Client VM (build 1.4.2_09-b05, mixed mode)
Parsing: /tmp/skořice17343.tmp/main.xml
encountered <r>
encountered <x>
OK.
---%<---
java version "1.5.0_05"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_05-b05)
Java HotSpot(TM) Client VM (build 1.5.0_05-b05, mixed mode, sharing)
Parsing: /tmp/skořice42181.tmp/main.xml
encountered <r>
Exception in thread "main" java.net.MalformedURLException: no protocol: aux.xml
at java.net.URL.<init>(URL.java:567)
at java.net.URL.<init>(URL.java:464)
at java.net.URL.<init>(URL.java:413)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:968)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(XMLEntityManager.java:905)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(XMLEntityManager.java:843)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEntityReference(XMLDocumentFragmentScannerImpl.java:1334)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(XMLDocumentFragmentScannerImpl.java:1756)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:368)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:834)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:764)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:148)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1242)
at javax.xml.parsers.SAXParser.parse(SAXParser.java:375)
at javax.xml.parsers.SAXParser.parse(SAXParser.java:311)
at Test.main(Test.java:25)
---%<---
java version "1.6.0-ea"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.6.0-ea-b57)
Java HotSpot(TM) Client VM (build 1.6.0-ea-b57, mixed mode, sharing)
Parsing: /tmp/skořice26384.tmp/main.xml
encountered <r>
Exception in thread "main" java.net.MalformedURLException: no protocol: aux.xml
at java.net.URL.<init>(URL.java:567)
at java.net.URL.<init>(URL.java:464)
at java.net.URL.<init>(URL.java:413)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:657)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(XMLEntityManager.java:1319)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(XMLEntityManager.java:1256)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEntityReference(XMLDocumentFragmentScannerImpl.java:1896)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:3019)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:664)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:524)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:807)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:107)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
at javax.xml.parsers.SAXParser.parse(SAXParser.java:376)
at javax.xml.parsers.SAXParser.parse(SAXParser.java:312)
at Test.main(Test.java:25)
---%<---
Either (1) SAXParser.parse(File, ...) is failing to encode non-ASCII filenames as UTF-8 octets using %xx escapes; or (2) it is calling File.toURI, which is supposed to do that but does not, and Crimson simply did not check for this condition; or (3) the non-ASCII character in the URI is fine and Xerces is incorrectly rejecting it. I suspect a combination of #1 and #2: there is another bug filed somewhere saying that JAXP does not call File.toURI, but even if it did, the result does not appear to escape non-ASCII characters, which it should if I read the RFC correctly.
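To make the escaping question concrete, here is a minimal sketch of the difference between a java.net.URI that keeps the non-ASCII character and its %xx-escaped form. The class name UriEscapeSketch and the helpers fileUri and toAsciiSystemId are made up for illustration and are not part of JAXP or Xerces. The sketch only shows what java.net.URI produces; it does not claim to reproduce what SAXParser.parse(File, ...) does internally.

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

final class UriEscapeSketch {

    // Hypothetical helper: builds a file: URI from an absolute, slash-separated path.
    // The multi-argument URI constructor quotes illegal characters such as spaces,
    // but leaves non-ASCII letters like U+0159 as they are.
    static URI fileUri(String absolutePath) throws URISyntaxException {
        return new URI("file", null, absolutePath, null);
    }

    // Hypothetical helper: the same URI with non-ASCII characters written as
    // %xx escapes of their UTF-8 octets (URI.toASCIIString does the encoding).
    static String toAsciiSystemId(String absolutePath) throws URISyntaxException {
        return fileUri(absolutePath).toASCIIString();
    }

    public static void main(String[] args) throws Exception {
        String path = "/tmp/sko\u0159ice/main.xml";

        // Unescaped form: the non-ASCII character is kept in the URI string.
        System.out.println(fileUri(path));

        // Escaped form: file:/tmp/sko%C5%99ice/main.xml
        System.out.println(toAsciiSystemId(path));

        // The same idea starting from a File; the exact string depends on the platform.
        System.out.println(new File(path).toURI().toASCIIString());
    }
}
```

If hypothesis #2 is right, a caller would need something like the toASCIIString() form to get a system ID made up only of ASCII characters. Whether the parser then resolves aux.xml correctly against it is an assumption that would still have to be tested on the affected JDK builds.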
- relates to: JDK-6992561 Encoding of SystemId in Locator in JDK 6 (Closed)