Issue | Fix Version | Assignee | Priority | Status | Resolution | Resolved In Build |
---|---|---|---|---|---|---|
JDK-2075224 | 1.3.0 | Venugopal K | P3 | Closed | Fixed | 1.3 |
Name: apR10229 Date: 09/29/2003
Filed By : SPB JCK team (###@###.###)
JDK : java full version "1.5.0-beta-b20"
JCK : 1.5
Platform[s] : Linux
switch/Mode :
JCK test owner : http://javaweb.eng/jct/sqe/JCK-tck/usr/owners.jto
Failing Test [s] : N/A
Specification excerpt:
======================
--------- J2SE API spec v.1.5 ---------
...
public abstract class Charset extends Object implements Comparable
....
Standard charsets
Every implementation of the Java platform is required to support the following standard charsets. Consult the release documentation for your implementation to see if any other charsets are supported.
Charset | Description |
---|---|
US-ASCII | Seven-bit ASCII, a.k.a. ISO646-US, a.k.a. the Basic Latin block of the Unicode character set |
ISO-8859-1 | ISO Latin Alphabet No. 1, a.k.a. ISO-LATIN-1 |
UTF-8 | Eight-bit UCS Transformation Format |
UTF-16BE | Sixteen-bit UCS Transformation Format, big-endian byte order |
UTF-16LE | Sixteen-bit UCS Transformation Format, little-endian byte order |
UTF-16 | Sixteen-bit UCS Transformation Format, byte order identified by an optional byte-order mark |
The UTF-8 charset is specified by RFC 2279; the transformation format upon which it is based is specified in Amendment 2 of ISO 10646-1 and is also described in § 3.8 of The Unicode Standard, Version 3.0 (amended).
The UTF-16 charsets are specified by RFC 2781; the transformation formats upon which they are based are specified in Amendment 1 of ISO 10646-1 and are also described in § 3.8 of The Unicode Standard, Version 3.0.
The UTF-16 charsets use sixteen-bit quantities and are therefore sensitive to byte order. In these encodings the byte order of a stream may be indicated by an initial byte-order mark represented by the Unicode character '\uFEFF'. Byte-order marks are handled as follows:
* When decoding, the UTF-16BE and UTF-16LE charsets ignore byte-order marks; when encoding, they do not write byte-order marks.
* When decoding, the UTF-16 charset interprets a byte-order mark to indicate the byte order of the stream but defaults to big-endian if there is no byte-order mark; when encoding, it uses big-endian byte order and writes a big-endian byte-order mark.
In any case, when a byte-order mark is read at the beginning of a decoding operation it is omitted from the resulting sequence of characters. Byte order marks occurring after the first element of an input sequence are not omitted since the same code is used to represent ZERO-WIDTH NON-BREAKING SPACE.
Every instance of the Java virtual machine has a default charset, which may or may not be one of the standard charsets. The default charset is determined during virtual-machine startup and typically depends upon the locale and charset being used by the underlying operating system.
...
---------- end-of-excerpt ---------------
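The byte-order-mark rules quoted above for the UTF-16 charset can be checked with a short sketch. It is not part of the original report: the class name BomDemo and the helper hex are hypothetical, and it assumes a current JDK (Java 7 or later, for java.nio.charset.StandardCharsets) rather than the 1.5.0-beta build under test.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; illustrates the quoted BOM rules on a current JDK.
public class BomDemo {
    /** Renders bytes as space-separated upper-case hex. */
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "A";

        // Encoding: UTF-16 uses big-endian order and writes a big-endian BOM.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16)));   // FE FF 00 41
        // Encoding: UTF-16BE and UTF-16LE do not write a BOM.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16BE))); // 00 41
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16LE))); // 41 00

        // Decoding: UTF-16 honors a little-endian BOM and omits it from the result.
        byte[] littleEndianWithBom = {(byte) 0xFF, (byte) 0xFE, 0x41, 0x00};
        System.out.println(new String(littleEndianWithBom, StandardCharsets.UTF_16)); // A

        // Decoding: without a BOM, UTF-16 defaults to big-endian.
        byte[] bigEndianNoBom = {0x00, 0x41};
        System.out.println(new String(bigEndianNoBom, StandardCharsets.UTF_16)); // A
    }
}
```

The sketch only exercises the UTF-16 decoder and the three encoders; it does not cover how UTF-16BE and UTF-16LE treat a leading byte-order mark when decoding.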
Problem description
===================
The method org.w3c.dom.ls.DOMParser.parse(DOMInput src) fails with java.lang.InternalError when parsing a UTF-16-encoded input source. The API documentation places no restrictions on which encodings supported by the Java platform may be used for the input source, so this behavior appears to be incorrect.
Minimized test:
===============
------- Test.java -------
import java.io.*;
import org.w3c.dom.ls.*;
import org.w3c.dom.*;
import javax.xml.parsers.*;
public class Test {
    public static void main(String[] argv) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><encodingXML/>";
        Document doc = null;
        try {
            DocumentBuilder parser = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            doc = parser.parse(new StringBufferInputStream(xml));
        } catch (Throwable e) {
            e.printStackTrace();
        }
        DOMImplementation impl = doc.getImplementation();
        DOMImplementationLS implLS = (DOMImplementationLS) impl.getFeature("LS","3.0");
        DOMInput src = implLS.createDOMInput();
        try {
            src.setByteStream(new ByteArrayInputStream(xml.getBytes("UTF-16")));
        } catch(UnsupportedEncodingException e) {
            e.printStackTrace();
        }
        src.setEncoding("UTF-16");
        DOMParser dp = implLS.createDOMParser(DOMImplementationLS.MODE_SYNCHRONOUS,"http://www.w3.org/2001/XMLSchema");
        Document parsedXML = dp.parse(src);
    }
}
------- end-of-Test.java -------
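The test above passes the parser the bytes of xml.getBytes("UTF-16") and calls setEncoding("UTF-16"), while the XML declaration inside the string still says encoding="UTF-8". The following sketch looks only at that byte stream. It is not part of the original report and does not reproduce the failure: the class name Utf16InputCheck and the helper decodeUtf16 are hypothetical, and it assumes a current JDK (Java 7 or later). It shows what the bytes look like and that they decode back to the original string through java.io.InputStreamReader.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; inspects the byte stream the test feeds to the parser.
public class Utf16InputCheck {
    /** Decodes the whole byte array as UTF-16 through an InputStreamReader. */
    public static String decodeUtf16(byte[] bytes) {
        StringBuilder out = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_16)) {
            char[] buf = new char[256];
            int n;
            while ((n = reader.read(buf)) != -1) {
                out.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Same string as in the minimized test.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><encodingXML/>";
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_16);

        // Two bytes of byte-order mark plus two bytes per character.
        System.out.println(bytes.length == 2 + 2 * xml.length());           // true
        // The stream starts with the big-endian byte-order mark.
        System.out.printf("%02X %02X%n", bytes[0] & 0xFF, bytes[1] & 0xFF); // FE FF
        // Decoding as UTF-16 drops the mark and restores the original text.
        System.out.println(decodeUtf16(bytes).equals(xml));                 // true
    }
}
```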
Minimized test output:
======================
<pav@hammer(pts/5).265> java Test
Exception in thread "main" java.lang.InternalError: Converter malfunction (Unicode) -- please submit a bug report via http://java.sun.com/cgi-bin/bugreport.cgi
    at sun.nio.cs.StreamDecoder$ConverterSD.malfunction(StreamDecoder.java:235)
    at sun.nio.cs.StreamDecoder$ConverterSD.convertInto(StreamDecoder.java:251)
    at sun.nio.cs.StreamDecoder$ConverterSD.implRead(StreamDecoder.java:297)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:182)
    at java.io.InputStreamReader.read(InputStreamReader.java:167)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1643)
    at com.sun.org.apache.xerces.internal.impl.XML11EntityScanner.skipString(XML11EntityScanner.java:1018)
    at com.sun.org.apache.xerces.internal.impl.XMLVersionDetector.determineDocVersion(XMLVersionDetector.java:188)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:188)
    at com.sun.org.apache.xerces.internal.parsers.DTDConfiguration.parse(DTDConfiguration.java:593)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:148)
    at com.sun.org.apache.xerces.internal.parsers.DOMParserImpl.parse(DOMParserImpl.java:755)
    at Test.main(Test.java:26)
<pav@hammer(pts/5).266>
JCK test source location:
==========================
/java/re/jck/1.5/promoted/latest/JCK-runtime-15/tests
Specific Machine Info:
=====================
Linux hammer 2.4.21 #1 Wed Jun 25 20:18:22 MSD 2003 i686 unknown
======================================================================
###@###.### 2003-10-04
###@###.### 2003-11-24
- backported by: JDK-2075224 DOMParser.parse() cannot parse UTF-16 encoded source (Closed)
- duplicates: JDK-4937150 JCK1.5-runtime api/org_w3c/dom/ls/DOMInput/index.html#order fails (Closed)