Bug
Resolution: Cannot Reproduce
P4
None
6
x86
windows_xp
FULL PRODUCT VERSION :
java version "1.6.0_01"
Java(TM) SE Runtime Environment (build 1.6.0_01-b04)
Java HotSpot(TM) Server VM (build 1.6.0_01-b04, mixed mode)
ADDITIONAL OS VERSION INFORMATION :
Version 5.1.2600
A DESCRIPTION OF THE PROBLEM :
The Xerces parser included in the JDK handles entity references incorrectly if the entity's replacement text includes literal Unicode characters with a code point above 65535 (U+FFFF), that is, supplementary characters outside the Basic Multilingual Plane.
This bug surfaced while using the Saxon XSLT transformer, so it was first filed as a bug against Saxon. However, the Saxon developer, Michael Kay, reported that the bug lies in the underlying XML parser of the JDK, not in Saxon. More information is available here:
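For background, the following is a minimal sketch of why such characters are a special case in Java: a code point above U+FFFF is stored as two UTF-16 code units (a surrogate pair), so code that works char by char can mishandle it. The class name SupplementaryDemo and the code point U+20000 are arbitrary choices for illustration and do not come from the report.

```java
public class SupplementaryDemo {

    /** Returns how many UTF-16 code units (Java chars) one code point occupies. */
    public static int utf16Length(int codePoint) {
        return new String(Character.toChars(codePoint)).length();
    }

    public static void main(String[] args) {
        int bmp = 0x41;               // 'A', inside the Basic Multilingual Plane
        int supplementary = 0x20000;  // arbitrary supplementary code point (assumption)

        System.out.println(utf16Length(bmp));            // 1
        System.out.println(utf16Length(supplementary));  // 2

        String s = new String(Character.toChars(supplementary));
        System.out.println(s.codePointCount(0, s.length()));         // 1
        System.out.println(Character.isHighSurrogate(s.charAt(0)));  // true
        System.out.println(Character.isLowSurrogate(s.charAt(1)));   // true
    }
}
```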
http://sourceforge.net/forum/forum.php?thread_id=1670443&forum_id=94027
and
http://sourceforge.net/tracker/index.php?func=detail&aid=1660205&group_id=29872&atid=397618
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
The results below were produced with Saxon:
File test.xml:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE div [
<!ELEMENT div ANY >
<!ENTITY test1 '?'>
<!ENTITY test2 '𦐽'>
]>
<div>
??&test1;&test2;藴?'?'
</div>
----
This is the file used for transformation:
test.xsl
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:output indent="yes" method="xml"
omit-xml-declaration="no"/>
<xsl:template match="*">
<xsl:copy>
<xsl:apply-templates select="@*"/>
<xsl:apply-templates select="*|processing-instruction()|comment()|text()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
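Since the suspected fault is in the parser rather than in Saxon, a Saxon-free check may help narrow it down. The sketch below is not part of the original report: it parses a small document through the standard JAXP DocumentBuilderFactory and compares an entity declared with a literal supplementary character against one declared with a numeric character reference. The class name EntityParseCheck, the entity name "e", and the code point U+20000 are assumptions made for illustration. Which parser JAXP actually selects depends on the classpath, so the result only reflects the JDK's built-in parser if no other implementation is installed.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class EntityParseCheck {

    /**
     * Parses a document whose internal entity "e" has the given replacement
     * text and returns the text content of the root element.
     */
    public static String parseWithEntity(String entityValue) {
        String xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n"
                + "<!DOCTYPE div [\n"
                + "<!ELEMENT div ANY>\n"
                + "<!ENTITY e '" + entityValue + "'>\n"
                + "]>\n"
                + "<div>&e;</div>";
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    public static void main(String[] args) {
        // Arbitrary supplementary character, chosen only for illustration.
        String supplementary = new String(Character.toChars(0x20000));

        String fromLiteral = parseWithEntity(supplementary);
        String fromNumeric = parseWithEntity("&#x20000;");

        // Both lines should report true on a parser that handles the case correctly.
        System.out.println("literal entity intact: " + supplementary.equals(fromLiteral));
        System.out.println("numeric entity intact: " + supplementary.equals(fromNumeric));
    }
}
```

If the literal check reports false while the numeric one reports true, that would match the behavior described in this report.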
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
<?xml version="1.0" encoding="UTF-8"?>
<div>
??????'?'
</div>
ACTUAL -
<?xml version="1.0" encoding="UTF-8"?>
<div>
?????'?'
</div>
ERROR MESSAGES/STACK TRACES THAT OCCUR :
No error message is produced; the failure is silent.
REPRODUCIBILITY :
This bug can always be reproduced.
CUSTOMER SUBMITTED WORKAROUND :
A workaround seems to be to use numeric character references instead of literal characters, in this case the reference for '𦐽', as in entity test2 above.
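The workaround could be applied mechanically before the text is written into an entity declaration. The following is a minimal sketch under that assumption; the class NumericReferenceEscaper and its method are hypothetical helpers, not part of the JDK or of Saxon. It rewrites only supplementary code points and leaves everything else untouched, so it is suitable for entity values and character data but not for content such as CDATA sections, where references are not expanded.

```java
public class NumericReferenceEscaper {

    /**
     * Replaces every supplementary code point (above U+FFFF) in the input
     * with a hexadecimal numeric character reference such as &#x20000;.
     */
    public static String escapeSupplementary(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            int cp = input.codePointAt(i);
            if (Character.isSupplementaryCodePoint(cp)) {
                out.append("&#x")
                   .append(Integer.toHexString(cp).toUpperCase())
                   .append(';');
            } else {
                out.appendCodePoint(cp);
            }
            // Advance by one or two chars depending on the code point's width.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "a" + new String(Character.toChars(0x20000)) + "b";
        System.out.println(escapeSupplementary(text)); // a&#x20000;b
    }
}
```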