JDK-6526288

xerces parser encoding error

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: P4
    • Fix Version/s: None
    • Affects Version/s: 6
    • Component/s: xml

      FULL PRODUCT VERSION :
      java version "1.6.0_01"
      Java(TM) SE Runtime Environment (build 1.6.0_01-b04)
      Java HotSpot(TM) Server VM (build 1.6.0_01-b04, mixed mode)

      ADDITIONAL OS VERSION INFORMATION :
      Version 5.1.2600

      A DESCRIPTION OF THE PROBLEM :
      The Xerces parser bundled with the JDK mishandles entity references whose replacement text contains literal Unicode characters with code points above U+FFFF (65536 or greater), i.e. supplementary characters outside the Basic Multilingual Plane.
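
For context, a minimal Java sketch of why such characters are a special case: Java strings are UTF-16, so a code point above U+FFFF occupies two `char` values (a surrogate pair), and code that processes text one `char` at a time can lose or corrupt it. The class and method names here are illustrative only and do not come from the report or from Xerces.

```java
public class SupplementaryDemo {
    /** Returns the UTF-16 code units of a code point as upper-case hex, space-separated. */
    static String utf16Units(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char c : Character.toChars(codePoint)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int supplementary = 0x2643D; // code point used by entity test2 in the report
        int bmp = 0x85F4;            // BMP code point referenced in the test document

        System.out.println(Character.isSupplementaryCodePoint(supplementary)); // true
        System.out.println(utf16Units(supplementary)); // D859 DC3D (surrogate pair)
        System.out.println(utf16Units(bmp));           // 85F4 (single code unit)
    }
}
```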

      This bug surfaced when using the Saxon XSLT processor, so it was first filed against Saxon. However, the Saxon developer, Michael Kay, reported that the fault lies in the JDK's underlying XML parser, not in Saxon. More information is available here:
      http://sourceforge.net/forum/forum.php?thread_id=1670443&forum_id=94027
      and
      http://sourceforge.net/tracker/index.php?func=detail&aid=1660205&group_id=29872&atid=397618


      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      Saxon is used to produce the results.
      File test.xml:
      <?xml version="1.0" encoding="utf-8"?>
      <!DOCTYPE div [
      <!ELEMENT div ANY >
      <!ENTITY test1 '?'>
      <!ENTITY test2 '&#x2643D;'>
      ]>
      <div>
          ??&test1;&test2;&#x85F4;?'?'
      </div>
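
To check the parser in isolation from Saxon, a reduced version of this document can be parsed with the JDK's JAXP DOM API alone. The following is a sketch under stated assumptions, not the reporter's test: the literal character in entity test1 is not legible in this report, so U+2643D is assumed because the workaround names that code point; the document body is cut down to the two entity references; and `EntityCheck` and `parseText` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class EntityCheck {
    // Assumption: the literal character of entity test1 is U+2643D.
    static final String LITERAL = new String(Character.toChars(0x2643D));

    // Reduced test document: test1 holds the literal character,
    // test2 holds the same code point as a numeric character reference.
    static final String XML =
          "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n"
        + "<!DOCTYPE div [\n"
        + "<!ELEMENT div ANY >\n"
        + "<!ENTITY test1 '" + LITERAL + "'>\n"
        + "<!ENTITY test2 '&#x2643D;'>\n"
        + "]>\n"
        + "<div>&test1;&test2;</div>";

    /** Parses the XML with the default JAXP parser and returns the root element's text. */
    static String parseText(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc =
                builder.parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String text = parseText(XML);
        String expected = LITERAL + LITERAL; // both entities should expand to U+2643D
        System.out.println("code points parsed: "
            + text.codePointCount(0, text.length()));
        System.out.println(text.equals(expected) ? "OK" : "MISMATCH");
    }
}
```

A correct parser expands both entities to the same character, so comparing the parsed text against two copies of U+2643D separates a parser fault from anything the XSLT processor does afterwards.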


      ----
      The stylesheet used for the transformation:
      File test.xsl:
      <?xml version="1.0" encoding="utf-8"?>
      <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
          <xsl:output indent="yes" method="xml"
              omit-xml-declaration="no"/>
          
          <xsl:template match="*">
              <xsl:copy>
                  <xsl:apply-templates select="@*"/>
                  <xsl:apply-templates select="*|processing-instruction()|comment()|text()"/>
              </xsl:copy>
          </xsl:template>
      </xsl:stylesheet>

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      <?xml version="1.0" encoding="UTF-8"?>
      <div>
          ??????'?'
      </div>

      ACTUAL -
      <?xml version="1.0" encoding="UTF-8"?>
      <div>
          ?????'?'
      </div>


      ERROR MESSAGES/STACK TRACES THAT OCCUR :
      There is no error message; the failure is silent.

      REPRODUCIBILITY :
      This bug can always be reproduced.

      CUSTOMER SUBMITTED WORKAROUND :
      A workaround seems to be to use numeric character references instead of literal characters, in this case '&#x2643D;', as in entity test2 above.
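
The workaround could be applied mechanically before a document is written out. The helper below is a minimal sketch with hypothetical names (`NumericRefs`, `escapeSupplementary`); it only rewrites code points above U+FFFF as hexadecimal character references and does not escape markup characters such as `&` or `<`.

```java
public class NumericRefs {
    /** Replaces every code point above U+FFFF with a reference of the form &#xHHHHH; */
    static String escapeSupplementary(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (Character.isSupplementaryCodePoint(cp)) {
                out.append("&#x")
                   .append(Integer.toHexString(cp).toUpperCase())
                   .append(';');
            } else {
                out.appendCodePoint(cp); // BMP characters are copied unchanged
            }
            i += Character.charCount(cp); // advance by 1 or 2 UTF-16 code units
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String literal = new String(Character.toChars(0x2643D));
        System.out.println(escapeSupplementary("a" + literal + "b")); // a&#x2643D;b
    }
}
```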

            Assignee: spericas Santiago Pericasgeertsen
            Reporter: ndcosta Nelson Dcosta (Inactive)