JDK-8343885

Parsing of XML documents containing CDATA section broken for XML1.1


    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: P4
    • None
    • 11
    • xml

      ADDITIONAL SYSTEM INFORMATION :
      Windows / C:\Program Files\Eclipse Adoptium\jdk-11.0.25.9-hotspot

      A DESCRIPTION OF THE PROBLEM :
      The XML 1.1 parser appears to be broken; the XML 1.0 parser handles the same input correctly.
      Code discrepancy:

      The conditional checks in the XML 1.0 and XML 1.1 parsers differ:
      XML 1.1: if (fCurrentEntity.position >= fCurrentEntity.count - delimLen)
      XML 1.0: if (fCurrentEntity.position > fCurrentEntity.count - delimLen)
      This discrepancy (>= vs. >) appears to be the cause of the XML 1.1 parser error: the parser misinterprets the end of CDATA sections.
      Specifically, parsing appears to fail when ] or ]] is the last content of the CDATA section.
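
To make the difference between the two checks concrete, here is a minimal sketch that evaluates both conditions for a few buffer positions. It is not the actual scanner code: the class and method names are hypothetical, and plain int parameters stand in for fCurrentEntity.position, fCurrentEntity.count, and delimLen. It only shows that the two conditions disagree in exactly one case, when position equals count - delimLen.

```java
public class BoundaryCheckSketch {

    // Condition as quoted for the XML 1.0 scanner (strict comparison).
    public static boolean xml10Check(int position, int count, int delimLen) {
        return position > count - delimLen;
    }

    // Condition as quoted for the XML 1.1 scanner (non-strict comparison).
    public static boolean xml11Check(int position, int count, int delimLen) {
        return position >= count - delimLen;
    }

    public static void main(String[] args) {
        int delimLen = "]]>".length(); // CDATA end delimiter, 3 characters
        int count = 10;                // assumed buffer fill level, for illustration only

        // Walk positions around the boundary count - delimLen (here: 7).
        for (int position = 6; position <= 8; position++) {
            System.out.println("position=" + position
                    + " xml10=" + xml10Check(position, count, delimLen)
                    + " xml11=" + xml11Check(position, count, delimLen));
        }
    }
}
```

With these assumed values, the two checks give different results only at position 7, where exactly delimLen characters remain in the buffer. Whether that case actually leads to the reported failure is not established by this sketch.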

      This looks similar to https://bugs.openjdk.org/browse/JDK-6318792?jql=text%20~%20%22cdata%20same%20entity%22, but there the fix was made for XML 1.0.

      This is still broken, by the looks of it, in Xerces: https://github.com/apache/xerces-j/blame/ffa4f1072dfa947a216467384d4b90cd60b00b42/src/org/apache/xerces/impl/XML11EntityScanner.java#L1059



      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      Parsing the following XML document fails:

      <?xml version='1.1' encoding='utf-8'?>
      <MYWorkXML>
        <Comment />
        <Description />
        <Name>TestXMLMU</Name>
        <Script>
          <Source><![CDATA[exec [QATestProc]]]></Source>
        </Script>
      </MYWorkXML>

      Error message: XML document structures must start and end within the same entity.


      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      The XML 1.0 parser handles this correctly; ideally, the XML 1.1 parser should also parse these CDATA patterns without failing.
      ACTUAL -
      The XML 1.1 parser fails to process CDATA sections whose content ends with ] or ]]. It does not recognize the end of the CDATA section, so parsing the document ends in an error.

      ---------- BEGIN SOURCE ----------
      import java.io.StringReader;
      import javax.xml.parsers.DocumentBuilder;
      import javax.xml.parsers.DocumentBuilderFactory;
      import org.xml.sax.InputSource;

      public class CDataTest {
          
          private static final String TEST = "<?xml version=\"1.0\"?>\n"+
                  "<value>\n"+
                  "<![CDATA[//IfStatement/Statement/Block[count(*) = 0]]]>\n"+
                  "</value>\n";
          
          /** Creates a new instance of CDataTest */
          public CDataTest() {
          }
          
          public static void main(String [] args) {
              System.out.println("Java version "+System.getProperty("java.version"));
              DocumentBuilder db;
              try {
                  db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
                  db.parse(new InputSource(new StringReader(TEST)));
                  System.out.println("parsed.");
              } catch (Exception ex) {
                  ex.printStackTrace();
              }
          }
          
      }
      The problem is at the end of the CDATA section. If the last character of the content is not a right square bracket, parsing succeeds, but the sequence ']]]>' is not recognized as a bracket followed by the end of the CDATA section.
      ---------- END SOURCE ----------
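
Note that the source above declares version="1.0", so as written it exercises the XML 1.0 scanner. A variant that feeds the XML 1.1 document from the steps to reproduce might look like the following. This is a minimal sketch, assuming the default JAXP DocumentBuilderFactory; the class name Xml11CDataSketch and the helper tryParse are hypothetical. The outcome is reported as-is and no result is assumed here, since it may differ between JDK builds.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class Xml11CDataSketch {

    // The XML 1.1 document from the steps to reproduce; the CDATA content ends with ']'.
    public static final String XML_11_DOC =
            "<?xml version='1.1' encoding='utf-8'?>\n" +
            "<MYWorkXML>\n" +
            "  <Comment />\n" +
            "  <Description />\n" +
            "  <Name>TestXMLMU</Name>\n" +
            "  <Script>\n" +
            "    <Source><![CDATA[exec [QATestProc]]]></Source>\n" +
            "  </Script>\n" +
            "</MYWorkXML>\n";

    // Parses the given document and returns either the text of <Source>
    // or a description of the failure, so both outcomes can be inspected.
    public static String tryParse(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = db.parse(new InputSource(new StringReader(xml)));
            return "parsed: " + doc.getElementsByTagName("Source").item(0).getTextContent();
        } catch (Exception ex) {
            return "failed: " + ex;
        }
    }

    public static void main(String[] args) {
        System.out.println("Java version " + System.getProperty("java.version"));
        System.out.println(tryParse(XML_11_DOC));
    }
}
```

Running this on the affected JDK and on a current one would show whether the XML 1.1 path behaves differently from the XML 1.0 test case.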

            tongwan Andrew Wang
            webbuggrp Webbug Group