-
Bug
-
Resolution: Cannot Reproduce
-
P4
-
None
-
11
-
generic
-
generic
ADDITIONAL SYSTEM INFORMATION :
Windows / C:\Program Files\Eclipse Adoptium\jdk-11.0.25.9-hotspot
A DESCRIPTION OF THE PROBLEM :
I believe the XML 1.1 parser is broken; the XML 1.0 parser handles the same input correctly.
Code Discrepancy:
The conditional checks in the XML 1.0 and XML 1.1 parsers differ:
XML 1.1: if (fCurrentEntity.position >= fCurrentEntity.count - delimLen)
XML 1.0: if (fCurrentEntity.position > fCurrentEntity.count - delimLen)
This >= versus > discrepancy appears to be the cause of the XML 1.1 parser error: the parser misinterprets the end of CDATA sections.
In short, parsing appears to fail if ] or ]] is at the end of the CDATA section.
This looks similar to https://bugs.openjdk.org/browse/JDK-6318792?jql=text%20~%20%22cdata%20same%20entity%22, but there it was fixed for XML 1.0.
By the looks of it, this is also still broken in Xerces: https://github.com/apache/xerces-j/blame/ffa4f1072dfa947a216467384d4b90cd60b00b42/src/org/apache/xerces/impl/XML11EntityScanner.java#L1059
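To make the boundary difference concrete, here is a minimal sketch that evaluates only the two comparisons quoted above. The class name, method names, and sample buffer values are hypothetical; the sketch does not reproduce the scanner's buffer handling, it only shows for which position the two conditions disagree.

```java
public class DelimiterBoundaryCheck {

    // Comparison quoted above for the XML 1.1 scanner.
    static boolean xml11Condition(int position, int count, int delimLen) {
        return position >= count - delimLen;
    }

    // Comparison quoted above for the XML 1.0 scanner.
    static boolean xml10Condition(int position, int count, int delimLen) {
        return position > count - delimLen;
    }

    public static void main(String[] args) {
        int delimLen = "]]>".length(); // length of the CDATA end delimiter
        int count = 10;                // hypothetical number of characters in the buffer

        // Walk the positions around the boundary count - delimLen.
        for (int position = 6; position <= 8; position++) {
            System.out.printf("position=%d  1.1 (>=): %b  1.0 (>): %b%n",
                    position,
                    xml11Condition(position, count, delimLen),
                    xml10Condition(position, count, delimLen));
        }
    }
}
```

The two conditions differ only when position equals count - delimLen, that is, when exactly delimLen characters remain in the buffer.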
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Parsing the following XML fails:
<?xml version='1.1' encoding='utf-8'?>
<MYWorkXML>
<Comment />
<Description />
<Name>TestXMLMU</Name>
<Script>
<Source><![CDATA[exec [QATestProc]]]></Source>
</Script>
</MYWorkXML>
Error message: XML document structures must start and end within the same entity.
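The steps above can be sketched as a small standalone program that parses the same document twice, once declared as XML 1.1 and once as XML 1.0. The class name CData11Repro and the helpers document and tryParse are hypothetical; the sketch assumes the default JAXP DocumentBuilderFactory. The XML 1.1 outcome may depend on the JDK build, so no expected output is given for it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class CData11Repro {

    // Builds the document from the report with the given XML version.
    static String document(String version) {
        return "<?xml version='" + version + "' encoding='utf-8'?>\n"
                + "<MYWorkXML>\n"
                + "<Comment />\n"
                + "<Description />\n"
                + "<Name>TestXMLMU</Name>\n"
                + "<Script>\n"
                + "<Source><![CDATA[exec [QATestProc]]]></Source>\n"
                + "</Script>\n"
                + "</MYWorkXML>\n";
    }

    // Returns "parsed" on success, otherwise the exception message.
    static String tryParse(String xml) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return "parsed";
        } catch (Exception ex) {
            return "failed: " + ex.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("Java version " + System.getProperty("java.version"));
        System.out.println("XML 1.1: " + tryParse(document("1.1")));
        System.out.println("XML 1.0: " + tryParse(document("1.0")));
    }
}
```

Running both versions side by side makes it easy to see whether a given JDK treats the two declarations differently for otherwise identical input.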
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
The XML 1.0 parser handles this correctly; ideally, the XML 1.1 parser should behave the same way and not fail on these CDATA patterns.
ACTUAL -
The XML 1.1 parser fails to correctly process CDATA sections whose content ends with ] or ]]. It does not recognize the end of the CDATA section, so parsing the document fails with the error message above.
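The expected behavior can also be stated as a check on the parsed content rather than only on parse success. The sketch below is an assumption-laden illustration: the class name CDataContentCheck and the helper sourceText are hypothetical, and it uses the standard DOM call getTextContent to read the text of the Source element from the document in the report.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CDataContentCheck {

    // Parses the reported document with the given XML version and
    // returns the text content of its <Source> element.
    static String sourceText(String version) throws Exception {
        String xml = "<?xml version='" + version + "' encoding='utf-8'?>\n"
                + "<MYWorkXML>\n"
                + "<Script>\n"
                + "<Source><![CDATA[exec [QATestProc]]]></Source>\n"
                + "</Script>\n"
                + "</MYWorkXML>\n";
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName("Source").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // The CDATA content keeps its trailing ']' before the ']]>' delimiter.
        System.out.println(sourceText("1.0"));
    }
}
```

With version 1.0 the Source text is exec [QATestProc]; the report expects a call with version 1.1 to return the same string instead of throwing.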
---------- BEGIN SOURCE ----------
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class CDataTest {
    // CDATA content ends with ']' immediately before the ']]>' delimiter.
    // Note: this document declares XML 1.0, whereas the failing document
    // in the steps above declares version 1.1.
    private static final String TEST = "<?xml version=\"1.0\"?>\n" +
        "<value>\n" +
        "<![CDATA[//IfStatement/Statement/Block[count(*) = 0]]]>\n" +
        "</value>\n";

    /** Creates a new instance of CDataTest */
    public CDataTest() {
    }

    public static void main(String[] args) {
        System.out.println("Java version " + System.getProperty("java.version"));
        DocumentBuilder db;
        try {
            db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Parse the in-memory document; a parse failure surfaces as an exception.
            db.parse(new InputSource(new StringReader(TEST)));
            System.out.println("parsed.");
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
The problem is at the end of the CDATA section. If the last character of the content is not a closing square bracket (]), parsing succeeds, but the sequence ']]]>' is not recognized as a bracket followed by the end of the CDATA section.
---------- END SOURCE ----------