- Bug
- Resolution: Cannot Reproduce
- P3
- None
- 6
- x86
- windows_xp
FULL PRODUCT VERSION :
java version "1.6.0_10"
Java(TM) SE Runtime Environment (build 1.6.0_10-b33)
Java HotSpot(TM) Client VM (build 11.0-b15, mixed mode, sharing)
ADDITIONAL OS VERSION INFORMATION :
Windows XP
A DESCRIPTION OF THE PROBLEM :
If a document contains a Traditional Chinese character (one encoded as 4 bytes in UTF-8) after a numeric character reference, the resulting DOM contains garbage characters.
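For context, the following is a minimal sketch of why the character in the test case is a 4-byte case. It assumes the code point 65766 used in the test source below and a runtime with `java.nio.charset.StandardCharsets` (Java 7 or later, so newer than the release reported here); the class and method names are hypothetical and only serve the illustration.

```java
import java.nio.charset.StandardCharsets;

public class SupplementaryCharDemo {
    // Code point used by the test case in this report (decimal 65766 = U+100E6).
    static final int CODE_POINT = 65766;

    // Builds the one-character string the same way the test case does.
    static String asString() {
        return new String(Character.toChars(CODE_POINT));
    }

    // Number of bytes the character occupies once encoded as UTF-8.
    static int utf8Length() {
        return asString().getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String s = asString();
        System.out.println("UTF-16 code units: " + s.length());
        System.out.println("code points:       " + s.codePointCount(0, s.length()));
        System.out.println("UTF-8 bytes:       " + utf8Length());
        System.out.println("supplementary:     " + Character.isSupplementaryCodePoint(CODE_POINT));
    }
}
```

The code point lies above U+FFFF, so Java stores it as a surrogate pair (two `char` values) and UTF-8 encodes it as four bytes. That is the situation the parser has to handle right after the numeric character reference.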
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run the test case.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
All tests should be successful.
ACTUAL -
testCharRefAndRawChineseChar() fails.
The characters of the numeric reference itself ("80" in the test case) are inserted before the unescaped Chinese character.
REPRODUCIBILITY :
This bug can be reproduced always.
---------- BEGIN SOURCE ----------
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import junit.framework.TestCase;
import org.w3c.dom.Document;

public class XMLChineseTest extends TestCase {

    // Supplementary character (code point 65766), 4 bytes in UTF-8.
    static final String CHINESE_STR = new String(Character.toChars(65766));

    // Raw character only: no numeric character reference involved.
    public void testRawChineseChar() throws Exception {
        checkXMLParsing(CHINESE_STR, CHINESE_STR);
    }

    // Both characters written as numeric character references.
    public void testCharRefAndEscapedChineseChar() throws Exception {
        checkXMLParsing("&#80;&#65766;", (char)(80) + CHINESE_STR);
    }

    // Numeric character reference followed by the raw character: this is the failing case.
    public void testCharRefAndRawChineseChar() throws Exception {
        checkXMLParsing("&#80;" + CHINESE_STR, (char)(80) + CHINESE_STR);
    }

    // Parses a one-element document and compares the "value" attribute with the expected string.
    private void checkXMLParsing(String encodedValue, String expectedDOMValue) throws Exception {
        String xml = "<truc value=\"" + encodedValue + "\" />";
        System.out.println("xml input: " + xml);
        byte[] xmlBytes = xml.getBytes("UTF-8");
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xmlBytes));
        String readValue = doc.getDocumentElement().getAttribute("value");
        System.out.println("Read value: " + readValue);
        assertEquals(expectedDOMValue, readValue);
    }
}
---------- END SOURCE ----------
Release Regression From : 5.0u12
The above release value was the last known release where this
bug was not reproducible. Since then there has been a regression.
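For anyone re-checking this without JUnit, the failing case can be sketched as a standalone program that prints the code points of the expected and parsed values, which makes any inserted garbage easy to spot. This is only an illustration under assumptions: the class name `CharRefRepro` and its helper methods are hypothetical, and `StandardCharsets` requires Java 7 or later.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class CharRefRepro {

    // Parses <truc value="..."/> from UTF-8 bytes and returns the attribute value.
    static String parseAttribute(String encodedValue) throws Exception {
        String xml = "<truc value=\"" + encodedValue + "\" />";
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement().getAttribute("value");
    }

    // Renders a string as a space-separated list of code points, e.g. "U+0050 U+100E6".
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp); // step over both halves of a surrogate pair
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        String raw = new String(Character.toChars(65766));
        String expected = "P" + raw;
        // Same input as testCharRefAndRawChineseChar(): a reference followed by the raw character.
        String actual = parseAttribute("&#80;" + raw);
        System.out.println("expected: " + codePoints(expected));
        System.out.println("actual:   " + codePoints(actual));
        System.out.println(expected.equals(actual) ? "OK" : "MISMATCH");
    }
}
```

On a runtime without the defect, both lines should list the same two code points. On an affected runtime, the "actual" line would be expected to show extra code points between them.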