-
Bug
-
Resolution: Duplicate
-
P4
-
None
-
7u55, 8u25, 8u40
-
x86_64
-
windows_7
FULL PRODUCT VERSION :
java version "1.7.0_55"
Java(TM) SE Runtime Environment (build 1.7.0_55-b14)
Java HotSpot(TM) Client VM (build 24.55-b03, mixed mode, sharing)
A DESCRIPTION OF THE PROBLEM :
Attribute values with surrogate characters are being corrupted.
Given the input file "db.xml" below, reading it and writing it back to "db_out.xml" using the Xerces API produces an output file in which the first surrogate character appears twice.
db.xml content:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<JDF>
<Command AcknowledgeURL="𨦈𨦇" />
</JDF>
db_out.xml content:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<JDF>
<Command AcknowledgeURL="𨦈𨦈𨦇"/>
</JDF>
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1. Create a file "db.xml" and fill it with the XML specified in the "Description" section
2. Run the piece of code from "Source code for an executable test case:" section
3. Verify the output file "db_out.xml" and observe that the first surrogate character appears twice.
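Step 3 can also be checked without inspecting the file by eye. The following is a minimal sketch, not part of the original report: the class name SurrogateRoundTripCheck and the method roundTripCodePoints are hypothetical, and the code points U+20000 and U+20001 are arbitrary supplementary characters standing in for the ones in db.xml. It parses a document, serializes it, parses the result again and counts the code points in the attribute value.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class; the name is not taken from the report.
class SurrogateRoundTripCheck {

    // Parse an XML string into a DOM document using the default JAXP parser.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Round-trip the attribute value (parse, serialize, parse again)
    // and return how many Unicode code points it contains afterwards.
    static int roundTripCodePoints(String attributeValue) throws Exception {
        String xml = "<JDF><Command AcknowledgeURL=\"" + attributeValue + "\"/></JDF>";
        Document doc = parse(xml);

        StringWriter out = new StringWriter();
        Transformer trans = TransformerFactory.newInstance().newTransformer();
        trans.transform(new DOMSource(doc), new StreamResult(out));

        Document reparsed = parse(out.toString());
        Element command = (Element) reparsed.getElementsByTagName("Command").item(0);
        String value = command.getAttribute("AcknowledgeURL");
        return value.codePointCount(0, value.length());
    }

    public static void main(String[] args) throws Exception {
        // Two arbitrary supplementary characters (assumption: any pair outside the BMP will do).
        String value = new String(Character.toChars(0x20000))
                + new String(Character.toChars(0x20001));
        System.out.println("code points after round trip: " + roundTripCodePoints(value));
    }
}
```

A count of 2 would match the expected behavior; a count of 3 would match the duplication described in this report.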
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<JDF>
<Command AcknowledgeURL="𨦈𨦇"/>
</JDF>
ACTUAL -
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<JDF>
<Command AcknowledgeURL="𨦈𨦈𨦇"/>
</JDF>
REPRODUCIBILITY :
This bug can always be reproduced.
---------- BEGIN SOURCE ----------
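import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// The class name is illustrative; the report supplied only the method body.
public class SurrogateAttributeTest {
    public static void main(String[] args) throws IOException {
        FileOutputStream fos = null;
        FileInputStream fis = null;
        BufferedWriter writer = null;
        try {
            // Parse db.xml into a DOM document.
            fis = new FileInputStream("db.xml");
            InputSource in = new InputSource(fis);
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document domDocument = builder.parse(in);
            // Serialize the DOM document to a string.
            StringWriter stringOut = new StringWriter();
            try {
                TransformerFactory transfac = TransformerFactory.newInstance();
                Transformer trans = transfac.newTransformer();
                trans.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
                trans.setOutputProperty(OutputKeys.STANDALONE, "yes");
                trans.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
                trans.setOutputProperty(OutputKeys.INDENT, "yes");
                StreamResult result = new StreamResult(stringOut);
                DOMSource source = new DOMSource(domDocument);
                trans.transform(source, result);
            } catch (TransformerException e) {
                e.printStackTrace();
            }
            // Write the serialized document to db_out.xml as UTF-8.
            String str = stringOut.toString();
            fos = new FileOutputStream("db_out.xml");
            writer = new BufferedWriter(new OutputStreamWriter(fos, "UTF-8"));
            writer.write(str, 0, str.length());
            writer.flush();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            e.printStackTrace();
        } finally {
            // Close the writer before the stream it wraps.
            if (null != writer) {
                writer.close();
            }
            if (null != fos) {
                fos.close();
            }
            if (null != fis) {
                fis.close();
            }
        }
    }
}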
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
I don't have a workaround, but from my investigation the problem appears to be in the method XMLScanner.scanAttributeValue(): in the do-while loop, the branch "else if (c != -1 && XMLChar.isHighSurrogate(c))" does not clear fStringBuffer3.
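The reporter's hypothesis can be illustrated with a simplified model. The sketch below is not the Xerces source: the class ScratchBufferSketch, its scan method and the clearScratch flag are hypothetical names chosen for illustration, and the code points U+20000 and U+20001 are arbitrary stand-ins. It only shows how a reused scratch buffer that is not cleared before each surrogate pair would yield the reported pattern, in which the first character is emitted twice.

```java
// Simplified model of a scanner that reuses one scratch buffer for surrogate pairs.
// This is an illustration of the suspected failure mode, not the actual parser code.
class ScratchBufferSketch {

    // Reusable scratch buffer, playing the role the reporter attributes to fStringBuffer3.
    private final StringBuilder scratch = new StringBuilder();

    // Copies input to the result; surrogate pairs pass through the scratch buffer.
    // When clearScratch is false, the buffer keeps earlier pairs and re-emits them.
    String scan(String input, boolean clearScratch) {
        scratch.setLength(0); // start each scan with an empty buffer
        StringBuilder value = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            boolean pair = Character.isHighSurrogate(c)
                    && i + 1 < input.length()
                    && Character.isLowSurrogate(input.charAt(i + 1));
            if (pair) {
                if (clearScratch) {
                    scratch.setLength(0); // the step suspected to be missing
                }
                scratch.append(c).append(input.charAt(i + 1));
                value.append(scratch);
                i += 2;
            } else {
                value.append(c);
                i++;
            }
        }
        return value.toString();
    }

    public static void main(String[] args) {
        String a = new String(Character.toChars(0x20000));
        String b = new String(Character.toChars(0x20001));
        ScratchBufferSketch sketch = new ScratchBufferSketch();
        String stale = sketch.scan(a + b, false);
        String cleared = sketch.scan(a + b, true);
        System.out.println("without clearing: " + stale.codePointCount(0, stale.length()));
        System.out.println("with clearing: " + cleared.codePointCount(0, cleared.length()));
    }
}
```

In this model, scanning two supplementary characters without clearing the buffer yields three code points (first, first, second), while clearing it yields the original two.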
- duplicates
-
JDK-8058175 [XML 1.0/1.1] - Attribute values with supplemental characters are being corrupted.
- Resolved