Type: Bug
Resolution: Duplicate
Priority: P4
Fix Version/s: None
Affects Version/s: 7u55, 8u25, 8u40
Component/s: xml
Labels: webbug
Subcomponent: org.xml.sax
CPU: x86_64
OS: windows_7

FULL PRODUCT VERSION :
java version "1.7.0_55"
Java(TM) SE Runtime Environment (build 1.7.0_55-b14)
Java HotSpot(TM) Client VM (build 24.55-b03, mixed mode, sharing)

A DESCRIPTION OF THE PROBLEM :
Attribute values containing surrogate characters are corrupted.
Given the following input file "db.xml", reading it and writing it back to "db_out.xml" using the Xerces API produces an output file in which the first surrogate character appears twice.

db.xml content:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<JDF>
<Command AcknowledgeURL="𨦈𨦇" />
</JDF>

db_out.xml content:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<JDF>
<Command AcknowledgeURL="𨦈𨦈𨦇"/>
</JDF>

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1. Create a file "db.xml" containing the XML shown in the "Description" section.
2. Run the code from the "Source code for an executable test case" section.
3. Inspect the output file "db_out.xml" and observe that the first surrogate character appears twice.
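
Step 1 can also be scripted so that the input file is certain to be written as UTF-8. The following is only a sketch: the class name CreateDbXml is hypothetical, and U+20000 and U+20001 are assumed stand-ins for the two supplementary characters in the report, not the original characters.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class CreateDbXml {
    /** Builds the document text with two supplementary characters in the attribute. */
    static String buildXml() {
        // Assumed stand-ins: U+20000 and U+20001 each need a surrogate pair in UTF-16.
        String value = new String(Character.toChars(0x20000))
                + new String(Character.toChars(0x20001));
        return "<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"yes\"?>\n"
                + "<JDF>\n"
                + "<Command AcknowledgeURL=\"" + value + "\" />\n"
                + "</JDF>\n";
    }

    /** Writes the document to the given path, encoded as UTF-8. */
    static void write(Path target) throws IOException {
        Files.write(target, buildXml().getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        write(Paths.get("db.xml"));
    }
}
```

Building the attribute value with Character.toChars keeps the Java source pure ASCII, so the result does not depend on the encoding the compiler assumes for the source file.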

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<JDF>
<Command AcknowledgeURL="𨦈𨦇"/>
</JDF>
ACTUAL -
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<JDF>
<Command AcknowledgeURL="𨦈𨦈𨦇"/>
</JDF>
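
Because the two characters are hard to tell apart visually, it can help to compare the expected and actual attribute values by code point instead of by glyph. A minimal sketch, assuming a hypothetical helper class CodePointDump and using U+20000 and U+20001 as stand-ins for the characters in the report:

```java
final class CodePointDump {
    /** Returns the code points of s as space-separated U+XXXX tokens. */
    static String describe(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i); // reads a whole surrogate pair when present
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", cp));
            i += Character.charCount(cp); // advance by 1 or 2 UTF-16 units
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String a = new String(Character.toChars(0x20000)); // assumed stand-in
        String b = new String(Character.toChars(0x20001)); // assumed stand-in
        System.out.println("expected: " + describe(a + b));     // expected: U+20000 U+20001
        System.out.println("actual:   " + describe(a + a + b)); // actual:   U+20000 U+20000 U+20001
    }
}
```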

REPRODUCIBILITY :
This bug can always be reproduced.

---------- BEGIN SOURCE ----------
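import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class SurrogateAttributeTest {
    public static void main(String[] args) {
        // try-with-resources closes the input stream even if parsing fails.
        try (FileInputStream fis = new FileInputStream("db.xml")) {
            // Parse the input file into a DOM tree.
            InputSource in = new InputSource(fis);
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document domDocument = builder.parse(in);

            // Serialize the DOM tree back to a string.
            StringWriter stringOut = new StringWriter();
            try {
                TransformerFactory transfac = TransformerFactory.newInstance();
                Transformer trans = transfac.newTransformer();
                trans.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
                trans.setOutputProperty(OutputKeys.STANDALONE, "yes");
                trans.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
                trans.setOutputProperty(OutputKeys.INDENT, "yes");

                StreamResult result = new StreamResult(stringOut);
                DOMSource source = new DOMSource(domDocument);
                trans.transform(source, result);
            } catch (TransformerException e) {
                e.printStackTrace();
            }

            String str = stringOut.toString();

            // Write the serialized document to db_out.xml as UTF-8.
            try (BufferedWriter writer = new BufferedWriter(
                    new OutputStreamWriter(new FileOutputStream("db_out.xml"), "UTF-8"))) {
                writer.write(str, 0, str.length());
                writer.flush();
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            e.printStackTrace();
        }
    }
}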
---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
I don't have a workaround. From my investigation, I think the problem might be in the method XMLScanner.scanAttributeValue(): inside the do-while loop, in the branch tested by else if (c != -1 && XMLChar.isHighSurrogate(c)), fStringBuffer3 is not cleared.
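
To illustrate the kind of failure the reporter suspects, here is a simplified model of a scan loop that routes each surrogate pair through a reusable scratch buffer. This is not the Xerces source: the class StaleBufferModel, the method scan, and the clearScratch flag are all hypothetical, and U+20000 and U+20001 are assumed stand-ins for the characters in the report. The sketch only shows that forgetting to clear such a buffer yields the same pattern as the observed output, with the first character duplicated.

```java
final class StaleBufferModel {
    /**
     * Copies text to the output, passing each surrogate pair through a
     * reusable scratch buffer (a hypothetical stand-in for fStringBuffer3).
     */
    static String scan(String text, boolean clearScratch) {
        StringBuilder out = new StringBuilder();
        StringBuilder scratch = new StringBuilder(); // reused across iterations
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (Character.isHighSurrogate(c) && i + 1 < text.length()
                    && Character.isLowSurrogate(text.charAt(i + 1))) {
                if (clearScratch) {
                    scratch.setLength(0); // the step the reporter believes is missing
                }
                scratch.append(c).append(text.charAt(i + 1));
                out.append(scratch); // also appends stale pairs if scratch was not cleared
                i += 2;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String a = new String(Character.toChars(0x20000)); // assumed stand-in
        String b = new String(Character.toChars(0x20001)); // assumed stand-in
        System.out.println(scan(a + b, false).length()); // 6 UTF-16 units: first pair duplicated
        System.out.println(scan(a + b, true).length());  // 4 UTF-16 units: value preserved
    }
}
```

With clearScratch set to false, the second pair is appended to a buffer that still holds the first pair, so the output contains the first character twice, followed by the second.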

duplicates

JDK-8058175 [XML 1.0/1.1] - Attribute values with supplemental characters are being corrupted.

Resolved

Assignee: Joe Wang
Reporter: Webbug Group
Votes: 0
Watchers: 3
Created: 2014-10-23 06:54
Updated: 2014-11-10 08:47
Resolved: 2014-11-10 08:47
