- Bug
- Resolution: Duplicate
- P3
- None
- 7, 8u25, 8u66
- generic
- generic
FULL PRODUCT VERSION :
java version "1.8.0_25"
Java(TM) SE Runtime Environment (build 1.8.0_25-b18)
Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)
ADDITIONAL OS VERSION INFORMATION :
Microsoft Windows [version 6.3.9600]
A DESCRIPTION OF THE PROBLEM :
This is an upstream bug for:
https://josm.openstreetmap.de/ticket/3290
The attached small XML file contains a Chinese character and the first Gothic character (U+10330: http://www.unicode.org/charts/PDF/U10330.pdf).
When parsing this file using StAX, the attribute value containing the Gothic character is corrupted: it also contains the Chinese character from the previous attribute.
See the console output:
From XML chinese:[-16, -92, -83, -94]
Expected chinese:[-16, -92, -83, -94]
From XML gothic:[-16, -92, -83, -94, -16, -112, -116, -80]
Expected gothic:[-16, -112, -116, -80]
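The byte arrays above are easier to read once decoded back into code points. The following is a small illustrative sketch, not part of the original report; the class name `DecodeBytes` and the helper `codePoints` are hypothetical names chosen for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the original report.
class DecodeBytes {

    // Decodes UTF-8 bytes and lists the resulting code points as U+XXXX.
    static String codePoints(byte[] utf8) {
        String s = new String(utf8, StandardCharsets.UTF_8);
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Values copied from the "From XML gothic" and "Expected gothic" lines.
        byte[] actual = {-16, -92, -83, -94, -16, -112, -116, -80};
        byte[] expected = {-16, -112, -116, -80};
        System.out.println("actual:   " + codePoints(actual));
        System.out.println("expected: " + codePoints(expected));
    }
}
```

Decoded this way, the corrupted value is U+24B62 followed by U+10330, while the expected value is U+10330 alone. Both characters lie outside the Basic Multilingual Plane, so each is stored as a surrogate pair in a Java `String`.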
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run attached program
EXPECTED VERSUS ACTUAL BEHAVIOR :
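EXPECTED -
No error: each attribute value contains only its own character.
ACTUAL -
Characters are corrupted: the Gothic attribute value also contains the Chinese character from the previous attribute.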
REPRODUCIBILITY :
This bug can be reproduced always.
---------- BEGIN SOURCE ----------
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
public class Test {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<String, String>();
        try {
            InputStreamReader ir = new InputStreamReader(new FileInputStream("D:\\Users\\Vincent\\Desktop\\JOSM_work\\gottic.osm"), "UTF-8");
            XMLStreamReader parser = XMLInputFactory.newInstance().createXMLStreamReader(ir);
            int event = parser.getEventType();
            while (true) {
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String key = parser.getAttributeValue(null, "k");
                    String value = parser.getAttributeValue(null, "v");
                    if (key != null && value != null) {
                        map.put(key.intern(), value.intern());
                    }
                }
                if (parser.hasNext()) {
                    event = parser.next();
                } else {
                    break;
                }
            }
            parser.close();
            String value = map.get("name:ch");
            System.out.println("From XML chinese:" + Arrays.toString(value.getBytes("UTF-8")));
            value = new String(Character.toChars(0x24B62));
            System.out.println("Expected chinese:" + Arrays.toString(value.getBytes("UTF-8")));
            value = map.get("name:got");
            System.out.println("From XML gothic:" + Arrays.toString(value.getBytes("UTF-8")));
            value = new String(Character.toChars(0x10330));
            System.out.println("Expected gothic:" + Arrays.toString(value.getBytes("UTF-8")));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
---------- END SOURCE ----------
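The program above depends on a local file that is not included in this report. A self-contained variant might look like the sketch below. The inline XML is an assumption about what the attachment contains (two `tag` elements with `k` and `v` attributes, as the program expects), and the names `InlineRepro`, `XML` and `readTags` are hypothetical. Whether this exact document triggers the corruption on an affected build has not been verified here.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical self-contained variant of the reproducer.
class InlineRepro {
    static final String CHINESE = new String(Character.toChars(0x24B62));
    static final String GOTHIC = new String(Character.toChars(0x10330));

    // Assumed stand-in for the attached file; the real attachment may differ.
    static final String XML =
            "<osm><node>"
            + "<tag k='name:ch' v='" + CHINESE + "'/>"
            + "<tag k='name:got' v='" + GOTHIC + "'/>"
            + "</node></osm>";

    // Collects the k/v attribute pairs of every element that has both.
    static Map<String, String> readTags(String xml) {
        Map<String, String> tags = new HashMap<>();
        try {
            XMLStreamReader parser = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (parser.hasNext()) {
                if (parser.next() == XMLStreamConstants.START_ELEMENT) {
                    String key = parser.getAttributeValue(null, "k");
                    String value = parser.getAttributeValue(null, "v");
                    if (key != null && value != null) {
                        tags.put(key, value);
                    }
                }
            }
            parser.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
        return tags;
    }

    public static void main(String[] args) {
        Map<String, String> tags = readTags(XML);
        // On a build without the bug, both comparisons are expected to be true.
        System.out.println("chinese ok: " + CHINESE.equals(tags.get("name:ch")));
        System.out.println("gothic ok:  " + GOTHIC.equals(tags.get("name:got")));
    }
}
```

Reading from a `StringReader` removes the file path and the charset decoding step, so any corruption that still appears would come from the parser's attribute handling rather than from the input stream.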
- duplicates: JDK-8058175 [XML 1.0/1.1] - Attribute values with supplemental characters are being corrupted. (Resolved)