Bug
Resolution: Not an Issue
P4
None
1.4.2
x86
linux
Name: rmT116609 Date: 05/20/2003
FULL PRODUCT VERSION :
java version "1.4.2-beta"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2-beta-b19)
Java HotSpot(TM) Client VM (build 1.4.2-beta-b19, mixed mode)
FULL OS VERSION :
Linux stallion.elharo.com 2.4.18-6mdk #1 Fri Mar 15 02:59:08 CET 2002 i686 unknown
A DESCRIPTION OF THE PROBLEM :
When InputStreamReader uses the Cp037 (EBCDIC US) encoding and reads a NEL (U+0085 in Unicode, 0x15 in EBCDIC), it converts it into a linefeed (\n, U+000A). When OutputStreamWriter writes a linefeed in Cp037, it writes a NEL instead.
NEL and linefeed are *not* the same character. Cp037 has separate, distinct code points for NEL and linefeed (0x15 and 0x25 respectively). It is important for XML parsing, among other uses, that they not be confused: the linefeed character qualifies as white space in XML, but NEL does not. Several XML parsers have serious errors as a result of depending on Java to convert EBCDIC to Unicode.
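The XML point can be made concrete. XML 1.0 defines white space (production [3], `S`) as only U+0020, U+0009, U+000D and U+000A, so a decoder that turns NEL into U+000A makes a parser see white space where the document contained none. A minimal sketch of that test; `XmlWhitespace` is an illustrative class, not code taken from any parser.

```java
// Illustrative only: the XML 1.0 white space test, showing LF qualifies and NEL does not.
public class XmlWhitespace {
    // XML 1.0 production [3]: S ::= (#x20 | #x9 | #xD | #xA)+
    static boolean isXmlWhitespace(char ch) {
        return ch == 0x20 || ch == 0x09 || ch == 0x0D || ch == 0x0A;
    }

    public static void main(String[] args) {
        System.out.println("LF  (U+000A) is XML white space: " + isXmlWhitespace((char) 0x0A));
        System.out.println("NEL (U+0085) is XML white space: " + isXmlWhitespace((char) 0x85));
    }
}
```

Running it prints `true` for LF and `false` for NEL, which is why silently mapping one to the other changes how a parser tokenizes the document.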
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run the attached program.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
This code should read in and write out all three common line-end characters: NEL, linefeed, and carriage return. In both directions only two are seen. On output, every linefeed is written as a NEL (0x15). On input, every NEL (0x15) is read as a linefeed (10).
ACTUAL -
Testing input stream
10
10
10
10
10
10
10
10
13
13
13
13
Testing output stream
0x15
0x15
0x15
0x15
0xD
0xD
0xD
0xD
0x15
0x15
0x15
0x15
0xD
0x15
0xD
0x15
0xD
0x15
0xD
0x15
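The expected behavior can be restated as a round-trip property: the Cp037 bytes 0x15 (NEL), 0x25 (linefeed) and 0x0D (carriage return) should decode to three different characters, each of which encodes back to its original byte. A minimal self-check under that assumption; `Cp037RoundTrip` is an illustrative name, and it uses only `java.nio.charset.Charset` with the same `Cp037` charset name as the test program. What it prints depends on the installed JDK's Cp037 tables; on the 1.4.2-beta build in this report, 0x15 and 0x25 both decode to U+000A, so `lineEndsRoundTrip()` would return false there.

```java
import java.nio.charset.Charset;

// Illustrative self-check (hypothetical class): do Cp037's line-end bytes survive a round trip?
public class Cp037RoundTrip {
    static final Charset CP037 = Charset.forName("Cp037");

    // Cp037 bytes for NEL, linefeed and carriage return, as used in the test program
    static final int[] LINE_END_BYTES = {0x15, 0x25, 0x0D};

    // Decode one Cp037 byte to a char
    static char decode(int b) {
        return new String(new byte[] {(byte) b}, CP037).charAt(0);
    }

    // Encode one char to a Cp037 byte, returned in the range 0..255
    static int encode(char ch) {
        return String.valueOf(ch).getBytes(CP037)[0] & 0xFF;
    }

    // True only if each line-end byte decodes to a different char that encodes back to the same byte
    static boolean lineEndsRoundTrip() {
        char[] decoded = new char[LINE_END_BYTES.length];
        for (int i = 0; i < LINE_END_BYTES.length; i++) {
            decoded[i] = decode(LINE_END_BYTES[i]);
            if (encode(decoded[i]) != LINE_END_BYTES[i]) {
                return false;
            }
            for (int j = 0; j < i; j++) {
                if (decoded[j] == decoded[i]) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (int i = 0; i < LINE_END_BYTES.length; i++) {
            int b = LINE_END_BYTES[i];
            char ch = decode(b);
            System.out.println("0x" + Integer.toHexString(b).toUpperCase()
                    + " -> U+" + Integer.toHexString(ch).toUpperCase()
                    + " -> 0x" + Integer.toHexString(encode(ch)).toUpperCase());
        }
        System.out.println("Line ends round-trip: " + lineEndsRoundTrip());
    }
}
```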
REPRODUCIBILITY :
This bug is always reproducible.
---------- BEGIN SOURCE ----------
import java.io.*;

public class NELTest {
    public static void main(String[] args) throws Exception {
        System.out.println("Testing input stream");
        // Cp037 bytes: four NELs (0x15), four linefeeds (0x25), four carriage returns (0x0D)
        byte[] data = {(byte) 0x15, (byte) 0x15, (byte) 0x15, (byte) 0x15, (byte) 0x25, (byte) 0x25, (byte) 0x25, (byte) 0x25, (byte) 13, (byte) 13, (byte) 13, (byte) 13};
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        InputStreamReader reader = new InputStreamReader(in, "Cp037");
        int c;
        // Print each decoded character as a decimal code point
        while ((c = reader.read()) != -1) {
            System.out.println(c);
        }
        System.out.println("Testing output stream");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        OutputStreamWriter writer = new OutputStreamWriter(out, "Cp037");
        // Four NELs (U+0085)
        writer.write((char) 0x85);
        writer.write((char) 0x85);
        writer.write((char) 0x85);
        writer.write((char) 0x85);
        // Four carriage returns (U+000D)
        writer.write((char) 13);
        writer.write((char) 13);
        writer.write((char) 13);
        writer.write((char) 13);
        // Four linefeeds (U+000A)
        writer.write((char) 10);
        writer.write((char) 10);
        writer.write((char) 10);
        writer.write((char) 10);
        // Four CR LF pairs
        writer.write((char) 13);
        writer.write((char) 10);
        writer.write((char) 13);
        writer.write((char) 10);
        writer.write((char) 13);
        writer.write((char) 10);
        writer.write((char) 13);
        writer.write((char) 10);
        writer.flush();
        writer.close();
        byte[] result = out.toByteArray();
        // Print each encoded byte in hex; mask with 0xFF so bytes >= 0x80 are not sign-extended
        for (int i = 0; i < result.length; i++) {
            System.out.println("0x" + Integer.toHexString(result[i] & 0xFF).toUpperCase());
        }
    }
}
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
I've written my own special-purpose EBCDIC writer that keeps NELs and linefeeds distinct. For input I don't yet have a workaround, since the bug tends to manifest itself fairly deeply inside XML parsers.
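The reporter's writer is not included. A minimal sketch of the same idea, assuming NEL should map to EBCDIC 0x15 and linefeed to 0x25 (the byte values used in the test program), with every other character delegated to the JDK's own Cp037 tables. Because Cp037 is a single-byte encoding, the same table also gives an input-side reader, but only where the application itself turns bytes into characters; it does not help when an XML parser decodes the bytes internally, which is the case the reporter describes. `StrictCp037`, `EbcdicWriter` and `EbcdicReader` are hypothetical names, not JDK classes.

```java
import java.io.*;
import java.nio.charset.Charset;

// Hypothetical helper, not a JDK class: Cp037 with NEL (0x15 <-> U+0085) and
// LF (0x25 <-> U+000A) kept distinct; every other character uses the JDK's Cp037 tables.
public class StrictCp037 {
    private static final Charset CP037 = Charset.forName("Cp037");
    private static final char[] DECODE = new char[256];

    static {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        // Cp037 is single-byte, so byte i decodes to char i of this string.
        new String(all, CP037).getChars(0, 256, DECODE, 0);
        DECODE[0x15] = (char) 0x85; // EBCDIC NEL -> U+0085
        DECODE[0x25] = '\n';        // EBCDIC LF  -> U+000A
    }

    // Encode one char to a Cp037 byte (0..255), overriding the two line-end mappings.
    static int encode(char ch) {
        if (ch == (char) 0x85) {
            return 0x15;
        }
        if (ch == '\n') {
            return 0x25;
        }
        return String.valueOf(ch).getBytes(CP037)[0] & 0xFF;
    }

    public static class EbcdicWriter extends Writer {
        private final OutputStream out;

        public EbcdicWriter(OutputStream out) { this.out = out; }

        @Override
        public void write(char[] buf, int off, int len) throws IOException {
            for (int i = off; i < off + len; i++) {
                out.write(encode(buf[i]));
            }
        }

        @Override
        public void flush() throws IOException { out.flush(); }

        @Override
        public void close() throws IOException { out.close(); }
    }

    public static class EbcdicReader extends Reader {
        private final InputStream in;

        public EbcdicReader(InputStream in) { this.in = in; }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            byte[] tmp = new byte[len];
            int n = in.read(tmp, 0, len);
            // Loop body never runs when n is -1 (end of stream)
            for (int i = 0; i < n; i++) {
                buf[off + i] = DECODE[tmp[i] & 0xFF];
            }
            return n;
        }

        @Override
        public void close() throws IOException { in.close(); }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Writer writer = new EbcdicWriter(bytes);
        writer.write(new char[] {(char) 0x85, '\r', '\n'}); // NEL, CR, LF
        writer.close();
        byte[] encoded = bytes.toByteArray();
        for (int i = 0; i < encoded.length; i++) {
            System.out.println("0x" + Integer.toHexString(encoded[i] & 0xFF).toUpperCase());
        }
        Reader reader = new EbcdicReader(new ByteArrayInputStream(encoded));
        int c;
        while ((c = reader.read()) != -1) {
            System.out.println(c);
        }
    }
}
```

Running `main` prints 0x15, 0xD and 0x25, then 133, 13 and 10: all three line-end characters survive in both directions. Overriding only the two disputed entries keeps the sketch independent of which way a particular JDK maps them.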
(Review ID: 185599)
======================================================================
Relates to: JDK-7016785 Line-Feed swapping functionality for EBCDIC converters (Closed)