- Bug
- Resolution: Won't Fix
- P4
- None
- 1.1.8
- generic
- generic
Name: boT120536 Date: 09/05/2001
java version "1.1.8"
java version "1.3.1"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.1-b24)
Java HotSpot(TM) Client VM (build 1.3.1-b24, mixed mode)
Converting a decomposed Unicode sequence into another encoding does not
compose it first: each combining character is replaced with '?'. For example,
the sequence:
"u" "combining diaeresis" "o" "combining diaeresis" "a" "combining diaeresis"
produces the characters
"u" "?" "o" "?" "a" "?"
when converted to ISO-8859-1 (Latin 1), rather than the precomposed
characters U+00FC, U+00F6, U+00E4.
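The substitution can be traced to the combining character itself having no Latin 1 code. The following is a minimal sketch, not part of the original report, that checks this with java.nio.charset.CharsetEncoder. It assumes Java 1.4 or later, which is newer than the JDKs listed in this report; the class name Latin1Mappable and the helper mappable are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper class: reports whether a single char has a Latin 1 code.
final class Latin1Mappable {

    // Returns true if the ISO-8859-1 encoder can encode the given char.
    static boolean mappable(char c) {
        CharsetEncoder encoder = Charset.forName("ISO-8859-1").newEncoder();
        return encoder.canEncode(c);
    }

    public static void main(String[] args) {
        System.out.println("u      -> " + mappable('u'));       // base letter
        System.out.println("U+0308 -> " + mappable('\u0308'));  // combining diaeresis
        System.out.println("U+00FC -> " + mappable('\u00fc'));  // precomposed u with diaeresis
    }
}
```

ISO-8859-1 covers only U+0000 through U+00FF, so U+0308 is unmappable on its own, and String.getBytes substitutes the encoder's replacement byte for it.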
This problem is present in:
Sun JDK 1.1.8, 1.2.2, 1.3.0, 1.3.1 on Windows
IBM JRE 1.1.8, 1.3 on Windows
Sun JRE 1.1.7b on Linux
Sun JDK 1.2.2, 1.3.1 on Linux
Apple MRJ 2.2.5 on MacOS 9
Apple MRJ 3 (1.3) on MacOS X
Sun JRE 1.3.1 on Solaris 8 (SPARC)
Example code:
import java.io.*;

public class DecomposedEncoding {
    // "u", "o", "a", each followed by U+0308 (combining diaeresis)
    private static final String decomposed = "u\u0308o\u0308a\u0308";
    // The same three letters as precomposed characters
    private static final String composed = "\u00fc\u00f6\u00e4";

    public static final void main(String[] args) throws Throwable {
        byte[] latin1Decomposed;
        byte[] latin1Composed;
        try {
            // "8859_1" is the historical Java name for ISO-8859-1
            latin1Decomposed = decomposed.getBytes("8859_1");
            latin1Composed = composed.getBytes("8859_1");
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
            System.exit(-1);
            return;
        }
        // Print every encoded byte of both strings in hexadecimal
        for (int i = 0; i < latin1Decomposed.length; i++) {
            System.out.println("Decomp[" + i + "] =\t" + Integer.toHexString(latin1Decomposed[i]));
        }
        for (int i = 0; i < latin1Composed.length; i++) {
            System.out.println("Compos[" + i + "] =\t" + Integer.toHexString(latin1Composed[i]));
        }
    }
}
Output:
Decomp[0] = 75
Decomp[1] = 3f
Decomp[2] = 6f
Decomp[3] = 3f
Decomp[4] = 61
Decomp[5] = 3f
Compos[0] = fffffffc
Compos[1] = fffffff6
Compos[2] = ffffffe4
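Note that 0x3f is the Latin 1 (and ASCII) code for '?'. The leading "ffffff"
in the Compos values comes from Integer.toHexString sign-extending the
negative byte values; the encoded bytes themselves are 0xfc, 0xf6 and 0xe4.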
(Review ID: 131406)
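Since the encoder works one character at a time, a caller can compose the string before encoding. The following is a sketch of that approach, not part of the original report and not a statement of how the JDK should behave. It assumes Java 7 or later (java.text.Normalizer was added in Java 6 and StandardCharsets in Java 7), so it does not apply to the JDKs listed above; the class name ComposeBeforeEncoding and its helpers are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

// Hypothetical helper class: composes to NFC, then encodes to Latin 1.
final class ComposeBeforeEncoding {

    // Normalizes to NFC so that base letter + combining mark pairs become
    // precomposed characters where Unicode defines one, then encodes.
    static byte[] toLatin1(String s) {
        String composed = Normalizer.normalize(s, Normalizer.Form.NFC);
        return composed.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Formats bytes as two-digit hex; masking with 0xff avoids sign extension.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same decomposed input as in the report; prints "fcf6e4"
        System.out.println(hex(toLatin1("u\u0308o\u0308a\u0308")));
    }
}
```

Sequences with no precomposed form in the target encoding would still come out as '?' after normalization, so this only helps where a composed character exists and is mappable.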
======================================================================