JDK-4499926

Unicode decomposed forms are not correctly converted to other encodings.


    • Type: Bug
    • Resolution: Won't Fix
    • Priority: P4
    • Fix Version/s: None
    • Affects Version/s: 1.1.8
    • Component/s: core-libs

      Name: boT120536 Date: 09/05/2001


      java version "1.1.8"

      java version "1.3.1"
      Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.1-b24)
      Java HotSpot(TM) Client VM (build 1.3.1-b24, mixed mode)

      Converting a decomposed Unicode sequence into another encoding does not
      yield the precomposed characters: each combining mark is replaced by "?".
      For example, the sequence:

      "u" "combining diaeresis" "o" "combining diaeresis" "a" "combining diaeresis"

      produces the characters

      "u" "?" "o" "?" "a" "?"

      when converted to ISO-8859-1 (Latin 1).
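
      The substitution can be confirmed by asking an encoder which strings it
      can map. The following is only a minimal sketch, assuming a runtime with
      java.nio.charset.StandardCharsets (Java 7 or later, so newer than the
      releases this report covers); the class name Latin1Check and the helper
      canEncodeLatin1 are hypothetical. It suggests that U+0308 simply has no
      ISO-8859-1 mapping, so a converter that does not compose first has nothing
      to emit for it except a replacement byte.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class Latin1Check {
    // Hypothetical helper: true if every char of s has an ISO-8859-1 mapping.
    static boolean canEncodeLatin1(String s) {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder();
        return encoder.canEncode(s);
    }

    public static void main(String[] args) {
        // Decomposed: "u" followed by U+0308 COMBINING DIAERESIS
        System.out.println(canEncodeLatin1("u\u0308")); // prints false
        // Precomposed: U+00FC
        System.out.println(canEncodeLatin1("\u00fc"));  // prints true
    }
}
```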


      This problem is present in:
      Sun JDK 1.1.8, 1.2.2, 1.3.0, 1.3.1 on Windows
      IBM JRE 1.1.8, 1.3 on Windows
      Sun JRE 1.1.7b on Linux
      Sun JDK 1.2.2, 1.3.1 on Linux
      Apple MRJ 2.2.5 on MacOS 9
      Apple MRJ 3 (1.3) on MacOS X
      Sun JRE 1.3.1 on Solaris 8 (SPARC)

      Example code:

      import java.io.*;

      public class DecomposedEncoding {
          // "u", "o", "a", each followed by U+0308 COMBINING DIAERESIS
          private static final String decomposed = "u\u0308o\u0308a\u0308";
          // The precomposed equivalents: U+00FC, U+00F6, U+00E4
          private static final String composed = "\u00fc\u00f6\u00e4";

          public static void main(String[] args) {
              byte[] latin1Decomposed;
              byte[] latin1Composed;
              try {
                  // "8859_1" is the historical JDK name for ISO-8859-1
                  latin1Decomposed = decomposed.getBytes("8859_1");
                  latin1Composed = composed.getBytes("8859_1");
              } catch (UnsupportedEncodingException e) {
                  e.printStackTrace();
                  System.exit(-1);
                  return;
              }

              // Each byte is widened to int before formatting, so values
              // of 0x80 and above print sign-extended
              for (int i = 0; i < latin1Decomposed.length; i++) {
                  System.out.println("Decomp["+i+"] =\t"+Integer.toHexString(latin1Decomposed[i]));
              }
              for (int i = 0; i < latin1Composed.length; i++) {
                  System.out.println("Compos["+i+"] =\t"+Integer.toHexString(latin1Composed[i]));
              }
          }
      }


      Output:

      Decomp[0] = 75
      Decomp[1] = 3f
      Decomp[2] = 6f
      Decomp[3] = 3f
      Decomp[4] = 61
      Decomp[5] = 3f
      Compos[0] = fffffffc
      Compos[1] = fffffff6
      Compos[2] = ffffffe4

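      Note that 0x3f is the Latin 1 (and ASCII) code for '?'. The values
      fffffffc, fffffff6 and ffffffe4 are the bytes 0xfc, 0xf6 and 0xe4
      (the Latin 1 codes for ü, ö and ä), sign-extended when each byte is
      widened to int for Integer.toHexString.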
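
      One possible workaround on later runtimes is to compose the string
      before encoding it. This is a sketch under stated assumptions, not part of
      the original report: it relies on java.text.Normalizer (Java 6 or later)
      and java.nio.charset.StandardCharsets (Java 7 or later), neither of which
      exists in the releases listed above, and the class name ComposeThenEncode
      and helper toLatin1 are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

public class ComposeThenEncode {
    // Hypothetical helper: normalize to NFC (canonical composition),
    // then encode the result as ISO-8859-1.
    static byte[] toLatin1(String s) {
        String nfc = Normalizer.normalize(s, Normalizer.Form.NFC);
        return nfc.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Same decomposed input as in the report's example
        byte[] bytes = toLatin1("u\u0308o\u0308a\u0308");
        for (int i = 0; i < bytes.length; i++) {
            // Mask with 0xff so bytes >= 0x80 are not printed sign-extended
            System.out.println("Nfc[" + i + "] =\t" + Integer.toHexString(bytes[i] & 0xff));
        }
        // prints fc, f6, e4 (one value per line)
    }
}
```

      NFC only helps where a precomposed character exists in the target
      encoding; a sequence with no composed form would still come out as '?'.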
      (Review ID: 131406)
      ======================================================================

            Assignee: Xueming Shen (sherman)
            Reporter: Bret O'neal (bonealsunw, Inactive)