JDK-4496203

String.getBytes("ISO-8859-1") doesn't work in Unicode environment w/ umlauts


    • Type: Bug
    • Resolution: Not an Issue
    • Priority: P4
    • None
    • 1.4.0
    • core-libs

      Name: nt126004 Date: 08/24/2001


      java version "1.3.1"
      Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.1-b24)
      Java HotSpot(TM) Client VM (build 1.3.1-b24, mixed mode)

      We have created a document in Germany (ISO-8859-1 encoding) that contains German
      umlauts like ???????. On an NT box in Korea, this file is read into a string.
      Then .getBytes("ISO-8859-1") is called on the string. The resulting byte
      array contains ? for the German umlauts, and at its end it contains as many
      null bytes as it contains umlauts in the body. The same code works well in
      Germany.

      OK : Having "?" in the bytes.
      Bug: Having null bytes at the end. This should never happen in my opinion.


      This is the sample source:
      1) Contents of the script file to read:
      -----------------------------------------
      My script contains umlautz like ??? or ??? or ? and more.
      -----------------------------------------

      2) The test case:
      -----------------------------------------
      package unibug;
      import java.io.*;

      /**
       * Demonstrate a .getBytes bug in Unicode environments.
       * @author Peter Holzwarth
       * @version 1.0
       */

      public class UniBug {
          public static void main(String[] args) {
              try {
                  String script= readFile("script.txt");
                  byte[] buf= script.getBytes("ISO-8859-1");
                  // doesn't change anything:
                  // byte[] buf= script.getBytes();
                  for (int i= 0; i<buf.length; i++) {
                      System.out.println("byte ["+i+"] is "+buf[i]);
                  }
              } catch (java.io.UnsupportedEncodingException uee) {
                  // won't happen for "ISO-8859-1"
              } catch (java.io.IOException ioe) {
                  // test file not there
                  System.err.println("Couldn't read script: "+ioe);
              }
          }

          protected static String readFile(String filename) throws IOException {
              FileInputStream stream = new FileInputStream(filename);
              InputStreamReader isr = new InputStreamReader(stream);
              // partial fix, if we knew the locale of the source file:
              // InputStreamReader isr = new InputStreamReader(stream, "ISO-8859-1");
              char[] data = new char[stream.available()];
              isr.read(data);
              isr.close();
              stream.close();
              return new String(data);
          }
      }
      -----------------------------------------
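
For comparison, here is a minimal sketch of a reader that names the source encoding explicitly and keeps only the characters that `read()` actually reports. It is not part of the submitted test case; the class name `ReadAllSketch`, the helper `readAll`, and the in-memory sample bytes are hypothetical and stand in for `script.txt`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

public class ReadAllSketch {
    // Hypothetical helper: decodes the whole stream with an explicit charset
    // and appends only as many chars as each read() call returned.
    static String readAll(InputStream in, String charsetName) throws IOException {
        Reader reader = new InputStreamReader(in, charsetName);
        try {
            StringBuilder text = new StringBuilder();
            char[] chunk = new char[1024];
            int n;
            while ((n = reader.read(chunk)) != -1) {
                text.append(chunk, 0, n);
            }
            return text.toString();
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed sample data: "a", three ISO-8859-1 umlaut bytes, a space, "b".
        byte[] fileBytes = { 0x61, (byte) 0xE4, (byte) 0xF6, (byte) 0xFC, 0x20, 0x62 };
        String text = readAll(new ByteArrayInputStream(fileBytes), "ISO-8859-1");
        byte[] roundTrip = text.getBytes("ISO-8859-1");
        // prints "6 chars, 6 bytes"
        System.out.println(text.length() + " chars, " + roundTrip.length + " bytes");
    }
}
```

Because the string is built from the count returned by `read()`, its length does not depend on how many bytes the stream reported as available.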

      3) The log output in Korea:
      -----------------------------------------
      byte [0] is 77
      byte [1] is 121
      byte [2] is 32
      byte [3] is 115
      byte [4] is 99
      byte [5] is 114
      byte [6] is 105
      byte [7] is 112
      byte [8] is 116
      byte [9] is 32
      byte [10] is 99
      byte [11] is 111
      byte [12] is 110
      byte [13] is 116
      byte [14] is 97
      byte [15] is 105
      byte [16] is 110
      byte [17] is 115
      byte [18] is 32
      byte [19] is 117
      byte [20] is 109
      byte [21] is 108
      byte [22] is 97
      byte [23] is 117
      byte [24] is 116
      byte [25] is 122
      byte [26] is 32
      byte [27] is 108
      byte [28] is 105
      byte [29] is 107
      byte [30] is 101
      byte [31] is 32
      byte [32] is 63
      byte [33] is 63
      byte [34] is 111
      byte [35] is 114
      byte [36] is 32
      byte [37] is 63
      byte [38] is 63
      byte [39] is 111
      byte [40] is 114
      byte [41] is 32
      byte [42] is 63
      byte [43] is 97
      byte [44] is 110
      byte [45] is 100
      byte [46] is 32
      byte [47] is 109
      byte [48] is 111
      byte [49] is 114
      byte [50] is 101
      byte [51] is 46
      byte [52] is 13
      byte [53] is 10
      byte [54] is 13
      byte [55] is 10
      byte [56] is 0
      byte [57] is 0
      byte [58] is 0
      byte [59] is 0
      byte [60] is 0
      -----------------------------------------
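
The log above lists 61 bytes, the last five of which are zero. One reading that fits this pattern, offered as an assumption rather than a confirmed diagnosis: `readFile` sizes the `char[]` from `stream.available()`, which counts bytes, while a multibyte default encoding decodes several bytes into one char, so the unfilled tail of the array stays `'\u0000'`. The sketch below reproduces that shape with UTF-8 as a stand-in for the Korean default encoding; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class NullPaddingSketch {
    // Mirrors the shape of readFile(): the array is sized in bytes, but the
    // reader fills it with decoded chars, which can be fewer than the bytes.
    static String decodeIntoOversizedArray(byte[] fileBytes, String charsetName) throws IOException {
        ByteArrayInputStream stream = new ByteArrayInputStream(fileBytes);
        InputStreamReader isr = new InputStreamReader(stream, charsetName);
        char[] data = new char[stream.available()];
        isr.read(data); // return value (number of chars read) is ignored
        isr.close();
        return new String(data); // includes the untouched '\u0000' slots
    }

    public static void main(String[] args) throws IOException {
        // Three umlauts occupy six bytes in UTF-8 but decode to three chars.
        byte[] fileBytes = "\u00E4\u00F6\u00FC".getBytes("UTF-8");
        String script = decodeIntoOversizedArray(fileBytes, "UTF-8");
        byte[] buf = script.getBytes("ISO-8859-1");
        // prints -28, -10, -4, then three lines with 0
        for (int i = 0; i < buf.length; i++) {
            System.out.println("byte [" + i + "] is " + buf[i]);
        }
    }
}
```

In this sketch the trailing zeros come from the oversized array, not from `getBytes`, which simply encodes each `'\u0000'` char as a zero byte.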
      (Review ID: 130521)
      ======================================================================

      Comments

      Name: nt126004 Date: 08/24/2001


      (company - iO Software GmbH , email - ###@###.###)

      ===============
      old Synopsis: String.getBytes("ISO-8859-1") doesn't work as expected in Unicode environment

      I couldn't reproduce this exactly here. When I ran this program I got a bunch
      of negative values for some of the bytes, but no null values at the end.
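
The negative values are what a correct ISO-8859-1 round trip looks like when the bytes are printed directly: Java's `byte` is signed, so any value above 0x7F prints as a negative number. A small sketch, with a hypothetical class name and sample string:

```java
import java.io.UnsupportedEncodingException;

public class SignedByteSketch {
    // Converts a signed Java byte to its unsigned value in the range 0..255.
    static int unsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // U+00E4 (a with diaeresis) and U+00DF (sharp s) encode to 0xE4 and 0xDF.
        byte[] buf = "\u00E4\u00DF".getBytes("ISO-8859-1");
        // prints "byte [0] is -28 (unsigned 228)" and "byte [1] is -33 (unsigned 223)"
        for (int i = 0; i < buf.length; i++) {
            System.out.println("byte [" + i + "] is " + buf[i] + " (unsigned " + unsigned(buf[i]) + ")");
        }
    }
}
```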

      Looks similar to bug 4179049, but this one has the null characters at the
      end.

      Added script.zip.Z, which contains the sample file for the bug.

      Tested against build 1.4.0-beta2-b77, so changed the release from 1.3.1 to 1.4b2.
      ======================================================================
      ###@###.### 2001-08-24
      ###@###.### 2003-01-09

            ilittlesunw Ian Little (Inactive)
            nthompsosunw Nathanael Thompson (Inactive)