JDK-4214355

(spec) String constructor spec depends on encoding


    • Type: Bug
    • Resolution: Duplicate
    • Priority: P4
    • None
    • 1.1.6
    • core-libs
    • unknown
    • solaris_2.6

      String objects created by the following constructor differ depending on the encoding of the byte data when the buffer contains extra trailing data that cannot be converted:

      String(byte[] buffer, int offset, int count, String enc)

      The attached program produces the following result when the buffer contains one extra byte in each encoding:

      length (via EUCJP) : 4
      length (via SJIS) : 0

      This shows that with the EUCJP encoding, the constructor ignores the extra byte and returns a String created from the preceding 8 bytes, whereas with the SJIS encoding it discards not only the extra byte but all the data in the buffer and returns an empty string.

      (The JDK 1.1.7 and JDK 1.2 specifications do not state which result is correct.)


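      public class StringTest {
           public static void main(String[] args) throws Exception {
               // Five hiragana characters ("AIUEO" in Japanese). Each one encodes to
               // 2 bytes in both EUCJP and SJIS, so getBytes() returns 10 bytes.
               String str = "\u3042\u3044\u3046\u3048\u304a";

               // Decode only the first 9 bytes: 4 complete characters plus
               // 1 extra byte that cannot be converted on its own.
               String eucjp = new String(str.getBytes("EUCJP"), 0, 9, "EUCJP");
               String sjis = new String(str.getBytes("SJIS"), 0, 9, "SJIS");

               System.out.println("length (via EUCJP) : " + eucjp.length());
               System.out.println("length (via SJIS) : " + sjis.length());
           }
      }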
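
For comparison, here is a minimal sketch of how a caller could make the handling of the extra byte explicit instead of relying on the constructor. It assumes a later JDK that provides java.nio.charset.CharsetDecoder (introduced in JDK 1.4, so not available on the 1.1.6 release this report was filed against). The class name StrictDecode and the method decodePrefix are hypothetical names chosen for illustration, not part of the JDK. The sketch decodes up to the first byte sequence that cannot be converted and returns whatever was decoded before it, which corresponds to the EUCJP result described above.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

public class StrictDecode {
    /**
     * Hypothetical helper: decodes count bytes starting at offset and returns
     * the characters decoded before the first malformed or unmappable byte
     * sequence. A truncated trailing character counts as malformed.
     */
    static String decodePrefix(byte[] buffer, int offset, int count, String enc) {
        CharsetDecoder decoder = Charset.forName(enc).newDecoder()
                // REPORT makes the decoder stop at an error instead of
                // silently replacing or skipping the offending bytes.
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);

        ByteBuffer in = ByteBuffer.wrap(buffer, offset, count);
        // Large enough for the worst case, so decoding never overflows.
        int capacity = (int) Math.ceil(count * (double) decoder.maxCharsPerByte());
        CharBuffer out = CharBuffer.allocate(capacity);

        // endOfInput = true: leftover bytes at the end are reported as
        // malformed input rather than treated as "more data to come".
        // The result is not inspected here; on an error the output buffer
        // already holds everything decoded before the bad sequence.
        decoder.decode(in, out, true);

        out.flip();
        return out.toString();
    }

    public static void main(String[] args) {
        String str = "\u3042\u3044\u3046\u3048\u304a";
        // Charset names are assumptions; they are only used if this runtime supports them.
        String[] encodings = {"EUC-JP", "Shift_JIS"};
        for (String enc : encodings) {
            if (!Charset.isSupported(enc)) {
                System.out.println(enc + " is not supported by this runtime");
                continue;
            }
            byte[] bytes = str.getBytes(Charset.forName(enc));
            // Drop the last byte so the final character is truncated.
            String decoded = decodePrefix(bytes, 0, bytes.length - 1, enc);
            System.out.println("length (via " + enc + ") : " + decoded.length());
        }
    }
}
```

With this approach the result no longer depends on how each converter happens to treat the incomplete character. A caller that would rather reject such input can inspect the CoderResult returned by decode() and throw instead of returning the prefix.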

            mmcclosksunw Michael Mccloskey (Inactive)
            duke J. Duke