JDK-4912159

(cs) Stream encoder handles split malformed surrogates incorrectly


    • Type: Bug
    • Resolution: Duplicate
    • Priority: P3
    • None
    • 1.4.2
    • core-libs
    • x86
    • linux, windows_2000

      Name: rmT116609 Date: 08/25/2003


      FULL PRODUCT VERSION :
      java version "1.4.1_04"
      Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.1_04-b01)
      Java HotSpot(TM) Client VM (build 1.4.1_04-b01, mixed mode)

      java version "1.4.2_01"
      Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_01-b06)
      Java HotSpot(TM) Client VM (build 1.4.2_01-b06, mixed mode)

      FULL OS VERSION :
      Microsoft Windows 2000 [Version 5.00.2195]

      A DESCRIPTION OF THE PROBLEM :
      A java.lang.Error is thrown when a String containing an invalid surrogate pair is converted to a UTF-8 byte array through an OutputStreamWriter and the invalid pair is split across two write operations. (See the program in "Source code for an executable test case".)
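
To make "invalid surrogate pair" concrete: both \uD800 and \uD8FF lie in the high-surrogate range (U+D800 to U+DBFF), so neither can complete a pair with the other. The sketch below is only an illustration, not part of the original report; the class and method names are hypothetical, and it assumes a runtime with Character.isHighSurrogate (Java 5 or later).

```java
public class SurrogateCheckSketch {

    /** Returns the index of the first unpaired surrogate in s, or -1 if there is none. */
    static int firstUnpairedSurrogate(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // well-formed pair: skip its low half
                } else {
                    return i; // high surrogate with no low surrogate after it
                }
            } else if (Character.isLowSurrogate(c)) {
                return i; // low surrogate with no high surrogate before it
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The second and third strings of the test case, joined as one stream.
        String joined = "ijklmno\uD800" + "\uD8FFpqrstuv";
        System.out.println("first unpaired surrogate at index "
                           + firstUnpairedSurrogate(joined));
        System.out.println("\\uD8FF is a high surrogate: "
                           + Character.isHighSurrogate('\uD8FF'));
    }
}
```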

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      Run the sample program under J2SDK 1.4.1.

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      The output expected is (as produced under JDK 1.3.1_08):

      Expected: 6162636465666768696a6b6c6d6e6f3f3f70717273747576
      Writing: 00610062006300640065006600670068 "abcdefgh"
      Captured:6162636465666768
      Writing: 0069006a006b006c006d006e006fd800 "ijklmno?"
      Captured:696a6b6c6d6e6f3f
      Writing: d8ff0070007100720073007400750076 "?pqrstuv"
      Captured:3f70717273747576


      Given the lookahead necessary to process well-formed surrogate pairs, the last several lines could (or should) have been:

      Writing: 0069006a006b006c006d006e006fd800 "ijklmno?"
      Captured:696a6b6c6d6e6f
      Writing: d8ff0070007100720073007400750076 "?pqrstuv"
      Captured:3f3f70717273747576
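
The alternative listing above corresponds to holding back a trailing high surrogate until the next write shows whether a low surrogate follows. A minimal sketch of that carry-over, using java.nio.charset.CharsetEncoder directly rather than the StreamEncoder code in the stack trace, might look as follows. The names ChunkedEncodeSketch and encodeChunks are hypothetical, the REPLACE error actions are an assumption about the intended behavior, and StandardCharsets requires Java 7 or later.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class ChunkedEncodeSketch {

    /** Encodes each chunk in turn; returns the bytes produced per chunk as hex. */
    static String[] encodeChunks(String[] chunks) {
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        String[] hex = new String[chunks.length];
        String carry = ""; // chars the encoder left unconsumed on the previous call

        for (int i = 0; i < chunks.length; i++) {
            boolean last = (i == chunks.length - 1);
            CharBuffer in = CharBuffer.wrap(carry + chunks[i]);
            // UTF-8 needs at most 3 bytes per char; the slack covers empty input.
            ByteBuffer out = ByteBuffer.allocate(in.remaining() * 3 + 8);

            CoderResult cr = enc.encode(in, out, last);
            if (cr.isError()) {
                throw new IllegalStateException(cr.toString());
            }
            if (last) {
                enc.flush(out);
            }
            // Anything still in the buffer (e.g. a lone trailing high surrogate)
            // is carried into the next chunk instead of being encoded now.
            carry = in.toString();
            hex[i] = toHex(out);
        }
        return hex;
    }

    private static String toHex(ByteBuffer buf) {
        buf.flip();
        StringBuilder sb = new StringBuilder();
        while (buf.hasRemaining()) {
            sb.append(String.format("%02x", buf.get() & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] chunks = { "abcdefgh", "ijklmno\uD800", "\uD8FFpqrstuv" };
        String[] hex = encodeChunks(chunks);
        for (int i = 0; i < hex.length; i++) {
            System.out.println("Captured:" + hex[i]);
        }
    }
}
```

The design choice here is that the caller, not the encoder, owns the leftover character: encode(..., false) leaves a trailing high surrogate unconsumed in the input buffer, and the sketch simply prepends it to the next chunk.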



      ACTUAL -
      The output received was (as produced under J2SDK 1.4.1_04):

      Expected: 6162636465666768696a6b6c6d6e6f3f3f70717273747576
      Writing: 00610062006300640065006600670068 "abcdefgh"
      Captured:6162636465666768
      Writing: 0069006a006b006c006d006e006fd800 "ijklmno?"
      Captured:696a6b6c6d6e6f
      Writing: d8ff0070007100720073007400750076 "?pqrstuv"
      java.lang.Error
      at sun.nio.cs.StreamEncoder$CharsetSE.flushLeftoverChar(StreamEncoder.java:361)
      at sun.nio.cs.StreamEncoder$CharsetSE.implWrite(StreamEncoder.java:381)
      at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:136)
      at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:146)
      at java.io.OutputStreamWriter.write(OutputStreamWriter.java:204)
      at java.io.Writer.write(Writer.java:126)
      at TryCharsetConversionError.main(TryCharsetConversionError.java:54)


      ERROR MESSAGES/STACK TRACES THAT OCCUR :
      java.lang.Error
      at sun.nio.cs.StreamEncoder$CharsetSE.flushLeftoverChar(StreamEncoder.java:361)
      at sun.nio.cs.StreamEncoder$CharsetSE.implWrite(StreamEncoder.java:381)
      at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:136)
      at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:146)
      at java.io.OutputStreamWriter.write(OutputStreamWriter.java:204)
      at java.io.Writer.write(Writer.java:126)
      at TryCharsetConversionError.main(TryCharsetConversionError.java:54)


      REPRODUCIBILITY :
      This bug can be reproduced always.

      ---------- BEGIN SOURCE ----------
      import java.io.OutputStreamWriter;
      import java.io.ByteArrayOutputStream;
      import java.io.UnsupportedEncodingException;
      import java.io.IOException;

      /**
       * The <code>TryCharsetConversionError</code> class
       * provides a simple case exposing an error in J2SE 1.4
       * handling of UTF-8 conversions with "invalid" surrogate
       * pairs using an <code>OutputStreamWriter</code>.
       */
      public class TryCharsetConversionError
      {
          public static void main(String[] args)
          {
              String encoding = "UTF8";

              String strings[] = { "abcdefgh",
                                 "ijklmno\uD800",
                                 "\uD8FFpqrstuv" };

              /*
               * First, convert the full string.
               */
              StringBuffer sb = new StringBuffer();
              for ( int i = 0; i < strings.length; i++ )
                  sb.append(strings[i]);
              String expected = sb.toString();
              try
              {
                  System.out.println("Expected: "
                                     + dump(expected.getBytes(encoding)));
              }
              catch (UnsupportedEncodingException e)
              {
                  e.printStackTrace(System.out);
                  System.exit(3);
              }

              /*
               * Now convert the string using a stream approach.
               */
              ByteArrayOutputStream baos = new ByteArrayOutputStream(256);
              try
              {
                  OutputStreamWriter osw =
                          new OutputStreamWriter(baos, encoding);

                  for ( int i = 0; i < strings.length; i++ )
                  {
                      String s = strings[i];
                      System.out.println("Writing: "
                                         + dump(s) + " \"" + s + "\"");
                      osw.write(s);
                      osw.flush();
                      System.out.println("Captured:"
                                         + dump(baos.toByteArray()));
                      baos.reset();
                  }
              }
              catch (UnsupportedEncodingException e)
              {
                  e.printStackTrace(System.out);
                  System.exit(4);
              }
              catch (IOException e)
              {
                  e.printStackTrace(System.out);
                  System.exit(5);
              }

          }

          private static String dump(String str)
          {
              byte[] bytes = new byte[str.length() * 2];
              for ( int i = 0, j = 0; i < str.length(); i++, j += 2 )
              {
                  char c = str.charAt(i);
                  bytes[j] = (byte) (c >>> 8);
                  bytes[j + 1] = (byte) (c);
              }

              return dump(bytes);
          }

          private static String dump(byte[] bytes)
          {
              StringBuffer sb = new StringBuffer();

              for ( int i = 0; i < bytes.length; i++ )
              {
                  sb.append(Integer.toHexString(bytes[i]
                                                & 0xFF | 0x100).substring(1));
              }

              return sb.toString();
          }

      }


      ---------- END SOURCE ----------

      Release Regression From : 1.3.1_09
      The release named above is the last one in which this behavior is
      known to have worked correctly; later releases have regressed.

      (Incident Review ID: 200528)
      ======================================================================

            sherman Xueming Shen
            rmandalasunw Ranjith Mandala (Inactive)