JDK-6969285

(cs) Corrupted decoding of UTF16 little endian file after append after eof.

    • Type: Bug
    • Resolution: Unresolved
    • Priority: P4
    • Fix Version: tbd
    • Affects Version: 6u10
    • Component: core-libs

      FULL PRODUCT VERSION :
      java version "1.6.0_20"
      Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
      Java HotSpot(TM) Client VM (build 16.3-b01, mixed mode, sharing)

      ADDITIONAL OS VERSION INFORMATION :
      Microsoft Windows XP [Version 5.1.2600]

      A DESCRIPTION OF THE PROBLEM :
      When a little-endian UTF-16 file with a BOM is read while it is still being written (that is, the reader reads until InputStreamReader.read() returns -1, waits for a while, and then reads some more), the characters returned after the first EOF are not decoded properly. The two bytes of each character are swapped, as if the InputStreamReader now treated the file as big-endian UTF-16.

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      See test case

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      After the first EOF, the characters read should have been "Bye now".
      ACTUAL -
      Each character had its bytes in the wrong order.
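
A likely explanation, which this report does not confirm, is that the charset decoder is reset when the reader reaches end of input and so forgets the byte order it learned from the BOM. The documented default for UTF-16 input without a BOM is big endian. The sketch below reproduces the same byte swap with a bare CharsetDecoder, with no file or threads involved. The class name Utf16ResetSketch and the method decodeTwice are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;

public class Utf16ResetSketch {
    /** Hypothetical helper: decodes two chunks with one decoder, which is reset in between. */
    static String[] decodeTwice() throws CharacterCodingException {
        CharsetDecoder decoder = Charset.forName("UTF-16").newDecoder();

        // Little-endian BOM followed by "Hi"
        byte[] first = { (byte) 0xFF, (byte) 0xFE, 0x48, 0x00, 0x69, 0x00 };
        // "By" in little-endian order, with no BOM of its own
        byte[] second = { 0x42, 0x00, 0x79, 0x00 };

        // decode(ByteBuffer) resets the decoder before each complete decoding
        // operation, so the second call starts without any remembered byte order.
        String a = decoder.decode(ByteBuffer.wrap(first)).toString();
        String b = decoder.decode(ByteBuffer.wrap(second)).toString();
        return new String[] { a, b };
    }

    public static void main(String[] args) throws Exception {
        String[] out = decodeTwice();
        System.out.println(out[0]);              // prints "Hi"
        for (char c : out[1].toCharArray()) {
            System.out.printf("U+%04X%n", (int) c); // prints U+4200, then U+7900
        }
    }
}
```

The second chunk comes back as U+4200 and U+7900 instead of 'B' (U+0042) and 'y' (U+0079), which matches the byte reversal described above.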

      REPRODUCIBILITY :
      This bug can be reproduced always.

      ---------- BEGIN SOURCE ----------
      package tests.learning;

      import org.junit.AfterClass;
      import org.junit.Before;
      import org.junit.BeforeClass;
      import org.junit.Test;
      import static org.junit.Assert.*;

      public class Utf16LittleEndianIoTest implements Runnable {
          private static java.io.File tstFile;
          private java.util.concurrent.Semaphore writerWait;
          private java.util.concurrent.Semaphore readerWait;

          @BeforeClass
          public static void setUpClass() throws Exception {
              tstFile = new java.io.File("Utf16LittleEndianIoTest.txt");
          }

          @AfterClass
          public static void tearDownClass() throws Exception {
              tstFile.delete();
          }

          @Before
          public void setUp() {
              writerWait = new java.util.concurrent.Semaphore(0);
              readerWait = new java.util.concurrent.Semaphore(0);
          }

          /**
           * Simulate the reading of a file as it is being written to.
           * Read to the end of the file. Wait until more is written to the
           * file and then read that.
           *
           * One would hope that the reader would remember that the file
           * is UTF-16 little endian. When reading continues, the characters
           * come in with their bytes reversed, as though the file were
           * UTF-16 big endian.
           *
           * @throws Exception
           */
          @Test
          public void tstIo() throws Exception {
              java.io.InputStream is = null;
              java.io.InputStreamReader isr = null;

              Thread th = new Thread(this);
              th.start();
              readerWait.acquire();

              try {
                  is = new java.io.FileInputStream(tstFile);
                  is = new java.io.BufferedInputStream(is);
                  isr = new java.io.InputStreamReader(is, "UTF-16");

                  int x = -1;
                  StringBuilder sb = new StringBuilder();

                  while (-1 != (x = isr.read()))
                      sb.append((char)x);

                  String actual = sb.toString();
                  System.out.println(actual);
                  assertEquals("Hi there", actual);

                  writerWait.release();
                  th.join();

                  sb.setLength(0);
                  while (-1 != (x = isr.read()))
                      sb.append((char)x);

                  StringBuilder sbHuh = new StringBuilder(sb.length());
                  final int length = sb.length();
                  for (int i = 0; i < length; ++i) {
                      char c = sb.charAt(i);

                      // set d to c's value with the bytes reversed
                      char d = (char)(((c & 0xff) << 8) | ((c & 0xff00) >> 8));
                      sbHuh.append(d);
                  }

                  if ("Bye now".equals(sbHuh.toString()))
                      System.out.println("The bytes are reversed");

                  actual = sb.toString();
                  System.out.println(actual);
                  assertEquals("Bye now", actual);
              }
              finally {
                  if (null != isr)
                      isr.close();
                  else if (null != is)
                      is.close();
              }
          }

          public void run() {
              java.io.OutputStream os = null;
              
              try {
                  os = new java.io.FileOutputStream(tstFile);
                  os.write(new byte[] {
                      // Byte order mark (BOM) - little endian
                      (byte)0xFF, (byte)0xFE,
                      // Hi there
                      0x48, 0x00, 0x69, 0x00, 0x20, 0x00,
                      0x74, 0x00, 0x68, 0x00, 0x65, 0x00, 0x72, 0x00, 0x65, 0x00
                  });
                
                  readerWait.release();
                  writerWait.acquire();

                  os.write(new byte[] {
                      // Bye now
                      0x42, 0x00, 0x79, 0x00, 0x65, 0x00, 0x20, 0x00,
                      0x6e, 0x00, 0x6f, 0x00, 0x77, 0x00
                  });
              }
              catch (Exception e) {
                  throw new RuntimeException(e);
              }
              finally {
                  if (null != os) {
                      try {
                          os.close();
                      }
                      catch (Exception ignore) {}
                  }
              }
          }
      }

      ---------- END SOURCE ----------

      CUSTOMER SUBMITTED WORKAROUND :
      I guess I'll reverse the bytes of each character after I've first reached the end of file.
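
An alternative to swapping bytes by hand might be to inspect the BOM once and then open the reader with a fixed-order charset ("UTF-16LE" or "UTF-16BE"), so that there is no detected byte order for the decoder to lose. The following is a minimal sketch under that assumption. The class Utf16BomSniffSketch and the helper openUtf16 are hypothetical names, and the approach has not been verified against the JDK build named in this report.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;

public class Utf16BomSniffSketch {
    /**
     * Hypothetical helper: consumes a UTF-16 BOM if one is present and
     * returns a reader bound to an explicit byte order.
     */
    static Reader openUtf16(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 2);
        int b0 = pin.read();
        int b1 = pin.read();

        // Big endian is the documented default for UTF-16 without a BOM.
        String charset = "UTF-16BE";
        if (b0 == 0xFF && b1 == 0xFE) {
            charset = "UTF-16LE";            // little-endian BOM, consumed
        } else if (!(b0 == 0xFE && b1 == 0xFF)) {
            // No BOM: push the bytes back, last one first,
            // so that they are read again in their original order.
            if (b1 != -1) pin.unread(b1);
            if (b0 != -1) pin.unread(b0);
        }
        return new InputStreamReader(pin, charset);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = { (byte) 0xFF, (byte) 0xFE, 0x48, 0x00, 0x69, 0x00 };
        Reader reader = openUtf16(new ByteArrayInputStream(data));
        StringBuilder sb = new StringBuilder();
        for (int c; (c = reader.read()) != -1; ) {
            sb.append((char) c);
        }
        System.out.println(sb);              // prints "Hi"
    }
}
```

The BOM is consumed manually because the fixed-order charsets decode a leading BOM as an ordinary U+FEFF character rather than stripping it. In the test case above, the reader would then be created with openUtf16(is) in place of new java.io.InputStreamReader(is, "UTF-16").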

            Assignee: Xueming Shen (sherman)
            Reporter: Webbug Group (webbuggrp)