JDK-8023748

UTF-8 files cannot be read correctly if they begin with a single quotation mark


    • Type: Bug
    • Resolution: Duplicate
    • Priority: P3
    • None
    • 7u10, 7u21
    • core-libs

      FULL PRODUCT VERSION :
      1)Eclipse Java EE IDE for Web Developers.
      Version: Juno Service Release 2
      Build id: 20130225-0426
      2)Java SE Runtime Environment 7
      I have installed all of the following:
      jre-7u21-windows-i586.exe
      jre-7u21-windows-x64.exe
      3)GlassFish --> Java Platform, Enterprise Edition 6 SDK Update 4 (with JDK 7u10)
      java_ee_sdk-6u4-jdk7-windows-x64-ml.exe


      ADDITIONAL OS VERSION INFORMATION :
      Windows 7 Professional
      Service Pack 1
      64-bit version

      A DESCRIPTION OF THE PROBLEM :
      A file is read incorrectly when it is encoded in UTF-8 and has a single quotation mark in the first column of the first line.
      For example, if a file named "abc.vb" has the content shown below, an unexpected character is inserted at the top of the content when it is read.

      The file content of "abc.vb" (encoded in UTF-8):
      '@ ******************************************************************************
      '@ Company : Hoge Co.,Ltd
      '@ ------------------------------------------------------------------------------
      '@ MODIFICATION HISTORY
      '@ When Who Version Why
      '@ ------------------------------------------------------------------------------
      '@ 2012/02/29 Hoge 1.0 Created
      '@ ******************************************************************************
      Imports System.Web.Services
      Imports System.Web.Services.Protocols
      Imports System.ComponentModel


      The string obtained from the InputStream (or any other reader) looks like this:
      ?'@ ******************************************************************************
      '@ Company : Komatsu Rental Co.,Ltd
      '@ ------------------------------------------------------------------------------
      '@ MODIFICATION HISTORY
      '@ When Who Version Why
      '@ ------------------------------------------------------------------------------
      '@ 2012/02/29 Yukako.T(FCS) 1.0 Created
      '@ ******************************************************************************
      Imports System.Web.Services
      Imports System.Web.Services.Protocols
      Imports System.ComponentModel


      The [?] is the unexpected character I am referring to.
      I cannot describe it further, since I do not know what it is.
      All I want is to read some VB.NET source files (whose comments begin with single quotation marks) and display them on an HTML page.
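The report does not identify the [?] character. One way to find out what it is would be to print the first code point of the decoded string. The sketch below is only an illustration: the class and method names are hypothetical, and the sample bytes assume the file starts with a UTF-8 byte order mark (EF BB BF), which the report does not confirm.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original report.
class LeadingCharInspector {

    // Returns the first code point of the text in U+XXXX form, or "(empty)".
    static String describeFirstCodePoint(String text) {
        if (text.isEmpty()) {
            return "(empty)";
        }
        return String.format("U+%04X", text.codePointAt(0));
    }

    public static void main(String[] args) {
        // ASSUMPTION: the file begins with a UTF-8 byte order mark (EF BB BF)
        // followed by the single quotation mark of the first VB.NET comment.
        byte[] bytes = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '\'', '@'};
        String decoded = new String(bytes, StandardCharsets.UTF_8);
        // Java's UTF-8 decoder keeps the BOM as the character U+FEFF.
        System.out.println(describeFirstCodePoint(decoded)); // prints U+FEFF
    }
}
```

If the same call on the real file content also reports U+FEFF, the extra character is a byte order mark written by the editor that saved the file, not something related to the single quotation mark.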

      I have tried six different approaches to read the content of the file, and all of them show the same problem:

      <method 1>
      InputStream is = new FileInputStream(filename);
      StringWriter writer = new StringWriter();
      IOUtils.copy(is, writer, "UTF-8");
      sourceCodeString = writer.toString();

      <method 2>
      File fileDir = new File(filename);
      BufferedReader in = new BufferedReader(new InputStreamReader(new FileInputStream(fileDir), "UTF-8"));
      String str;
      StringBuilder buffer = new StringBuilder();
      // readLine() strips the line terminator, so add one back for each line
      while ((str = in.readLine()) != null) {
          buffer.append(str).append('\n');
      }
      in.close();
      sourceCodeString = buffer.toString();

      <method 3>
      InputStream is = new FileInputStream(filename);
      ByteArrayOutputStream buffer = new ByteArrayOutputStream();
      int nRead;
      byte[] data = new byte[16384];
      while ((nRead = is.read(data, 0, data.length)) != -1) {
        buffer.write(data, 0, nRead);
      }
      buffer.flush();
      sourceCodeString = buffer.toString("UTF-8");

      <method 4>
      FileInputStream fis = new FileInputStream(filename);
      sourceCodeString = new Scanner(fis, "UTF-8").useDelimiter("\\A").next();

      <method 5>
      FileInputStream fis = new FileInputStream(filename);
      sourceCodeString = IOUtils.toString(fis, "UTF-8");

      <method 6>
      FileInputStream fis = new FileInputStream(filename);
      sourceCodeString = CharStreams.toString(new InputStreamReader(fis, "UTF-8"));

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      1) Prepare a UTF-8 file that contains a single quotation mark in the first column of the first line.

      2) Just try to read the file.
      I tried methods 1 through 6 listed in the description above, and all of them give the same problem.
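The two steps can be sketched as one self-contained program. This is a reproduction attempt under a stated assumption, not a confirmed diagnosis: it assumes the test file was saved with a UTF-8 byte order mark, as some Windows editors do. The class name, method names and temporary file are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical reproducer, not part of the original report.
class BomReproducer {

    // Step 1: write a UTF-8 file that begins with a BOM followed by the text.
    // ASSUMPTION: the reporter's file was saved with a BOM.
    static Path writeWithBom(String text) throws IOException {
        Path file = Files.createTempFile("abc", ".vb");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(new byte[] {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF});
        out.write(text.getBytes(StandardCharsets.UTF_8));
        Files.write(file, out.toByteArray());
        return file;
    }

    // Step 2: read the first line the same way as method 2 in the report.
    static String readFirstLine(Path file) throws IOException {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), StandardCharsets.UTF_8))) {
            return in.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = writeWithBom("'@ Company : Hoge Co.,Ltd\n");
        String line = readFirstLine(file);
        // The decoder passes the BOM through as U+FEFF ahead of the quotation mark.
        System.out.println("first char is U+FEFF: " + (line.charAt(0) == '\uFEFF'));
        Files.delete(file);
    }
}
```

A file written without the three leading bytes reads back starting with the single quotation mark, which would suggest the quotation mark itself is not the trigger.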

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      The correct content of the file should be obtained.
      That is, no unexpected character should be inserted at the top of the content, even when the file begins with a single quotation mark.

      I expect a result like this:
      '@ ******************************************************************************
      '@ Company : Hoge Co.,Ltd
      '@ ------------------------------------------------------------------------------
      '@ MODIFICATION HISTORY
      '@ When Who Version Why
      '@ ------------------------------------------------------------------------------
      '@ 2012/02/29 Hoge 1.0 Created
      '@ ******************************************************************************
      Imports System.Web.Services
      Imports System.Web.Services.Protocols
      Imports System.ComponentModel

      ACTUAL -
      ?'@ ******************************************************************************
      '@ Company : Komatsu Rental Co.,Ltd
      '@ ------------------------------------------------------------------------------
      '@ MODIFICATION HISTORY
      '@ When Who Version Why
      '@ ------------------------------------------------------------------------------
      '@ 2012/02/29 Yukako.T(FCS) 1.0 Created
      '@ ******************************************************************************
      Imports System.Web.Services
      Imports System.Web.Services.Protocols
      Imports System.ComponentModel


      [?] is the unexpected character.

      ERROR MESSAGES/STACK TRACES THAT OCCUR :
      No errors occur.

      REPRODUCIBILITY :
      This bug can always be reproduced.

      ---------- BEGIN SOURCE ----------
      See methods 1 through 6 in the description above.
      ---------- END SOURCE ----------

      CUSTOMER SUBMITTED WORKAROUND :
      I do not have one. Sorry...
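A possible workaround, offered only as a sketch: if the unexpected character turns out to be a byte order mark decoded as U+FEFF (an assumption the report does not confirm), it can be dropped from the string after any of the six read methods. The class and method names below are hypothetical.

```java
// Hypothetical helper, not part of the original report.
class BomStripper {

    // Removes one leading U+FEFF if present; otherwise returns the text unchanged.
    static String stripLeadingBom(String text) {
        if (!text.isEmpty() && text.charAt(0) == '\uFEFF') {
            return text.substring(1);
        }
        return text;
    }

    public static void main(String[] args) {
        // ASSUMPTION: this is what the decoded file content looks like.
        String sourceCodeString = "\uFEFF'@ Company : Hoge Co.,Ltd";
        String cleaned = stripLeadingBom(sourceCodeString);
        System.out.println(cleaned.startsWith("'@")); // prints true
    }
}
```

Stripping after decoding keeps the reading code unchanged. Saving the file without a BOM, or wrapping the stream in a BOM-aware input stream such as the one in Apache Commons IO, would be alternatives.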

            Assignee: sherman Xueming Shen
            Reporter: webbuggrp Webbug Group