JDK-8132689

Converting Japanese Kanji characters from UTF-8 to Shift-JIS


      FULL PRODUCT VERSION :
      java version "1.7.0"
      Java(TM) SE Runtime Environment (build 1.7.0-b147)
      Java HotSpot(TM) 64-Bit Server VM (build 21.0-b17, mixed mode)

      ADDITIONAL OS VERSION INFORMATION :
      Microsoft Windows [Version 6.1.7600]

      A DESCRIPTION OF THE PROBLEM :
      I have a UTF-8 encoded file that contains various Japanese characters. The problem occurs ONLY with the Kanji subset of characters.

      For instance, I have a file with two Japanese characters:
      用裵
      The first one is not Kanji; the second one is.
      The file encoding is UTF-8.

      I'm reading this file with the following code (simplified version):

      // filePath is pointing to a place in a file system
      File file = new File(filePath);
      FileInputStream inputStream = new FileInputStream(file);
      // the reader decodes the file contents as UTF-8
      BufferedReader input = new BufferedReader(new InputStreamReader(inputStream, "UTF-8"));

      // and then I want to return it to the user in Shift-JIS encoding:

      // response is an object of class HttpServletResponse
      OutputStreamWriter output = new OutputStreamWriter(response.getOutputStream(), "Shift-JIS");

      // configure the response so that the file is served to the user as a download
      response.setBufferSize(3 * 1024);
      response.setContentType("application/csv");
      response.addHeader("Content-Disposition", "attachment; fileName=file.csv");
      response.setContentLength((int) file.length());

      int len = 80;
      char buffer[] = new char[len];
      int numRead;
      while ((numRead = input.read(buffer, 0, len)) != -1) {
          output.write(buffer, 0, numRead);
      }

      // then close the reader and the writer
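
A side note on the snippet above: it sets the content length from file.length(), which is the size of the UTF-8 input, while the body is written in Shift-JIS, so the two byte counts generally differ. The following is a minimal sketch, not part of the original report, that transcodes into memory first so the real output size is known. The class name TranscodedLength and the helper transcode are hypothetical, and buffering the whole file in memory is an assumption that only suits small files.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;

// Hypothetical helper, for illustration only.
class TranscodedLength {

    // Reads UTF-8 text from 'in' and returns the same text encoded in 'targetCharset'.
    static byte[] transcode(InputStream in, String targetCharset) throws IOException {
        ByteArrayOutputStream encoded = new ByteArrayOutputStream();
        try (Reader reader = new InputStreamReader(in, "UTF-8");
             Writer writer = new OutputStreamWriter(encoded, targetCharset)) {
            char[] buffer = new char[80];
            int numRead;
            while ((numRead = reader.read(buffer, 0, buffer.length)) != -1) {
                writer.write(buffer, 0, numRead);
            }
        }
        // The writer is closed (and therefore flushed) before the bytes are read back.
        return encoded.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = { (byte) 0xE7, (byte) 0x94, (byte) 0xA8 }; // U+7528 in UTF-8
        byte[] sjis = transcode(new ByteArrayInputStream(utf8), "Shift_JIS");
        System.out.println("UTF-8 bytes: " + utf8.length + ", Shift_JIS bytes: " + sjis.length);
    }
}
```

In a servlet, the returned array's length could then be passed to response.setContentLength before the bytes are written to response.getOutputStream().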

      After this process I get a file in Shift-JIS encoding. All non-Kanji Japanese characters are stored as expected, but all Kanji characters are stored incorrectly.
      For the characters mentioned above:
      Hexadecimal representation of "用裵" in the UTF-8 file:
      E7 94 A8 E8 A3 B5

      E7 94 A8 - 用 in UTF-8.
      E8 A3 B5 - 裵 in UTF-8.

      Hexadecimal representation of "用裵" in the Shift-JIS file after conversion:
      97 70 3F

      97 70 - 用 in Shift-JIS.
      3F - ? in Shift-JIS.
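
The 3F byte is what Java's encoder writes by default for a character that the target charset cannot represent. The sketch below is one way to check, per character, whether a given charset has a mapping. The class name KanjiEncodingCheck is hypothetical, and trying "windows-31j" (Microsoft's Shift-JIS variant) as an alternative is an assumption made for comparison, not something stated in the report.

```java
import java.nio.charset.Charset;

// Hypothetical diagnostic class, for illustration only.
class KanjiEncodingCheck {

    // Formats bytes as space-separated upper-case hex, e.g. "97 70".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Returns true if the named charset has a mapping for the character.
    static boolean canEncode(String charsetName, char c) {
        return Charset.forName(charsetName).newEncoder().canEncode(c);
    }

    public static void main(String[] args) {
        char[] chars = { '\u7528', '\u88F5' };              // the two characters from the report
        String[] charsets = { "Shift_JIS", "windows-31j" }; // second name is an assumption
        for (String name : charsets) {
            Charset cs = Charset.forName(name);
            for (char c : chars) {
                // String.getBytes(Charset) silently substitutes the charset's
                // replacement bytes for unmappable characters.
                System.out.println(name
                        + " U+" + Integer.toHexString(c).toUpperCase()
                        + " encodable=" + canEncode(name, c)
                        + " bytes=" + hex(String.valueOf(c).getBytes(cs)));
            }
        }
    }
}
```

Running this on the affected JDK shows which of the two charsets, if any, can represent each character, and which bytes are actually produced.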



      ADDITIONAL REGRESSION INFORMATION:
      java version "1.7.0"
      Java(TM) SE Runtime Environment (build 1.7.0-b147)
      Java HotSpot(TM) 64-Bit Server VM (build 21.0-b17, mixed mode)

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      Please see the description.

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      I expect the following hexadecimal representation of the characters "用裵" in Shift-JIS:
      97 70 EE 86
      ACTUAL -
      97 70 3F
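
To make the substitution visible instead of silently receiving 3F, the writer can be built from a CharsetEncoder configured to report unmappable characters. This is a minimal sketch under that assumption. The class name StrictShiftJisWriter and its methods are hypothetical and are not part of the original test case.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper, for illustration only.
class StrictShiftJisWriter {

    // Returns a writer that fails on unmappable input instead of writing '?'.
    static Writer open(OutputStream out, String charsetName) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return new OutputStreamWriter(out, encoder);
    }

    // Encodes the text strictly; an unmappable character surfaces as an
    // IOException (a CharacterCodingException) rather than as a 3F byte.
    static byte[] encodeStrict(String text, String charsetName) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = open(bytes, charsetName)) {
            writer.write(text);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        try {
            encodeStrict("\u7528\u88F5", "Shift_JIS"); // the two characters from the report
            System.out.println("all characters encoded");
        } catch (IOException e) {
            System.out.println("cannot encode: " + e);
        }
    }
}
```

With this setup, a conversion that would have produced 97 70 3F fails with an exception, which makes it easier to tell a missing mapping apart from a corrupted input file.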

      REPRODUCIBILITY :
      This bug can always be reproduced.

      ---------- BEGIN SOURCE ----------
      import java.io.*;

      class UtfToShiftJIS {

          // CHANGE THESE CONSTANTS PLEASE
          // INPUT_PATH SHOULD POINT TO A REAL FILE WITH KANJI CHARS SAVED IN UTF-8
          private final static String INPUT_PATH = "D:\\input.csv";

          // OUTPUT_PATH SHOULD POINT TO RESULT FILE
          private final static String OUTPUT_PATH = "D:\\output.csv";

          public static void main(String[] args) throws IOException {
              File inputFile = new File(INPUT_PATH);
              FileInputStream inputStream = new FileInputStream(inputFile);
              BufferedReader input = new BufferedReader(new InputStreamReader(inputStream, "UTF-8"));

              File outputFile = new File(OUTPUT_PATH);
              FileOutputStream outputStream = new FileOutputStream(outputFile);
              OutputStreamWriter output = new OutputStreamWriter(outputStream, "Shift-JIS");

              int len = 80;
              char buffer[] = new char[len];
              int numRead;
              while ((numRead = input.read(buffer, 0, len)) != -1) {
                  output.write(buffer, 0, numRead);
              }

              output.close();
              outputStream.close();

              input.close();
              inputStream.close();
          }

      }

      ---------- END SOURCE ----------

        1. UtfToShiftJIS.java (1 kB)
        2. output.csv (0.0 kB)
        3. input.csv (0.0 kB)

            sherman Xueming Shen
            webbuggrp Webbug Group