Bug
Resolution: Not an Issue
P4
None
7u79
x86_64
windows_7
FULL PRODUCT VERSION :
java version "1.7.0"
Java(TM) SE Runtime Environment (build 1.7.0-b147)
Java HotSpot(TM) 64-Bit Server VM (build 21.0-b17, mixed mode)
ADDITIONAL OS VERSION INFORMATION :
Microsoft Windows [Version 6.1.7600]
A DESCRIPTION OF THE PROBLEM :
I have a UTF-8 file containing various Japanese characters. The problem occurs ONLY with some of the Kanji characters.
For instance, take a file containing these two Japanese characters:
用裵
Both are Kanji, but only the second one is affected: 用 (U+7528) is converted correctly, while 裵 (U+88F5) is not.
File encoding is UTF-8.
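A quick way to check which of the two characters a target charset can represent is `CharsetEncoder.canEncode`. The sketch below (the class `SjisCoverage` and its helper are illustrative names, not part of the original report) tests each character against Java's `Shift_JIS` charset. Given the bytes reported further down, one would expect 用 to be encodable and 裵 not, which would point at the charset's repertoire rather than at Kanji in general.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

class SjisCoverage {
    // Returns true if the named charset has a mapping for the given character.
    static boolean canEncode(String charsetName, char c) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        return encoder.canEncode(c);
    }

    public static void main(String[] args) {
        // 用裵, written as escapes so the result does not depend on the source file encoding
        String text = "\u7528\u88F5";
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            System.out.printf("U+%04X encodable in Shift_JIS: %b%n", (int) c, canEncode("Shift_JIS", c));
        }
    }
}
```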
I'm reading this file with the following code (simplified version):
// filePath is pointing to a place in a file system
File file = new File(filePath);
FileInputStream inputStream = new FileInputStream(file);
BufferedReader input = new BufferedReader(new InputStreamReader(inputStream, "UTF-8"));
// and then I want to return it to the user in Shift-JIS encoding:
// response is an object of class HttpServletResponse
OutputStreamWriter output = new OutputStreamWriter(response.getOutputStream(), "Shift-JIS");
// customizing response to make it able to serve file to a user
response.setBufferSize(3 * 1024);
response.setContentType("application/csv");
response.addHeader("Content-Disposition", "attachment; fileName=file.csv");
// No setContentLength(file.length()) here: file.length() is the size of the UTF-8
// input, which differs from the number of Shift-JIS bytes actually written
int len = 80;
char[] buffer = new char[len];
int numRead;
while ((numRead = input.read(buffer, 0, len)) != -1) {
    output.write(buffer, 0, numRead);
}
// then closing readers etc..
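The copy loop above does not depend on the servlet API, so it can be exercised in isolation. The sketch below (the `Transcoder` class and `transcode` method are hypothetical names) runs the same read/write loop between in-memory streams. It flushes rather than closes the writer so that a caller-owned stream, such as a servlet output stream, stays open; this is adequate for a stateless charset like Shift_JIS.

```java
import java.io.*;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Transcoder {
    // Decodes 'in' with 'from' and re-encodes the characters to 'out' with 'to'.
    // OutputStreamWriter silently replaces characters that 'to' cannot map.
    static void transcode(InputStream in, Charset from, OutputStream out, Charset to) throws IOException {
        Reader reader = new BufferedReader(new InputStreamReader(in, from));
        Writer writer = new OutputStreamWriter(out, to);
        char[] buffer = new char[80];
        int numRead;
        while ((numRead = reader.read(buffer)) != -1) {
            writer.write(buffer, 0, numRead);
        }
        writer.flush(); // push buffered bytes to 'out'; closing the streams is left to the caller
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = "\u7528\u88F5".getBytes(StandardCharsets.UTF_8); // 用裵
        ByteArrayOutputStream sjis = new ByteArrayOutputStream();
        transcode(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8, sjis, Charset.forName("Shift_JIS"));
        System.out.println(utf8.length + " UTF-8 bytes -> " + sjis.size() + " Shift_JIS bytes");
    }
}
```

With the two characters from this report, the 6-byte UTF-8 input becomes 3 output bytes, which also shows why the input file's length cannot serve as the Content-Length of the converted response.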
After this process I get a file in Shift-JIS encoding. Characters such as 用 are stored as expected, but characters such as 裵 are stored incorrectly: they are replaced with '?'.
For the characters mentioned above:
Hexadecimal representation of "用裵" in the UTF-8 file:
E7 94 A8 E8 A3 B5
E7 94 A8 - 用 in UTF-8.
E8 A3 B5 - 裵 in UTF-8.
Hexadecimal representation of "用裵" in the Shift-JIS file after conversion:
97 70 3F
97 70 - 用 in Shift-JIS.
3F - ? in Shift-JIS.
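The same byte sequences can be reproduced without any files, using `String.getBytes(Charset)`, which, like `OutputStreamWriter`, substitutes the charset's default replacement for unmappable characters. In the sketch below, `HexDump` and `hex` are illustrative names; it should print the UTF-8 and Shift_JIS sequences listed above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class HexDump {
    // Formats bytes as upper-case hex pairs separated by spaces, e.g. "E7 94 A8".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u7528\u88F5"; // 用裵
        // getBytes(Charset) replaces unmappable characters with the charset's
        // default replacement bytes instead of throwing
        System.out.println("UTF-8:     " + hex(text.getBytes(StandardCharsets.UTF_8)));
        System.out.println("Shift_JIS: " + hex(text.getBytes(Charset.forName("Shift_JIS"))));
    }
}
```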
ADDITIONAL REGRESSION INFORMATION:
java version "1.7.0"
Java(TM) SE Runtime Environment (build 1.7.0-b147)
Java HotSpot(TM) 64-Bit Server VM (build 21.0-b17, mixed mode)
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
See the description above.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
I expect the following hexadecimal representation of the characters "用裵" in Shift-JIS:
97 70 EE 86
ACTUAL -
97 70 3F
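The report was closed as Not an Issue. One plausible explanation, offered here as an assumption rather than taken from the report, is that 裵 (U+88F5) is one of the IBM/NEC vendor extension kanji that Windows code page 932 adds on top of JIS X 0208. Java's `Shift_JIS` charset covers only the JIS standard repertoire, so it has no mapping for 裵, whereas `windows-31j` (alias `MS932`) does. The sketch below (`CharsetChoice` and `canEncodeAll` are hypothetical names) compares the two. Even if the assumption holds, the bytes `windows-31j` emits for 裵 are not guaranteed to be `EE 86`, since Windows-31J assigns more than one code to some extension characters.

```java
import java.nio.charset.Charset;

class CharsetChoice {
    // Returns true if every character of 'text' can be encoded by the named charset.
    static boolean canEncodeAll(String charsetName, String text) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String text = "\u7528\u88F5"; // 用裵
        for (String name : new String[] {"Shift_JIS", "windows-31j"}) {
            // windows-31j is an extended charset; check it exists in this runtime first
            if (!Charset.isSupported(name)) {
                System.out.println(name + ": not available in this runtime");
                continue;
            }
            System.out.println(name + " can encode all characters: " + canEncodeAll(name, text));
        }
    }
}
```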
REPRODUCIBILITY :
This bug can always be reproduced.
---------- BEGIN SOURCE ----------
import java.io.*;

class UtfToShiftJIS {
    // CHANGE THESE CONSTANTS PLEASE
    // INPUT_PATH SHOULD POINT TO A REAL FILE WITH KANJI CHARS SAVED IN UTF-8
    private final static String INPUT_PATH = "D:\\input.csv";
    // OUTPUT_PATH SHOULD POINT TO RESULT FILE
    private final static String OUTPUT_PATH = "D:\\output.csv";

    public static void main(String[] args) throws IOException {
        // try-with-resources closes the reader and the writer, and with them the
        // underlying file streams, even if an exception is thrown
        try (BufferedReader input = new BufferedReader(
                     new InputStreamReader(new FileInputStream(INPUT_PATH), "UTF-8"));
             OutputStreamWriter output = new OutputStreamWriter(
                     new FileOutputStream(OUTPUT_PATH), "Shift-JIS")) {
            char[] buffer = new char[80];
            int numRead;
            while ((numRead = input.read(buffer)) != -1) {
                output.write(buffer, 0, numRead);
            }
        }
    }
}
---------- END SOURCE ----------
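An `OutputStreamWriter` created from a charset name replaces unmappable characters, so the program above gives no sign that anything was lost. A variant sketch (the `StrictShiftJis` class and `strictWriter` method are hypothetical) passes a `CharsetEncoder` configured with `CodingErrorAction.REPORT` instead, so the same input fails with `UnmappableCharacterException` rather than silently producing `3F`.

```java
import java.io.*;
import java.nio.charset.*;

class StrictShiftJis {
    // Builds a writer that throws instead of writing the replacement byte
    // when a character has no mapping in the target charset.
    static Writer strictWriter(OutputStream out, String charsetName) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return new OutputStreamWriter(out, encoder);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Writer writer = strictWriter(out, "Shift_JIS");
        try {
            writer.write("\u7528\u88F5"); // 用裵
            writer.flush();
            System.out.println("encoded " + out.size() + " bytes");
        } catch (UnmappableCharacterException e) {
            System.out.println("unmappable character: " + e.getMessage());
        }
    }
}
```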