Bug
Resolution: Not an Issue
P4
None
6
x86
windows_xp
FULL PRODUCT VERSION :
1.6.0_04
ADDITIONAL OS VERSION INFORMATION :
Windows XP SP2
A DESCRIPTION OF THE PROBLEM :
In the third panel of the intl.cpl control panel on Windows XP, you can set the default code page to use for "non-Unicode" programs.
For example, this controls (for all programs, including those that are fully Unicode internally) how a text file with no encoding information is to be interpreted.
Starting with Java 5, Charset.defaultCharset() relies on the system property "file.encoding" to return such a value.
Unfortunately, it does not return the correct value, which can be retrieved with the Win32 API function GetACP().
Instead, the implementation (GetJavaProperties) in java_props_md.c confuses the locale with the "default encoding" and tries to return a code page matching the default locale.
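To see which values the JVM actually reports on a given machine, a minimal sketch such as the following can be run before and after changing the control panel setting. The class and method names are illustrative only and are not part of the original report; the output depends on the JDK version and system configuration, so none is shown.

```java
import java.nio.charset.Charset;

public class ShowDefaultEncoding {
    /** Builds a one-line summary of the encoding values the JVM exposes. */
    static String describe() {
        // "file.encoding" is the system property the report says defaultCharset() relies on.
        return "file.encoding=" + System.getProperty("file.encoding")
             + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Comparing the printed charset with the code page shown in the control panel indicates whether the two disagree on that system.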
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
On a US-English Windows installation, set "Language for non-Unicode programs" to Russian and reboot.
Open Notepad, paste some Russian text into it, and save it as a (non-Unicode) .txt file. Reopen the file in Notepad and confirm that it looks OK.
Now write a small Java app that reads the file using Charset.defaultCharset() and inspect the decoded contents as Unicode.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Java code should behave as native Windows applications do.
ACTUAL -
The bytes, which are in the Russian code page 1251, are decoded as if they were in the Windows-1252 character set.
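The effect of this mismatch can be reproduced on any platform without touching the system settings. The sketch below (class and method names are hypothetical) encodes a Russian word as windows-1251 bytes and then decodes those bytes as windows-1252, which is the misinterpretation described above.

```java
import java.nio.charset.Charset;

public class MojibakeDemo {
    /** Encodes text as windows-1251 and decodes the same bytes as windows-1252. */
    static String misdecode(String text) {
        Charset cp1251 = Charset.forName("windows-1251");
        Charset cp1252 = Charset.forName("windows-1252");
        byte[] bytes = text.getBytes(cp1251); // what Notepad would write to disk
        return new String(bytes, cp1252);     // what the Java reader ends up with
    }

    public static void main(String[] args) {
        // "\u041f\u0440\u0438\u0432\u0435\u0442" is the Russian word "Привет".
        String garbled = misdecode("\u041f\u0440\u0438\u0432\u0435\u0442");
        System.out.println(garbled); // Cyrillic letters come back as accented Latin letters
    }
}
```

Each Cyrillic letter occupies one byte in code page 1251, and the same byte value maps to an accented Latin letter in Windows-1252, so the text keeps its length but becomes unreadable.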
REPRODUCIBILITY :
This bug can be reproduced always.
---------- BEGIN SOURCE ----------
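import java.io.*;
import java.nio.charset.Charset;

public class ReadWithDefaultCharset {
    public static void main(String[] args) throws IOException {
        String fileName = args[0]; // the .txt file saved from Notepad
        Charset cs = Charset.defaultCharset(); // Should match GetACP but doesn't
        BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(fileName), cs));
        try {
            String s;
            while ((s = br.readLine()) != null) {
                System.out.println(s);
            }
        } finally {
            br.close(); // try/finally rather than try-with-resources: this report targets Java 6
        }
    }
}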
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
The only workaround I have found is to write JNI code that calls GetACP and returns "cp" + the code page retrieved.
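When the file's encoding is known in advance, a JNI-free alternative is to name the charset explicitly instead of relying on Charset.defaultCharset(). The sketch below assumes the data is windows-1251; the class name, method name, and sample bytes are illustrative and not part of the original report. It does not detect the system code page, so it is not a replacement for the GetACP approach.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

public class ExplicitCharsetReader {
    /** Reads all lines from the stream, decoding with the given charset. */
    static List<String> readLines(InputStream in, Charset cs) throws IOException {
        List<String> lines = new ArrayList<String>();
        BufferedReader br = new BufferedReader(new InputStreamReader(in, cs));
        try {
            String s;
            while ((s = br.readLine()) != null) {
                lines.add(s);
            }
        } finally {
            br.close();
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the input is known to be windows-1251 (code page 1251).
        Charset cp1251 = Charset.forName("windows-1251");
        // Stand-in for the Notepad file: the word "Привет" encoded in code page 1251.
        byte[] data = "\u041f\u0440\u0438\u0432\u0435\u0442\r\n".getBytes(cp1251);
        System.out.println(readLines(new ByteArrayInputStream(data), cp1251));
    }
}
```

In a real program, the ByteArrayInputStream would be replaced by a FileInputStream on the file in question.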