Type: Bug
Resolution: Not an Issue
Priority: P4
Fix Version/s: None
Affects Version/s: 6
Component/s: core-libs
Labels: webbug
Subcomponent: java.nio.charsets
CPU: x86
OS: windows_xp

FULL PRODUCT VERSION :
1.6.0_04

ADDITIONAL OS VERSION INFORMATION :
Windows XP SP2

A DESCRIPTION OF THE PROBLEM :
In the third panel of the intl.cpl control panel on Windows XP, you can set the default code page used for "non-Unicode" programs.
This setting controls, for all programs (including those that are fully Unicode internally), how a text file with no encoding information is interpreted.
Starting with Java 5, Charset.defaultCharset() relies on the system property "file.encoding" to return this value.
Unfortunately, it does not return the correct value, which can be retrieved with the Win32 API function GetACP().
Instead, the implementation (GetJavaProperties) in java_props_md.c confuses the locale with the "default encoding" and tries to return a code page matching the default locale.
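
To see which values the JVM actually picked up on a given machine, a small diagnostic along the following lines may help. It is only a sketch: the class name ShowDefaultCharset and the report() helper are made up for illustration, and the output has to be compared by hand against the code page configured in intl.cpl, because plain Java has no way to call GetACP().

```java
import java.nio.charset.Charset;

public class ShowDefaultCharset {
    /** Builds a one-line report of the JVM settings discussed in this report. */
    static String report() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + " defaultCharset=" + Charset.defaultCharset().name()
                + " user.language=" + System.getProperty("user.language");
    }

    public static void main(String[] args) {
        // Compare this line with the "Language for non-Unicode programs" setting.
        System.out.println(report());
    }
}
```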

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Set the "Language for non-Unicode programs" on a US Windows installation to Russian and reboot.

Open Notepad, paste some Russian text into it, and save it as a (non-Unicode) .txt file. Reopen it in Notepad and confirm that it looks OK.

Now write a small Java app that reads the file using Charset.defaultCharset() and inspect the contents in Unicode.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Expected Java code to behave as native Windows applications do.
ACTUAL -
The Russian code page 1251 characters are interpreted as being in the Windows-1252 character set.
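
The effect of this misinterpretation can be reproduced without changing any system settings. The sketch below (class and method names are illustrative, not taken from the report) encodes a Russian word as windows-1251 bytes, as Notepad would when saving with that code page, and then decodes the same bytes as windows-1252. The sample word is written with Unicode escapes so that the source file's own encoding does not matter.

```java
import java.nio.charset.Charset;

public class MojibakeDemo {
    /** Encodes text as windows-1251, then wrongly decodes the bytes as windows-1252. */
    static String misdecode(String text) {
        byte[] bytes = text.getBytes(Charset.forName("windows-1251"));
        return new String(bytes, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        // Cyrillic letters U+041F U+0440 U+0438 U+0432 U+0435 U+0442
        String original = "\u041F\u0440\u0438\u0432\u0435\u0442";
        String garbled = misdecode(original);
        // Each Cyrillic letter comes back as an accented Latin letter.
        System.out.println(original + " -> " + garbled);
    }
}
```

Every byte in the 0xC0-0xFF range maps to a Cyrillic letter in windows-1251 but to an accented Latin letter in windows-1252, which is why the text read in Java looks like Western European gibberish rather than failing outright.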

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
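import java.io.*;
import java.nio.charset.Charset;

public class DefaultCharsetRead {
    public static void main(String[] args) throws IOException {
        String fileName = args[0]; // path to the .txt file saved from Notepad
        Charset cs = Charset.defaultCharset(); // Should match GetACP but doesn't
        BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(fileName), cs));
        String s;
        while ((s = br.readLine()) != null) { // readLine() returns null at end of file
            System.out.println(s);
        }
        br.close();
    }
}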

---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
The only workaround I have found is to write JNI code that calls GetACP and returns "cp" + the code page retrieved.
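
The Java side of such a workaround might look like the sketch below. The JNI part is omitted: the code page number is assumed to come from a native method wrapping GetACP(), which is not shown and does not exist in the JDK. The class name CodePageCharsets and the method forCodePage are hypothetical names chosen for illustration, and falling back to the JVM default for unknown code pages is a design choice of this sketch, not part of the original workaround.

```java
import java.nio.charset.Charset;

public class CodePageCharsets {
    /**
     * Maps a Windows ANSI code page number to a Java Charset using the
     * "cp" + number naming from the workaround. If this JVM has no charset
     * under that name, the JVM default charset is returned instead.
     */
    static Charset forCodePage(int codePage) {
        String name = "cp" + codePage;
        return Charset.isSupported(name)
                ? Charset.forName(name)
                : Charset.defaultCharset();
    }

    public static void main(String[] args) {
        // 1251 stands in for the value a native GetACP() wrapper would return
        // on a system configured for Russian.
        System.out.println(forCodePage(1251));
    }
}
```

The resulting Charset can then be passed to InputStreamReader in place of Charset.defaultCharset().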

Assignee: Unassigned
Reporter: Nelson Dcosta (Inactive)
Votes: 0
Watchers: 0
Created: 2008-03-18 05:26
Updated: 2011-02-16 11:15
Resolved: 2008-06-30 14:34
Imported: 15/Sep/12 1:24 PM
Indexed: 17/Jul/12 10:56 AM
