JDK-6676635

Charset.defaultCharset() doesn't return the default charset on Windows

    • Type: Bug
    • Resolution: Not an Issue
    • Priority: P4
    • Fix Version/s: None
    • Affects Version/s: 6
    • Component/s: core-libs

      FULL PRODUCT VERSION :
      1.6.0_04

      ADDITIONAL OS VERSION INFORMATION :
      Windows XP SP2

      A DESCRIPTION OF THE PROBLEM :
      In the third panel of the intl.cpl control panel on Windows XP, you can set the default code page used for "non-Unicode" programs.
      For example, this setting controls, for all programs (including those that are fully Unicode internally), how a text file with no encoding information is interpreted.
      Starting with Java 5, Charset.defaultCharset() relies on the system property "file.encoding" to return this value.
      Unfortunately, it does not return the correct value, which can be retrieved with the Win32 API function GetACP().
      Instead, the implementation (GetJavaProperties) in java_props_md.c confuses locale with "default encoding" and tries to return a code page matching the default locale.
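
A minimal sketch for inspecting the two values discussed above on a given machine. The class name ShowDefaultCharset and the describe() helper are hypothetical and not part of the original report. The sketch only prints what the JVM computed at startup; it does not call GetACP().

```java
import java.nio.charset.Charset;

public class ShowDefaultCharset {
    /** Builds a short report of the encoding values the JVM derived at startup. */
    static String describe() {
        return "file.encoding property   = " + System.getProperty("file.encoding") + "\n"
             + "Charset.defaultCharset() = " + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // Compare this output with the code page selected under
        // "Language for non-Unicode programs" in the control panel.
        System.out.println(describe());
    }
}
```

Comparing this output with the control panel setting shows whether the two disagree in the way the report describes.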

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      On a US English installation of Windows, set the "Language for non-Unicode programs" to Russian and reboot.

      Open Notepad, paste some Russian text into it, and save it as a (non-Unicode) .txt file. Reopen the file in Notepad and confirm that it displays correctly.

      Now write a small Java application that reads the file using Charset.defaultCharset(), and inspect the resulting Unicode strings.

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      Java code should behave as native Windows applications do.
      ACTUAL -
      The Russian characters, stored in code page 1251, are interpreted as if they were in the Windows-1252 character set.
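
The effect described under ACTUAL can be illustrated without changing any system setting. The sketch below encodes a Russian word as windows-1251 and then deliberately decodes the bytes as windows-1252. The class MojibakeDemo and the method misdecode are hypothetical names used only for illustration, and Unicode escapes are used so that the example does not depend on the encoding of the source file.

```java
import java.nio.charset.Charset;

public class MojibakeDemo {
    /** Encodes text as windows-1251, then decodes the same bytes as windows-1252. */
    static String misdecode(String text) {
        byte[] bytes = text.getBytes(Charset.forName("windows-1251"));
        return new String(bytes, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        // "\u041f\u0440\u0438\u0432\u0435\u0442" is the Russian word for "hello".
        String russian = "\u041f\u0440\u0438\u0432\u0435\u0442";
        String garbled = misdecode(russian);
        // Print code points rather than glyphs so the console encoding does not matter.
        for (int i = 0; i < garbled.length(); i++) {
            System.out.printf("U+%04X ", (int) garbled.charAt(i));
        }
        System.out.println();
    }
}
```

Each Cyrillic letter comes back as an accented Latin letter, because the same byte value maps to a different character in each code page.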

      REPRODUCIBILITY :
      This bug can be reproduced always.

      ---------- BEGIN SOURCE ----------
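      import java.io.*;
      import java.nio.charset.Charset;

      public class DefaultCharsetRepro {
          public static void main(String[] args) throws IOException {
              String fileName = args[0]; // path to the .txt file saved from Notepad
              Charset cs = Charset.defaultCharset(); // Should match GetACP but doesn't
              FileInputStream fis = new FileInputStream(fileName);
              BufferedReader br = new BufferedReader(new InputStreamReader(fis, cs));
              String s;
              while ((s = br.readLine()) != null) { // read until end of file
                  System.out.println(s);
              }
              br.close();
          }
      }
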
      ---------- END SOURCE ----------

      CUSTOMER SUBMITTED WORKAROUND :
      The only workaround I have found is to write JNI code that calls GetACP and returns "cp" + the code page retrieved.
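
The Java side of such a workaround might look like the sketch below. It assumes the code page number has already been obtained elsewhere (for example, from the JNI call to GetACP mentioned above, which is not shown) and is passed in as a plain int. The class CodePageCharsets and its methods are hypothetical names. Charset.forName throws UnsupportedCharsetException if the JVM has no charset registered under the resulting "cp" name.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

public class CodePageCharsets {
    /** Maps a Windows code page number to a Java charset via its "cp" alias, e.g. 1251 -> "cp1251". */
    static Charset forCodePage(int codePage) {
        return Charset.forName("cp" + codePage);
    }

    /** Opens a text file with an explicit code page instead of Charset.defaultCharset(). */
    static BufferedReader open(String fileName, int codePage) throws IOException {
        return new BufferedReader(
                new InputStreamReader(new FileInputStream(fileName), forCodePage(codePage)));
    }

    public static void main(String[] args) throws IOException {
        // Usage: java CodePageCharsets <file> <codePage>, e.g. "russian.txt 1251"
        BufferedReader br = open(args[0], Integer.parseInt(args[1]));
        String s;
        while ((s = br.readLine()) != null) {
            System.out.println(s);
        }
        br.close();
    }
}
```

Passing the charset explicitly keeps the reading code independent of how the JVM computed "file.encoding".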

            Assignee: Unassigned
            Reporter: ndcosta Nelson Dcosta (Inactive)