JDK-4279804

Display of Group 1 National characters in CP 850 doesn't work


    • Type: Bug
    • Resolution: Duplicate
    • Priority: P4
    • Fix Version/s: None
    • Affects Version/s: 1.1.7
    • Component/s: core-libs



      Name: krT82822 Date: 10/09/99


      I am running on Windows NT 4.0 with Service Pack 4.

      java full version "JDK 1.1.7 IBM build n117p-19990618 (JIT enabled: ibmjitc)"

      First, I have a German properties file that is encoded in Code
      Page 850. I converted the file to Unicode by running
      native2ascii against the file, specifying an -encoding option of
      "Cp850". I have a program called DumpMsgs that reads the strings
      from the Unicode version of the file and displays them in the
      DOS window, which uses Code Page 850. (I will append the source
      code for DumpMsgs at the end of this note.) To make things
      easier, I redirected the output of the DumpMsgs tool to a file.
      When I display the contents of that file in the DOS window, none
      of the German National characters display correctly. However,
      when I view the file using the Windows "Write" (or WordPad)
      editor, all of the characters display correctly. This implies
      that the strings output by the DumpMsgs tool are in code page
      1252 (or ISO 8859-1). I would have expected the output to be in
      code page 850, since the original properties file was in code
      page 850 and I specified a native2ascii encoding of "Cp850".

      This particular problem affects the National characters in
      all of the Group 1 SBCS (Single Byte Character Set) languages
      (German, French, Italian, Spanish, and Brazilian Portuguese).

      Interestingly, when I converted the source properties file to
      code page 1252 and then converted it to Unicode using a
      native2ascii -encoding option of "Cp1252", I got the same
      results: the National characters did not display correctly in
      the DOS window, but did in WordPad. Since the source file and
      the native2ascii -encoding were both code page 1252, that is
      the result I expected in this case.
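      A likely explanation, offered as an assumption rather than as the
      evaluator's diagnosis: the native2ascii -encoding option only controls
      how the CP 850 file is decoded into Unicode. DumpMsgs then writes those
      Unicode strings through System.out, which encodes them with the JVM's
      default encoding (typically Cp1252 on a Western-European Windows NT
      installation), not with the console's OEM code page 850. The sketch
      below compares the byte each encoding produces for the German umlauts.
      The class and method names are hypothetical, and it targets a current
      JDK rather than 1.1.7.

```java
import java.nio.charset.Charset;

public class EncodingMismatch {
    /** Returns the bytes the given charset produces for s, as unsigned ints. */
    static int[] encode(String s, String charsetName) {
        byte[] b = s.getBytes(Charset.forName(charsetName));
        int[] out = new int[b.length];
        for (int i = 0; i < b.length; i++) {
            out[i] = b[i] & 0xFF;
        }
        return out;
    }

    public static void main(String[] args) {
        // a-umlaut, o-umlaut, u-umlaut, A-umlaut, O-umlaut, U-umlaut, sharp s
        String umlauts = "\u00e4\u00f6\u00fc\u00c4\u00d6\u00dc\u00df";
        for (int i = 0; i < umlauts.length(); i++) {
            String ch = umlauts.substring(i, i + 1);
            // windows-1252 (Cp1252): what a Western Windows JVM is assumed to emit.
            // IBM850 (alias Cp850): what the DOS window expects.
            System.out.printf("U+%04X  Cp1252=0x%02X  Cp850=0x%02X%n",
                (int) ch.charAt(0),
                encode(ch, "windows-1252")[0],
                encode(ch, "IBM850")[0]);
        }
    }
}
```

      Every umlaut maps to a different byte in the two code pages (for
      example, a-umlaut is 0xE4 in Cp1252 but 0x84 in CP 850), so output that
      WordPad reads correctly as Cp1252 is garbled in the CP 850 console.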

      Here is a related, but slightly different problem: Since I want
      to have the contents of the properties file display correctly in
      the DOS window (code page 850), I experimented with different
      combinations of the source file encoding and the native2ascii
      -encoding option. The combination that worked the best was to
      have the source properties file in code page 850 but use a
      native2ascii -encoding option of "Cp1252". I don't know whether
      this is a valid thing to do, but it seemed to work for the
      most part. All of the National characters displayed correctly
      in the DOS window for all of the Group 1 SBCS languages, with
      the exception of two German characters: the lowercase "u" with
      two dots above it (ü) and the capital "A" with two dots above
      it (Ä). (I apologize that I don't know the official names of
      these characters.) All
      of the other German National characters (at least the ones that
      are used in my properties file) displayed correctly in the DOS
      window.
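      The partial workaround can plausibly be read as a byte round trip:
      decoding the CP 850 file as Cp1252 turns each byte into some Unicode
      character, System.out (assumed to encode with Cp1252) turns that
      character back into the same byte, and the console then interprets the
      byte as CP 850. A byte survives only if Cp1252 defines it. The sketch
      below hard-codes the CP 850 byte values of the German letters; the class
      name is hypothetical. On a current JDK it should report only 0x81 (ü) as
      lost, because windows-1252 leaves 0x81 undefined but maps 0x8E to Ž.
      That the JDK 1.1.7 Cp1252 converter also left 0x8E (Ä) unmapped is an
      assumption that would explain the second failing character. The same
      reasoning fits the Cp1250 experiment in the 10/1/99 follow-up below:
      windows-1250 defines 0x8E but not 0x81, so only ü was lost there.

```java
import java.nio.charset.Charset;

public class Cp850ViaCp1252 {
    // German letters as CP 850 bytes: ä=0x84 ö=0x94 ü=0x81 Ä=0x8E Ö=0x99 Ü=0x9A ß=0xE1
    static final byte[] CP850_GERMAN = {
        (byte) 0x84, (byte) 0x94, (byte) 0x81, (byte) 0x8E,
        (byte) 0x99, (byte) 0x9A, (byte) 0xE1
    };

    /**
     * Decodes the bytes with the given charset (the native2ascii step), then
     * re-encodes the resulting String with the same charset (the System.out step).
     */
    static byte[] roundTrip(byte[] in, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        String decoded = new String(in, cs); // undefined bytes become U+FFFD
        return decoded.getBytes(cs);         // U+FFFD is unmappable, so it becomes '?'
    }

    public static void main(String[] args) {
        byte[] out = roundTrip(CP850_GERMAN, "windows-1252");
        for (int i = 0; i < CP850_GERMAN.length; i++) {
            String status = (out[i] == CP850_GERMAN[i]) ? "preserved" : "lost";
            System.out.printf("CP 850 byte 0x%02X: %s%n", CP850_GERMAN[i] & 0xFF, status);
        }
    }
}
```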

      I would like to see if there is a workaround or fix that will
      allow all the National characters for the Group 1 SBCS languages
      to display correctly in the DOS window (code page 850).
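      One possible workaround, sketched under the assumption that the console
      really is using code page 850: keep the native2ascii -encoding matched to
      the source file (Cp850 for a CP 850 file), and instead of relying on
      System.out's default encoding, wrap System.out in a writer that encodes
      explicitly as Cp850. The helper name consoleWriter is hypothetical, and
      the Charset-based OutputStreamWriter constructor used here requires a
      newer JDK than 1.1.7 (which offered only the String-based constructor).

```java
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;

public class Cp850Console {
    /** Hypothetical helper: a PrintWriter that encodes text as code page 850. */
    static PrintWriter consoleWriter(OutputStream out) {
        // autoFlush = true, so each println() is pushed to the console immediately
        return new PrintWriter(new OutputStreamWriter(out, Charset.forName("Cp850")), true);
    }

    public static void main(String[] args) {
        PrintWriter con = consoleWriter(System.out);
        // In DumpMsgs, System.out.println(...) would become con.println(...)
        con.println("value = Gr\u00fc\u00dfe \u00c4\u00d6\u00dc \u00e4\u00f6\u00fc");
    }
}
```

      Redirecting this output to a file would then produce CP 850 bytes, so the
      file would look correct in the DOS window but not in WordPad, the reverse
      of the behavior described above.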

      Following is the source Java code for the DumpMsgs tool. The
      input to DumpMsgs is the name of the properties file whose
      strings you want to display. I pass in the name of the file
      that is output from the native2ascii command.

      ----------------- START DumpMsgs HERE -------------------

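      import java.io.FileInputStream;
      import java.util.Enumeration;
      import java.util.Properties;

      public class DumpMsgs
      {
        public static void main (String[] args)
        {
          if (args.length != 1)
          {
            System.out.println ("usage: java DumpMsgs <file.properties>");
            return;
          }
          try
          {
            // Properties.load() reads ISO 8859-1 bytes and decodes \uXXXX
            // escapes, which is why the file is run through native2ascii first.
            FileInputStream file = new FileInputStream (args[0]);
            Properties props = new Properties ();
            try
            {
              props.load (file);
            }
            finally
            {
              file.close ();
            }
            Enumeration list = props.propertyNames ();
            while (list.hasMoreElements ())
            {
              String key = (String)list.nextElement ();
              // System.out converts each String to bytes with the JVM's
              // default encoding, not necessarily the console's code page.
              System.out.println ("key = " + key);
              System.out.println ("value = " + props.getProperty (key));
              System.out.println ();
            }
          }
          catch (Exception e)
          {
            System.out.println (e);
          }
        } // main
      } // class DumpMsgs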

      ----------------------

      10/1/99, more info from user (in reply to the suggestion that bug #4038677 might offer useful info):

      Thank you for your response. I apologize for not getting back to you sooner.
      Your first e-mail was buried in a bunch of other e-mail.

      I tried the workaround, but the result was the same.

      I left the source file in Code Page 850 and converted it to Unicode with the
      "Cp852" native2ascii encoding. When I used my DumpMsgs tool to print the file,
      it still displayed as if it were Code Page 1252: all of the German
      National characters were corrupted in the DOS command window.

      I then tried converting the Code Page 850 source to Unicode with the Cp1250
      native2ascii encoding. This gave the same result as my partial workaround:
      the u-umlaut is still corrupted, while the rest of the German National
      characters display correctly.

      The next thing I'll try is to convert the Code Page 850 source to Code Page 852
      using iconv (if I can). Then I will try to convert it to Unicode using the
      Cp852 native2ascii encoding. I'll let you know how that works.

      If you have any other suggestions, or if I didn't do the workaround correctly,
      please let me know.

      --------------

      10/6/99 from user:

      Converting the source file to CP852 (using iconv) and then converting it to
      Unicode using a native2ascii encoding of "Cp852" did not work. The file output
      from native2ascii is still CP 1252. (I know this because the special characters
      in the file are displayed correctly in a Windows editor.)

      I still have not found a way to get all of the German special characters to
      display correctly in the DOS (CP 850) window.


      10/9/99 eval1127@eng -- am filing reference bug #
      (Review ID: 95816)
      ======================================================================

            Assignee: Norbert Lindenberg (Inactive)
            Reporter: Kevin Ryan (Inactive)