JDK-4216191

Case sensitivity to character set encoding names


    • Type: Bug
    • Resolution: Fixed
    • Priority: P4
    • Fix Version/s: 1.4.1
    • Affects Version/s: 1.2.0
    • Component/s: core-libs
    • Resolved In Build: hopper
    • CPU: x86
    • OS: windows_nt
    • Verification: Verified



      Name: dbT83986 Date: 03/01/99


      The apparent case sensitivity of character set encoding names
      is very annoying, and possibly incorrect. It is claimed that
      the accepted names are IANA charset names, but the IANA says
      that names are case-insensitive (see the second paragraph of
      http://www.isi.edu/in-notes/iana/assignments/character-sets).
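      The java.nio.charset.Charset class, added in Java 1.4 (after
      this report was filed), documents charset names as not
      case-sensitive, matching the IANA rule. A minimal sketch,
      assuming a 1.4 or later runtime; the class and method names are
      illustrative only:

```java
import java.nio.charset.Charset;

// Sketch assuming Java 1.4+, where Charset.forName looks up charset
// names and aliases without regard to case.
class CharsetNameCase {
    // Resolve any spelling of a name to the charset's canonical name.
    static String canonical(String name) {
        return Charset.forName(name).name();
    }

    public static void main(String[] args) {
        String[] spellings = { "US-ASCII", "us-ascii", "Us-AsCiI" };
        for (String s : spellings) {
            // Every spelling resolves to the same charset, "US-ASCII".
            System.out.println(s + " -> " + canonical(s));
        }
    }
}
```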

      Furthermore, it isn't clear which aliases of the various
      character sets are supported. For example, the document
      referenced above gives the following entry for the ASCII charset:

      Name: ANSI_X3.4-1968
      ...
      Alias: iso-ir-6
      Alias: ANSI_X3.4-1986
      Alias: ISO_646.irv:1991
      Alias: ASCII
      Alias: ISO646-US
      Alias: US-ASCII (preferred MIME name)
      Alias: us
      Alias: IBM367
      Alias: cp367
      Alias: csASCII
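      Which aliases are recognized is up to the Java implementation.
      On a 1.4 or later runtime, the registered aliases can be listed
      with Charset.aliases() rather than guessed from the IANA entry.
      A sketch under that assumption; the helper name namesFor is
      hypothetical, and the printed list varies between JDK versions:

```java
import java.nio.charset.Charset;
import java.util.Set;
import java.util.TreeSet;

// Sketch assuming Java 1.4+: ask the runtime which names it accepts
// for a charset instead of guessing from the IANA registry entry.
class ListAliases {
    // Hypothetical helper: canonical name plus all registered aliases,
    // held in a set that compares names case-insensitively.
    static Set<String> namesFor(String charsetName) {
        Charset cs = Charset.forName(charsetName);
        Set<String> names = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        names.add(cs.name());       // canonical name
        names.addAll(cs.aliases()); // aliases() excludes the canonical name
        return names;
    }

    public static void main(String[] args) {
        for (String n : namesFor("US-ASCII")) {
            System.out.println(n);
        }
    }
}
```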

      But my tests show that only "ASCII" (but *not* "ascii") and
      "US-ASCII" (where "us-ascii" and even "US-Ascii" *are*
      recognized) can be used to refer to the ASCII character set.

      Attached is a small program that tries various encoding names
      and prints whether each one seems to be legal. It tries various
      mixed-case renderings and aliases for the US-ASCII, Unicode,
      Big5, and Cp1252 encodings. On my machine, running Java 1.2,
      I get the following output:

      Encoding "ANSI_X3.4-1968" NOT recognized
      Encoding "iso-ir-6" NOT recognized
      Encoding "ANSI_X3.4-1986" NOT recognized
      Encoding "ISO_646.irv:1991" NOT recognized
      Encoding "ASCII" recognized
      Encoding "ascii" NOT recognized
      Encoding "Ascii" NOT recognized
      Encoding "ISO646-US" NOT recognized
      Encoding "US-ASCII" recognized
      Encoding "us-ascii" recognized
      Encoding "US-Ascii" recognized
      Encoding "us" NOT recognized
      Encoding "IBM367" NOT recognized
      Encoding "cp367" NOT recognized
      Encoding "csASCII" NOT recognized
      Encoding "Unicode" recognized
      Encoding "UNICODE" NOT recognized
      Encoding "unicode" NOT recognized
      Encoding "Big5" recognized
      Encoding "big5" recognized
      Encoding "bIg5" recognized
      Encoding "biG5" recognized
      Encoding "bIG5" recognized
      Encoding "Cp1252" recognized
      Encoding "cp1252" NOT recognized
      Encoding "CP1252" NOT recognized
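      On a runtime whose lookup is case-sensitive, as in the output
      above, one hypothetical workaround is to map any spelling to one
      the runtime accepted before calling getBytes or constructing a
      reader. The sketch relies on TreeMap and
      String.CASE_INSENSITIVE_ORDER (both present since Java 1.2) but
      uses modern generic syntax; the table holds only the spellings
      reported as recognized above, and the class name is illustrative:

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical workaround for a case-sensitive encoding lookup:
// map any spelling to the exact spelling the runtime accepted.
class EncodingNames {
    // Keys compare case-insensitively; values are the spellings that
    // the test program reported as "recognized".
    private static final Map<String, String> ACCEPTED =
            new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    static {
        ACCEPTED.put("ASCII", "ASCII");
        ACCEPTED.put("US-ASCII", "US-ASCII");
        ACCEPTED.put("Unicode", "Unicode");
        ACCEPTED.put("Big5", "Big5");
        ACCEPTED.put("Cp1252", "Cp1252");
    }

    // Returns the accepted spelling, or the input unchanged if unknown.
    static String canonicalize(String name) {
        String known = ACCEPTED.get(name);
        return known != null ? known : name;
    }

    public static void main(String[] args) {
        System.out.println(canonicalize("ascii"));   // ASCII
        System.out.println(canonicalize("UNICODE")); // Unicode
        System.out.println(canonicalize("cp1252"));  // Cp1252
    }
}
```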


      The primary reason I find this annoying is when dealing with
      Transferables with the "text/plain" MIME type. I want to be
      able to create an InputStreamReader using the encoding
      described by the charset parameter of the MIME type. On my
      machine these always come back in lower case, so I get encoding
      names such as "ascii" and "unicode". Passing these strings to
      the InputStreamReader constructor results in an exception, so I
      have to change the strings to "ASCII" and "Unicode", and there
      doesn't seem to be an easy way to know, in general, which letters
      need to be capitalized to make the encoding name acceptable.
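      On Java 1.4 or later, the lower-case charset parameter can be
      resolved with Charset.forName and passed to the
      InputStreamReader(InputStream, Charset) constructor, so no
      re-capitalization is needed. A sketch under that assumption; the
      helpers charsetOf and readerFor are hypothetical, and the simple
      parameter parsing (split on ';', optional quotes) is an
      assumption rather than a full MIME parser:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

// Sketch assuming Java 1.4+: open a Reader using the charset
// parameter of a MIME type such as "text/plain; charset=ascii".
class MimeCharsetReader {
    // Hypothetical helper: extract the charset parameter, defaulting to
    // US-ASCII (the RFC 2046 default for text/plain) when it is absent.
    static Charset charsetOf(String mimeType) {
        String[] parts = mimeType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq > 0 && param.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                String value = param.substring(eq + 1).trim();
                // Strip optional quotes, e.g. charset="us-ascii".
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                // Charset.forName ignores case, so "ascii" works as-is.
                return Charset.forName(value);
            }
        }
        return Charset.forName("US-ASCII");
    }

    // Hypothetical helper: wrap a stream in a Reader for the MIME type.
    static Reader readerFor(InputStream in, String mimeType) {
        return new InputStreamReader(in, charsetOf(mimeType));
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "Encode me".getBytes("US-ASCII");
        try (Reader r = readerFor(new ByteArrayInputStream(data),
                                  "text/plain; charset=ascii")) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
            System.out.println(sb); // Encode me
        }
    }
}
```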

      import java.io.*;

      public class EncodingsTest
      {
        public static void main(String args[])
        {
          // Try various encoding names in mixed cases

          // Various forms of US-ASCII
          tryToEncode( "ANSI_X3.4-1968" );
          tryToEncode( "iso-ir-6" );
          tryToEncode( "ANSI_X3.4-1986" );
          tryToEncode( "ISO_646.irv:1991" );
          tryToEncode( "ASCII" );
          tryToEncode( "ascii" );
          tryToEncode( "Ascii" );
          tryToEncode( "ISO646-US" );
          tryToEncode( "US-ASCII" );
          tryToEncode( "us-ascii" );
          tryToEncode( "US-Ascii" );
          tryToEncode( "us" );
          tryToEncode( "IBM367" );
          tryToEncode( "cp367" );
          tryToEncode( "csASCII" );

          // Variants on Unicode
          tryToEncode( "Unicode" );
          tryToEncode( "UNICODE" );
          tryToEncode( "unicode" );

          // Variants on Big5
          tryToEncode( "Big5" );
          tryToEncode( "big5" );
          tryToEncode( "bIg5" );
          tryToEncode( "biG5" );
          tryToEncode( "bIG5" );

          // Variants of Cp1252
          tryToEncode( "Cp1252" );
          tryToEncode( "cp1252" );
          tryToEncode( "CP1252" );
        }


        public static final String ENCODE_STRING = "Encode me";

        public static void tryToEncode( String encoding )
        {
          try
          {
            ENCODE_STRING.getBytes( encoding ); // throws if the name is not recognized
            System.out.println( "Encoding \"" + encoding + "\" recognized" );
          }
          catch( UnsupportedEncodingException e )
          {
            System.out.println( "Encoding \"" + encoding + "\" NOT recognized" );
          }
        }
      }
      (Review ID: 54372)
      ======================================================================

            ilittlesunw Ian Little (Inactive)
            dblairsunw Dave Blair (Inactive)
