Type: Bug
Resolution: Not an Issue
Priority: P5
Fix Version/s: None
Affects Version/s: 5.0
Component/s: core-libs
Labels:
- webbug

Subcomponent: java.nio.charsets
CPU: x86
OS: windows_xp

FULL PRODUCT VERSION :
java version "1.5.0_05"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_05-b05)
Java HotSpot(TM) Client VM (build 1.5.0_05-b05, mixed mode)

ADDITIONAL OS VERSION INFORMATION :
Microsoft Windows XP [Version 5.1.2600]

A DESCRIPTION OF THE PROBLEM :

When I run the test below on an "MS949"-encoded byte array, the bytes I get back from String.getBytes("MS949") differ from the bytes I started with, which appears to indicate that String.getBytes() does not properly interpret the encoding.

This indicates that in logical terms:
bytes != new String(bytes, "MS949").getBytes("MS949");
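The round-trip property can be expressed as a small helper. This is a sketch for illustration only: the class name `RoundTripCheck` and the method `roundTrips` are hypothetical, and UTF-8 is used because every Java runtime is guaranteed to provide it. The `String(byte[], Charset)` constructor is documented to replace malformed or unmappable input with the charset's replacement string, so when the input bytes are not valid in the charset, re-encoding cannot reproduce them.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RoundTripCheck {

  // Returns true when decoding `bytes` with `cs` and re-encoding the result
  // with the same charset gives back exactly the original bytes.
  public static boolean roundTrips(byte[] bytes, Charset cs) {
    String decoded = new String(bytes, cs); // invalid input is replaced, not reported
    return Arrays.equals(bytes, decoded.getBytes(cs));
  }

  public static void main(String[] args) {
    byte[] ascii = {0x48, 0x69};           // "Hi", valid UTF-8
    byte[] badUtf8 = {(byte) 0xC3, 0x28};  // 0xC3 must be followed by 0x80..0xBF
    System.out.println(roundTrips(ascii, StandardCharsets.UTF_8));   // prints true
    System.out.println(roundTrips(badUtf8, StandardCharsets.UTF_8)); // prints false
  }
}
```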

I used four Korean characters (Unicode code points U+B440 to U+B443):
B440(2);; # HANGUL SYLLABLE TIKEUT YO RIEULSIOS
B441(2);; # HANGUL SYLLABLE TIKEUT YO RIEULTHIEUTH
B442(2);; # HANGUL SYLLABLE TIKEUT YO RIEULPHIEUPH
B443(2);; # HANGUL SYLLABLE TIKEUT YO RIEULHIEUH

taken from http://www.iana.org/assignments/idn/kr-korean.html
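The entries above are Unicode code points. The following minimal sketch (class and method names are hypothetical) shows that writing these four code points as big-endian UTF-16 produces exactly the byte array used in the test below. In other words, the test's bytes are the UTF-16BE form of the characters, which is not the same thing as their MS949 encoding.

```java
import java.nio.charset.StandardCharsets;

public class CodePointBytes {

  // Formats bytes as space-separated two-digit uppercase hex, e.g. "B4 40".
  public static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < bytes.length; i++) {
      if (i > 0) sb.append(' ');
      sb.append(String.format("%02X", bytes[i] & 0xFF));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // The four code points listed above, written as Java string escapes.
    String korean = "\uB440\uB441\uB442\uB443";
    // UTF-16BE stores each of these BMP characters as one big-endian 16-bit unit, no BOM.
    byte[] utf16be = korean.getBytes(StandardCharsets.UTF_16BE);
    System.out.println(hex(utf16be)); // prints "B4 40 B4 41 B4 42 B4 43"
  }
}
```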

MS949 is listed as the Windows Korean character set here:

http://java.sun.com/j2se/1.3/docs/guide/intl/encoding.doc.html
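Whether a given runtime actually provides MS949 can be checked through `java.nio.charset.Charset`. This is a sketch with hypothetical class and method names; the canonical `java.nio` name reported for MS949 depends on the JDK, so no output is claimed for it here.

```java
import java.nio.charset.Charset;

public class CharsetLookup {

  // Returns the canonical java.nio name for `name`, or null if this runtime
  // does not provide the charset.
  public static String canonicalName(String name) {
    if (!Charset.isSupported(name)) {
      return null;
    }
    return Charset.forName(name).name();
  }

  public static void main(String[] args) {
    for (String name : new String[] {"ISO-8859-1", "MS949"}) {
      String canonical = canonicalName(name);
      System.out.println(name + " -> " + (canonical == null ? "not supported" : canonical));
    }
  }
}
```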

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Simply run the test code included in the source section below.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
I would expect the two byte arrays to be equal, as they are with the default and ISO-8859-1 charsets.
ACTUAL -
The byte array returned by String.getBytes("MS949") is not equal to the byte array passed to new String(bytes, "MS949").
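To find out whether a byte array can survive the round trip at all, it helps to decode it strictly. The sketch below (hypothetical class and method names) configures a `CharsetDecoder` to report malformed or unmappable input instead of silently substituting a replacement character. Running it against the bytes from the test shows whether each charset accepts them as they stand; no particular result is claimed for MS949 here.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

public class StrictDecode {

  // Returns true if `bytes` is a well-formed, fully mappable sequence in `cs`.
  // Unlike new String(bytes, cs), this decoder is told to REPORT problems
  // rather than replace them.
  public static boolean isValid(byte[] bytes, Charset cs) {
    CharsetDecoder decoder = cs.newDecoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT);
    try {
      decoder.decode(ByteBuffer.wrap(bytes));
      return true;
    } catch (CharacterCodingException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // Same bytes as in the test case.
    byte[] testBytes = {(byte) 0xb4, 0x40, (byte) 0xb4, 0x41,
        (byte) 0xb4, 0x42, (byte) 0xb4, 0x43};
    for (String name : new String[] {"ISO-8859-1", "MS949"}) {
      if (Charset.isSupported(name)) {
        System.out.println(name + " well-formed: " + isValid(testBytes, Charset.forName(name)));
      } else {
        System.out.println(name + " is not available in this runtime");
      }
    }
  }
}
```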

ERROR MESSAGES/STACK TRACES THAT OCCUR :
No errors are reported

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------

import java.io.UnsupportedEncodingException;
import java.util.Arrays;

// Class wrapper and main() added so the test can be run on its own.
public class StringParsingTest {

  public static void main(String[] args) {
    new StringParsingTest().testStringParsing();
  }

  public void testStringParsing() {
    // Four two-byte sequences: B4 40, B4 41, B4 42, B4 43.
    byte[] b = new byte[] {(byte) 0xb4, (byte) 0x40,
        (byte) 0xb4, (byte) 0x41,
        (byte) 0xb4, (byte) 0x42,
        (byte) 0xb4, (byte) 0x43};
    String[] charsets = new String[] {"ISO-8859-1", "MS949"};
    try {
      for (int i = 0; i < charsets.length; i++) {
        System.out.println("Using encoding " + charsets[i] + " for Korean characters.");

        // Decode the same bytes with the named charset and with the platform default.
        String koreanEncodedString = new String(b, charsets[i]);
        String defaultEncodedString = new String(b);
        System.out.println("KOREAN 1: " + koreanEncodedString);
        System.out.println("DEFAULT 1: " + defaultEncodedString);

        // Re-encode each string with the charset it was decoded with.
        byte[] koreanBytes = koreanEncodedString.getBytes(charsets[i]);
        byte[] defaultBytes = defaultEncodedString.getBytes();

        // Decode again to compare the strings after a full round trip.
        String stringFromKoreanBytes = new String(koreanBytes, charsets[i]);
        String stringFromDefaultBytes = new String(defaultBytes);
        System.out.println("KOREAN 2: " + stringFromKoreanBytes);
        System.out.println("DEFAULT 2: " + stringFromDefaultBytes);

        if (koreanEncodedString.equals(stringFromKoreanBytes))
          System.out.println("Korean String 1 matches Korean String 2");
        else
          System.out.println("Korean String 1 does not match Korean String 2");

        if (defaultEncodedString.equals(stringFromDefaultBytes))
          System.out.println("Default String 1 matches Default String 2");
        else
          System.out.println("Default String 1 does not match Default String 2");

        // Compare the original bytes with the re-encoded bytes.
        if (Arrays.equals(b, koreanBytes))
          System.out.println("Original bytes match Korean bytes");
        else
          System.out.println("Original bytes do not match Korean bytes");

        // Print both byte arrays as hex for visual comparison.
        StringBuilder sb = new StringBuilder();
        formatByteArray(b, 0, b.length, true, sb);
        System.out.println(sb.toString());

        StringBuilder sb2 = new StringBuilder();
        formatByteArray(koreanBytes, 0, koreanBytes.length, true, sb2);
        System.out.println(sb2.toString());
      }
    } catch (UnsupportedEncodingException e) {
      e.printStackTrace();
    }
  }

  // Appends raw[start .. start+length-1] to result as two-digit uppercase hex,
  // optionally separated by spaces.
  private static void formatByteArray(byte[] raw, int start, int length,
      boolean useSpace, StringBuilder result) {
    if (raw == null)
      return;
    for (int i = start; i < start + length; i++) {
      if (useSpace && i != start)
        result.append(" ");
      result.append(String.format("%02X", raw[i] & 0xFF));
    }
  }
}
---------- END SOURCE ----------
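A variant of the test that starts from the Korean text, rather than from a hand-written byte array, lets the charset produce its own bytes. This is a sketch under stated assumptions: the class and method names are hypothetical, and the MS949 part only runs if the runtime provides that charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StringFirstRoundTrip {

  // Encodes `text` with `cs`, decodes the bytes back, re-encodes, and checks
  // that both the string and the bytes survive unchanged.
  public static boolean roundTrips(String text, Charset cs) {
    byte[] encoded = text.getBytes(cs);
    String decoded = new String(encoded, cs);
    return decoded.equals(text) && Arrays.equals(encoded, decoded.getBytes(cs));
  }

  // Formats bytes as space-separated two-digit uppercase hex.
  static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < bytes.length; i++) {
      if (i > 0) sb.append(' ');
      sb.append(String.format("%02X", bytes[i] & 0xFF));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    String korean = "\uB440\uB441\uB442\uB443"; // code points U+B440 to U+B443
    System.out.println("UTF-8 round trip: " + roundTrips(korean, StandardCharsets.UTF_8));
    if (Charset.isSupported("MS949")) {
      Charset ms949 = Charset.forName("MS949");
      System.out.println("MS949 bytes: " + hex(korean.getBytes(ms949)));
      System.out.println("MS949 round trip: " + roundTrips(korean, ms949));
    } else {
      System.out.println("MS949 is not available in this runtime");
    }
  }
}
```

When the charset cannot represent a character, `getBytes` substitutes a replacement byte, so the check returns false; with ISO-8859-1, for example, Hangul characters do not survive.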

CUSTOMER SUBMITTED WORKAROUND :
I guess you could always force "ISO-8859-1" encoding
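The workaround round-trips because ISO-8859-1 maps byte value n to the character U+00nn and back. The same property means it cannot yield Hangul text. A minimal sketch (the class name is hypothetical) that checks both points over all 256 byte values:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1RoundTrip {

  // Builds an array holding every byte value 0x00..0xFF once.
  public static byte[] allByteValues() {
    byte[] all = new byte[256];
    for (int i = 0; i < 256; i++) {
      all[i] = (byte) i;
    }
    return all;
  }

  public static void main(String[] args) {
    byte[] all = allByteValues();
    String decoded = new String(all, StandardCharsets.ISO_8859_1);
    byte[] reEncoded = decoded.getBytes(StandardCharsets.ISO_8859_1);
    // Byte n decodes to char U+00nn, so the round trip is lossless...
    System.out.println("lossless: " + Arrays.equals(all, reEncoded)); // prints "lossless: true"
    // ...but no decoded char is above U+00FF, so no Hangul syllable can appear.
    int max = 0;
    for (int i = 0; i < decoded.length(); i++) {
      max = Math.max(max, decoded.charAt(i));
    }
    System.out.println("highest char: U+" + String.format("%04X", max)); // prints "highest char: U+00FF"
  }
}
```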

Assignee: Unassigned
Reporter: Nelson Dcosta (Inactive)
Votes: 0
Watchers: 0
Created: 2006-01-30 02:38
Updated: 2011-02-16 11:15
Resolved: 2006-01-30 08:22
Imported: 15/Sep/12 1:21 PM
Indexed: 17/Jul/12 10:53 AM
