JDK-6258039

String.getBytes returns a multibyte '?' when unicode to euc_jp is not possible.


    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: P4
    • None
    • 1.4.2
    • core-libs

      All versions of 1.4.2 on all platforms incorrectly return
      a multibyte value for '?' (the replacement character) when a
      Unicode character cannot be mapped to a character in the EUC_JP
      character set.

      The following program demonstrates the problem.

      import java.nio.charset.*;
      class CharTest
      {
        public static void main(String[] args)
        {
          try {
            // Encode a character that, per this report, EUC_JP cannot map.
            String unicode = "\u2015";
            byte[] bytes = unicode.getBytes("EUC_JP");
            for (int i = 0; i < bytes.length; i++) {
              System.out.println("0x" + Integer.toHexString(0xff & (int) bytes[i]));
            }

            Charset charset = Charset.forName("EUC_JP");
            System.out.println("charset EUC_JP == " + charset.displayName());

            // Print the encoder's own replacement bytes for comparison.
            CharsetEncoder encoder = charset.newEncoder();
            System.out.println(encoder.toString());
            bytes = encoder.replacement();
            System.out.println("replacement Value");
            for (int i = 0; i < bytes.length; i++) {
              System.out.println("0x" + Integer.toHexString(0xff & (int) bytes[i]));
            }
          } catch (Exception cce) {
            cce.printStackTrace();
          }
        }
      }

      1.4.2 output
      ------------
      0x21
      0x29
      charset EUC_JP == EUC-JP
      sun.nio.cs.ext.EUC_JP$Encoder@1f12c4e
      replacement Value
      0x21
      0x29

      Other versions
      --------------
      0x3f
      charset EUC_JP == EUC-JP
      sun.nio.cs.ext.EUC_JP$Encoder@a8c4e7
      replacement Value
      0x3f
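
The test program above relies on String.getBytes, which always substitutes the encoder's replacement bytes for an unmappable character. A caller that needs to detect the unmappable character instead could drive a CharsetEncoder directly with CodingErrorAction.REPORT. The following is a minimal sketch, not part of the original report: the class name StrictEncode and the helper encodeOrNull are hypothetical, and US-ASCII is used as a stand-in charset on the assumption that EUC_JP availability and its mapping tables vary between JDK builds.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictEncode
{
  // Hypothetical helper: returns the encoded bytes, or null if any
  // character cannot be encoded in the target charset.
  static byte[] encodeOrNull(String s, Charset cs)
  {
    // REPORT is already the default for a new encoder; it is set
    // explicitly here to make the intent visible.
    CharsetEncoder encoder = cs.newEncoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT);
    try {
      ByteBuffer out = encoder.encode(CharBuffer.wrap(s));
      byte[] bytes = new byte[out.remaining()];
      out.get(bytes);
      return bytes;
    } catch (CharacterCodingException e) {
      // Raised for unmappable characters and malformed input.
      return null;
    }
  }

  public static void main(String[] args)
  {
    // Assumption: US-ASCII stands in for EUC_JP in this sketch.
    Charset cs = StandardCharsets.US_ASCII;
    System.out.println(encodeOrNull("abc", cs).length);
    System.out.println(encodeOrNull("\u2015", cs) == null);
  }
}
```

With this approach the replacement bytes never reach the output, so a difference in the encoder's replacement value between releases would not affect the result.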


      The problem existed in one of the earlier 5.0 Javasoft drops, but
      was fixed before the 5.0 FCS drop.
      Here is the comment in the 5.0 source file that contains the fix.

      share/classes/sun/nio/cs/ext/EUC_JP.java:

          public CharsetEncoder newEncoder() {

              // Need to force the replacement byte to 0x3f
              // because JIS_X_0208_Encoder defines its own
              // alternative 2 byte substitution to permit it
              // to exist as a self-standing Encoder

              byte[] replacementBytes = { (byte)0x3f };
              return new Encoder(this).replaceWith(replacementBytes);
          }
      ###@###.### 2005-04-19 23:17:28 GMT
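
The fix quoted above overrides the encoder's replacement through CharsetEncoder.replaceWith. The effect of that call can be observed on any encoder. The sketch below is an illustration only and is not the JDK's code: the class name ReplacementDemo and the helper encodeWithReplacement are hypothetical, and US-ASCII is again assumed as a stand-in charset so that the example does not depend on EUC_JP being installed.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class ReplacementDemo
{
  // Hypothetical helper: encodes s, substituting the given bytes for
  // every character the charset cannot map.
  static byte[] encodeWithReplacement(String s, Charset cs, byte[] replacement)
  {
    // replaceWith throws IllegalArgumentException if the bytes are not
    // a legal replacement for this encoder.
    CharsetEncoder encoder = cs.newEncoder()
        .onUnmappableCharacter(CodingErrorAction.REPLACE)
        .replaceWith(replacement);
    try {
      ByteBuffer out = encoder.encode(CharBuffer.wrap(s));
      byte[] bytes = new byte[out.remaining()];
      out.get(bytes);
      return bytes;
    } catch (CharacterCodingException e) {
      // Still possible for malformed input such as a lone surrogate.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args)
  {
    // Assumption: US-ASCII stands in for EUC_JP in this sketch.
    Charset cs = StandardCharsets.US_ASCII;
    byte[] bytes = encodeWithReplacement("a\u2015b", cs, new byte[] { (byte)0x21 });
    for (int i = 0; i < bytes.length; i++) {
      System.out.println("0x" + Integer.toHexString(0xff & bytes[i]));
    }
  }
}
```

Passing { (byte)0x3f } as the replacement would mirror the byte value that the quoted newEncoder() forces.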

            sherman Xueming Shen
            ksoshals Kirill Soshalskiy (Inactive)