  1. JDK
  2. JDK-4821286

UTF-16: CharsetEncoder.isLegalReplacement(byte[]) returns wrong value


    • b32
    • generic, sparc
    • generic, solaris_8
    • Verified

      Name: auR10023 Date: 02/20/2003



      java.nio.charset.CharsetEncoder.isLegalReplacement(byte[]) returns true
      for an unmappable byte sequence in UTF-16, UTF-16BE and UTF-16LE. The
      Javadoc for this method says:

      ...
      Tells whether or not the given byte array is a legal replacement value for this encoder.

      A replacement is legal if, and only if, it is a legal sequence of bytes in this encoder's charset; that is, it must be possible to decode the replacement into one or more sixteen-bit Unicode characters.
      ...
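
One way to read the rule quoted above is as a strict decode attempt: the bytes are legal only if a decoder for the same charset accepts them without reporting an error. The following is a minimal sketch under that reading, not the JDK's implementation. The class name ReplacementCheck and the helper decodesCleanly are hypothetical names introduced only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

public class ReplacementCheck {

    // Hypothetical helper (not part of the JDK API): returns true if the
    // bytes decode, without error, into one or more chars in the named charset.
    static boolean decodesCleanly(String charsetName, byte[] bytes) {
        CharsetDecoder decoder = Charset.forName(charsetName).newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            // With REPORT set, decode(ByteBuffer) throws a
            // CharacterCodingException on malformed or unmappable input.
            return decoder.decode(ByteBuffer.wrap(bytes)).length() > 0;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // 0x0041 is 'A' in UTF-16BE.
        System.out.println(decodesCleanly("UTF-16BE", new byte[] {0x00, 0x41}));
        // A lone low surrogate (0xDC00) is not a complete UTF-16 sequence.
        System.out.println(decodesCleanly("UTF-16BE", new byte[] {(byte) 0xDC, 0x00}));
    }
}
```

The sketch builds a fresh decoder per call for simplicity; a real encoder would more plausibly cache one.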

      RFC 2781 describes the decoding process as follows:

      ...
         Decoding of a single character from UTF-16 to an ISO 10646 character
         value proceeds as follows. Let W1 be the next 16-bit integer in the
         sequence of integers representing the text. Let W2 be the (eventual)
         next integer following W1.

         1) If W1 < 0xD800 or W1 > 0xDFFF, the character value U is the value
            of W1. Terminate.

         2) Determine if W1 is between 0xD800 and 0xDBFF. If not, the sequence
            is in error and no valid character can be obtained using W1.
            Terminate.

         3) If there is no W2 (that is, the sequence ends with W1), or if W2
            is not between 0xDC00 and 0xDFFF, the sequence is in error.
            Terminate.

         4) Construct a 20-bit unsigned integer U', taking the 10 low-order
            bits of W1 as its 10 high-order bits and the 10 low-order bits of
            W2 as its 10 low-order bits.
      ...
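
The quoted steps can be sketched directly in Java. This is an illustrative sketch only: the class Utf16Decode, the method decodeOne, and the use of -1 both for "no W2" and for "sequence in error" are assumptions made for this example. The final addition of 0x10000 is step 5 of RFC 2781, which falls outside the excerpt quoted above.

```java
public class Utf16Decode {

    // Hypothetical helper following the RFC 2781 decoding steps.
    // w1 and w2 are 16-bit code units; pass w2 = -1 when no unit follows w1.
    // Returns the character value U, or -1 if the sequence is in error.
    static int decodeOne(int w1, int w2) {
        // Step 1: outside the surrogate range, U is simply W1.
        if (w1 < 0xD800 || w1 > 0xDFFF) {
            return w1;
        }
        // Step 2: W1 must lie in 0xD800..0xDBFF, otherwise it is an error.
        if (w1 > 0xDBFF) {
            return -1;
        }
        // Step 3: W2 must exist and lie in 0xDC00..0xDFFF.
        if (w2 < 0xDC00 || w2 > 0xDFFF) {
            return -1;
        }
        // Step 4: U' takes the low 10 bits of W1 as its high 10 bits
        // and the low 10 bits of W2 as its low 10 bits.
        int uPrime = ((w1 & 0x3FF) << 10) | (w2 & 0x3FF);
        // Step 5 (RFC 2781, not shown in the excerpt): U = U' + 0x10000.
        return uPrime + 0x10000;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(decodeOne(0x0041, -1)));
        System.out.println(decodeOne(0xDC00, 0x0041)); // W1 fails step 2
        System.out.println(decodeOne(0xD83D, -1));     // no W2, fails step 3
        System.out.println(Integer.toHexString(decodeOne(0xD83D, 0xDE00)));
    }
}
```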


      The following test case reproduces the problem:

      ------------- test.java --------------
      import java.nio.charset.*;

      public class test {

          static Object [][] bSeqs = new Object [][] {
              {"UTF-16", new byte [] { (byte)0xd8, 0, (byte)0xdc, 0}},
              {"UTF-16BE", new byte [] { (byte)0xd8, 0, (byte)0xdc, 0}},
              {"UTF-16LE", new byte [] { 0, (byte)0xd8, 0, (byte)0xdc}}
          };


          public static void main (String[] args) {
              CharsetEncoder en = null;
              for (int i = 0; i < bSeqs.length; i++) {
                  String chrName = (String)bSeqs[i][0];
                  try {
                      en = Charset.forName(chrName).newEncoder();
                  } catch(IllegalArgumentException e) {
                      e.printStackTrace(System.out);
                      return;
                  }
       
                  byte bArray [] = (byte [])(bSeqs[i][1]);
       
                  if (en.isLegalReplacement(bArray)) {
                      System.out.println("isLegalReplacement(byte[] repl) should " +
                                         "return false with " + chrName);
                  }
              }
          }
      }


      #java -version
      java version "1.4.2-beta"
      Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2-beta-b16)
      Java HotSpot(TM) Client VM (build 1.4.2-beta-b16, mixed mode)

      #java test

      isLegalReplacement(byte[] repl) should return false with UTF-16
      isLegalReplacement(byte[] repl) should return false with UTF-16BE
      isLegalReplacement(byte[] repl) should return false with UTF-16LE

      ======================================================================

            mr Mark Reinhold
            avusunw Avu Avu (Inactive)