Type: Bug
Resolution: Fixed
Priority: P3
Fix Version/s: 5.0
Affects Version/s: 1.4.0, 1.4.2, 5.0
Component/s: core-libs
Labels:
- Red
- charset
- nio
- tiger-jck
- tiger-low

Subcomponent: java.nio.charsets
Resolved In Build: b32
CPU: generic, sparc
OS: generic, solaris_8
Verification: Verified

Name: auR10023 Date: 02/20/2003

java.nio.charset.CharsetEncoder.isLegalReplacement(byte[]) returns true
for an unmappable byte sequence in the UTF-16, UTF-16BE and UTF-16LE
charsets. The Javadoc for this method says:

...
Tells whether or not the given byte array is a legal replacement value for this encoder.

A replacement is legal if, and only if, it is a legal sequence of bytes in this encoder's charset; that is, it must be possible to decode the replacement into one or more sixteen-bit Unicode characters.
...
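
The rule quoted above can be approximated with a strict decoder. The following is only a sketch of that idea, not the JDK's actual implementation of isLegalReplacement; the class name DecodableCheck and the method isDecodable are hypothetical names introduced for illustration. It asks a CharsetDecoder to report (rather than replace) any error and treats a byte array as acceptable only if it decodes to at least one char.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper, not part of the JDK.
final class DecodableCheck {

    // Returns true if `bytes` decodes in the named charset to one or more
    // chars without a malformed-input or unmappable-character error.
    static boolean isDecodable(String charsetName, byte[] bytes) {
        try {
            return Charset.forName(charsetName)
                    .newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .length() > 0;
        } catch (CharacterCodingException e) {
            // The decoder rejected the input.
            return false;
        }
    }

    public static void main(String[] args) {
        // The letter 'A' in UTF-16BE.
        System.out.println(isDecodable("UTF-16BE", new byte[] {0x00, 0x41}));
        // A 16-bit value in 0xDC00..0xDFFF with nothing before it.
        System.out.println(isDecodable("UTF-16BE", new byte[] {(byte) 0xDC, 0x00}));
    }
}
```

Using CodingErrorAction.REPORT matters here: with the default convenience methods such as new String(bytes, charset), bad input is silently replaced and the check would always pass.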

RFC 2781 describes the decoding process as follows:

...
   Decoding of a single character from UTF-16 to an ISO 10646 character
   value proceeds as follows. Let W1 be the next 16-bit integer in the
   sequence of integers representing the text. Let W2 be the (eventual)
   next integer following W1.

   1) If W1 < 0xD800 or W1 > 0xDFFF, the character value U is the value
      of W1. Terminate.

   2) Determine if W1 is between 0xD800 and 0xDBFF. If not, the sequence
      is in error and no valid character can be obtained using W1.
      Terminate.

   3) If there is no W2 (that is, the sequence ends with W1), or if W2
      is not between 0xDC00 and 0xDFFF, the sequence is in error.
      Terminate.

   4) Construct a 20-bit unsigned integer U', taking the 10 low-order
      bits of W1 as its 10 high-order bits and the 10 low-order bits of
      W2 as its 10 low-order bits.
...
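
Steps 1 to 4 of the excerpt above can be sketched in Java as follows. This is an illustration of the quoted steps only, not code from the JDK; the class name Utf16Steps and the methods isWellFormed and combine are hypothetical. The sketch stops at U' because the excerpt is cut off after step 4.

```java
// Hypothetical helper, not part of the JDK.
final class Utf16Steps {

    // Applies steps 1-3 to every character in `units` and returns false
    // as soon as one of them declares the sequence to be in error.
    static boolean isWellFormed(char[] units) {
        int i = 0;
        while (i < units.length) {
            char w1 = units[i];
            if (w1 < 0xD800 || w1 > 0xDFFF) {
                i++;                  // step 1: W1 is a character by itself
                continue;
            }
            if (w1 > 0xDBFF) {
                return false;         // step 2: W1 is not in 0xD800..0xDBFF
            }
            if (i + 1 >= units.length) {
                return false;         // step 3: there is no W2
            }
            char w2 = units[i + 1];
            if (w2 < 0xDC00 || w2 > 0xDFFF) {
                return false;         // step 3: W2 is not in 0xDC00..0xDFFF
            }
            i += 2;                   // W1 and W2 are consumed together
        }
        return true;
    }

    // Step 4: the 20-bit value U' built from the 10 low-order bits of W1
    // (high half) and the 10 low-order bits of W2 (low half).
    static int combine(char w1, char w2) {
        return ((w1 & 0x3FF) << 10) | (w2 & 0x3FF);
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed(new char[] {0xDC00}));         // W1 fails step 2
        System.out.println(isWellFormed(new char[] {0xD800, 0x0041})); // W2 fails step 3
        System.out.println(Integer.toHexString(combine((char) 0xD801, (char) 0xDC37)));
    }
}
```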

Here is the example:

------------- test.java --------------
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class test {

    // Each entry pairs a charset name with the 16-bit values 0xD800, 0xDC00
    // serialized in the byte order used for that charset in this test.
    static Object[][] bSeqs = new Object[][] {
        {"UTF-16", new byte[] { (byte)0xd8, 0, (byte)0xdc, 0}},
        {"UTF-16BE", new byte[] { (byte)0xd8, 0, (byte)0xdc, 0}},
        {"UTF-16LE", new byte[] { 0, (byte)0xd8, 0, (byte)0xdc}}
    };

    public static void main(String[] args) {
        CharsetEncoder en = null;
        for (int i = 0; i < bSeqs.length; i++) {
            String chrName = (String)bSeqs[i][0];
            try {
                en = Charset.forName(chrName).newEncoder();
            } catch (IllegalArgumentException e) {
                // Charset name is illegal or the charset is not supported.
                e.printStackTrace(System.out);
                return;
            }

            byte[] bArray = (byte[])(bSeqs[i][1]);

            // Print a line for every charset whose encoder accepts the bytes.
            if (en.isLegalReplacement(bArray)) {
                System.out.println("isLegalReplacement(byte[] repl) should " +
                                   "return false with " + chrName);
            }
        }
    }
}

#java -version
java version "1.4.2-beta"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2-beta-b16)
Java HotSpot(TM) Client VM (build 1.4.2-beta-b16, mixed mode)

#java test

isLegalReplacement(byte[] repl) should return false with UTF-16
isLegalReplacement(byte[] repl) should return false with UTF-16BE
isLegalReplacement(byte[] repl) should return false with UTF-16LE

======================================================================

Assignee: Mark Reinhold
Reporter: Avu Avu (Inactive)
Votes: 0
Watchers: 0
Created: 2003-02-20 07:06
Updated: 2004-09-01 01:55
Resolved: 2003-12-19 10:48
Imported: 17/Sep/12 8:09 PM
Indexed: 28/Jul/12 3:53 AM
