Type: Bug
Resolution: Fixed
Priority: P4
Fix Version/s: 1.4.0
Affects Version/s: 1.4.0
Component/s: core-libs
Labels:
- JCK-exclude
- URLDecoder
- UTF-8
- decode
- net

Subcomponent: java.net
Resolved In Build: beta
CPU: sparc
OS: solaris_2.6

Name: dfR10049 Date: 12/09/2000

Javadoc for java.net.URLDecoder.decode(String s) states:

    Decodes a x-www-form-urlencoded string. UTF-8 is used to determine what
    characters are represented by any consecutive sequences of the form "%xy".

The string "\uD800\uDC00" is a surrogate pair and is encoded as "%3F%ED%B0%80"
(by the URLEncoder.encode(s) method) according to UTF-8. But "%3F%ED%B0%80" is
not decoded back to "\uD800\uDC00". This behavior is incorrect.
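
For reference, a minimal sketch of how the surrogate pair maps to a single
code point and to its UTF-8 bytes. It assumes JDK 7 or later (for
StandardCharsets); the class name SurrogateUtf8Sketch and the helper utf8Hex
are hypothetical and not part of the original report.

```java
import java.nio.charset.StandardCharsets;

public class SurrogateUtf8Sketch {

    // Returns the UTF-8 bytes of s as lowercase hex pairs separated by spaces.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(Character.forDigit((b >> 4) & 0xF, 16));
            sb.append(Character.forDigit(b & 0xF, 16));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // High surrogate D800 followed by low surrogate DC00.
        String surrogatePair = "\uD800\uDC00";

        // The two UTF-16 code units combine into one supplementary code point.
        int codePoint = surrogatePair.codePointAt(0);
        System.out.println("code point: U+"
                + Integer.toHexString(codePoint).toUpperCase());

        // A supplementary code point occupies four bytes in UTF-8.
        System.out.println("UTF-8 bytes: " + utf8Hex(surrogatePair));
    }
}
```

The byte sequence printed here is the same "f0 90 80 80" that printBytes
reports for surrogatePair in the test output.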

Please see an example demonstrating the bug below:
----------- DecoderTest.java ----------------
import java.net.*;

public class DecoderTest {

   public static void main(String[] args) {

        String surrogatePair = "\uD800\uDC00";
               // one surrogate pair is expressed as 4 bytes in UTF-8

        String encoded = URLEncoder.encode(surrogatePair);
        String decoded = URLDecoder.decode(encoded);

        System.out.println("encoded: " + encoded);

        System.out.println("decoded: " + decoded);
        System.out.print (" ");
        printBytes(decoded);
        System.out.println("surrogatePair: " + surrogatePair);
        System.out.print (" ");
        printBytes(surrogatePair);

        // the round trip passes only if decoding restores the original string
        if (surrogatePair.equals(decoded))
            System.out.println("Test passed");
        else
            System.out.println("Test failed");
   }

   static void printBytes(String s) {
       try {
           byte[] arr = s.getBytes("UTF-8");
           for (int i = 0; i < arr.length; i++) {
               char ch1 = Character.forDigit((arr[i] >> 4) & 0xF, 16);
               char ch2 = Character.forDigit(arr[i] & 0xF, 16);
               System.out.print(ch1 + "" + ch2 + " ");
           }
           System.out.println();
       } catch (java.io.UnsupportedEncodingException e) {
           // UTF-8 is a required charset, so this is not expected to happen
           e.printStackTrace();
       }

   }
}
#----------------- output from the test ----------------------
encoded: %3F%ED%B0%80
decoded: ??
         3f ed b0 80
surrogatePair: ?
         f0 90 80 80
Test failed
#-------------------------------------------------------------
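
As a point of comparison, a minimal sketch of the round trip the test expects,
written against the two-argument overloads URLEncoder.encode(String, String)
and URLDecoder.decode(String, String), which take the encoding name
explicitly. This assumes a JDK where those overloads are available and is not
taken from the report or from the fix itself; the class name RoundTripSketch
and its helper methods are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class RoundTripSketch {

    // Encodes s as x-www-form-urlencoded text using UTF-8.
    static String encodeUtf8(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset, so this is not expected to happen.
            throw new IllegalStateException(e);
        }
    }

    // Encodes s and decodes the result again, both with UTF-8.
    static String roundTrip(String s) {
        try {
            return URLDecoder.decode(encodeUtf8(s), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String surrogatePair = "\uD800\uDC00";

        System.out.println("encoded: " + encodeUtf8(surrogatePair));

        // Compare the decoded text with the original string, not the encoded one.
        if (surrogatePair.equals(roundTrip(surrogatePair)))
            System.out.println("Test passed");
        else
            System.out.println("Test failed");
    }
}
```

With a correct encoder and decoder, the encoded form carries the four UTF-8
bytes of the pair as four "%xy" escapes, and decoding restores the original
two-char string.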

======================================================================

Assignee: Mayank Upadhyay (Inactive)
Reporter: Fda Fda (Inactive)
Votes: 0
Watchers: 0
Created: 2000-12-09 04:49
Updated: 2001-01-16 11:08
Resolved: 2001-01-16 11:08
Imported: 16/Sep/12 12:23 AM
Indexed: 17/Jul/12 8:31 PM
