JDK-4396708

java.net.URLDecoder.decode(s) does not decode encoded surrogate pairs correctly


Details

    • beta
    • sparc
    • solaris_2.6

    Description



      Name: dfR10049 Date: 12/09/2000



      The Javadoc for java.net.URLDecoder.decode(String s) states:

          Decodes a x-www-form-urlencoded string. UTF-8 is used to determine what
          characters are represented by any consecutive sequences of the form "%xy".
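      Read literally, this contract means that a run of consecutive "%xy" escapes has to be collected into one byte sequence and decoded as a whole, because a single character can span several UTF-8 bytes. The following is a hypothetical sketch of that reading, not the JDK's implementation; the class FormDecodeSketch and its methods are invented names, and it assumes Java 7 or later for StandardCharsets.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the JDK implementation: decodes an
// x-www-form-urlencoded string, treating each run of consecutive "%xy"
// escapes as one UTF-8 byte sequence.
public class FormDecodeSketch {

    public static String decode(String s) {
        StringBuilder out = new StringBuilder();
        ByteArrayOutputStream run = new ByteArrayOutputStream(); // bytes of the current %xy run
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '%') {
                if (i + 2 >= s.length()) {
                    throw new IllegalArgumentException("Incomplete escape at index " + i);
                }
                int hi = Character.digit(s.charAt(i + 1), 16);
                int lo = Character.digit(s.charAt(i + 2), 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("Bad hex digits at index " + i);
                }
                run.write((hi << 4) | lo);
                i += 3;
            } else {
                flush(run, out);                // a literal character ends the run
                out.append(c == '+' ? ' ' : c); // '+' encodes a space
                i++;
            }
        }
        flush(run, out);
        return out.toString();
    }

    // Decodes the buffered bytes as a whole, so multi-byte characters stay intact.
    private static void flush(ByteArrayOutputStream run, StringBuilder out) {
        if (run.size() > 0) {
            out.append(new String(run.toByteArray(), StandardCharsets.UTF_8));
            run.reset();
        }
    }

    public static void main(String[] args) {
        String decoded = decode("%F0%90%80%80");
        System.out.println(decoded.equals("\uD800\uDC00")); // true
        System.out.println(decoded.length());               // 2: one surrogate pair
        System.out.println(decode("a+b%3D"));               // a b=
    }
}
```

      Decoding each escape on its own would instead hand the UTF-8 decoder the bytes F0, 90, 80 and 80 separately, none of which is a complete sequence, so the surrogate pair could not be reconstructed.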

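      The string "\uD800\uDC00" is a surrogate pair representing U+10000, whose
      UTF-8 encoding is the 4 bytes F0 90 80 80. URLEncoder.encode(s) encodes it
      as "%3F%ED%B0%80" instead ("%3F" is '?' and "%ED%B0%80" is the 3-byte form
      of the lone low surrogate U+DC00), and URLDecoder.decode(s) does not turn
      "%3F%ED%B0%80" back into "\uD800\uDC00". This behavior is not correct.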
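      As a worked check of the expected bytes, a surrogate pair can be combined into a code point with cp = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00), and any code point in U+10000..U+10FFFF is written in UTF-8 with the bit layout 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx. The class SurrogateUtf8 and its helper names below are illustrative only; the sketch cross-checks its result against String.getBytes(StandardCharsets.UTF_8).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SurrogateUtf8 {

    // Combines a high/low surrogate pair into a code point
    // (same formula as Character.toCodePoint).
    static int toCodePoint(char high, char low) {
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    // Encodes a supplementary code point (U+10000..U+10FFFF) as four UTF-8 bytes.
    static byte[] utf8FourBytes(int cp) {
        return new byte[] {
            (byte) (0xF0 | (cp >> 18)),
            (byte) (0x80 | ((cp >> 12) & 0x3F)),
            (byte) (0x80 | ((cp >> 6) & 0x3F)),
            (byte) (0x80 | (cp & 0x3F))
        };
    }

    // Formats bytes as "%XY" escapes with uppercase hex digits.
    static String percentEscape(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%%%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String pair = "\uD800\uDC00";
        int cp = toCodePoint(pair.charAt(0), pair.charAt(1));
        byte[] manual = utf8FourBytes(cp);
        System.out.println(Integer.toHexString(cp)); // 10000
        System.out.println(percentEscape(manual));   // %F0%90%80%80
        // Cross-check against the JDK's own UTF-8 encoder.
        System.out.println(Arrays.equals(manual, pair.getBytes(StandardCharsets.UTF_8))); // true
    }
}
```

      For "\uD800\uDC00" this gives the escape sequence %F0%90%80%80, which matches the "f0 90 80 80" bytes that the test output below reports for the original string.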

      The following example demonstrates the bug:
      ----------- DecoderTest.java ----------------
      import java.net.*;

      public class DecoderTest {

         public static void main(String[] args) {

              // U+10000 as a surrogate pair; in UTF-8 it is 4 bytes (F0 90 80 80)
              String surrogatePair = "\uD800\uDC00";

              // The single-argument overloads are the ones this report exercises
              String encoded = URLEncoder.encode(surrogatePair);
              String decoded = URLDecoder.decode(encoded);

              System.out.println("encoded: " + encoded);

              System.out.println("decoded: " + decoded);
              System.out.print(" ");
              printBytes(decoded);
              System.out.println("surrogatePair: " + surrogatePair);
              System.out.print(" ");
              printBytes(surrogatePair);

              // A correct round trip returns the original string, so compare
              // the decoded text with the input, not with the encoded form
              if (surrogatePair.equals(decoded))
                  System.out.println("Test passed");
              else
                  System.out.println("Test failed");
         }

         // Prints the UTF-8 bytes of s as two-digit lowercase hex values
         static void printBytes(String s) {
             try {
                 byte[] arr = s.getBytes("UTF-8");
                 for (int i = 0; i < arr.length; i++) {
                     char ch1 = Character.forDigit((arr[i] >> 4) & 0xF, 16);
                     char ch2 = Character.forDigit(arr[i] & 0xF, 16);
                     System.out.print(ch1 + "" + ch2 + " ");
                 }
                 System.out.println();
             } catch (java.io.UnsupportedEncodingException e) {
                 // Cannot happen: every Java platform is required to support UTF-8
             }
         }
      }
      #----------------- output from the test ----------------------
      encoded: %3F%ED%B0%80
      decoded: ??
               3f ed b0 80
      surrogatePair: ?
               f0 90 80 80
      Test failed
      #-------------------------------------------------------------
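      For comparison, the expected round trip can be checked on a current JDK with the charset-taking overloads URLEncoder.encode(String, Charset) and URLDecoder.decode(String, Charset), which were added in Java 10. This is a sketch assuming a Java 10+ runtime; the class RoundTripCheck and the method roundTrips are hypothetical names. As in the corrected test above, the check compares the decoded text with the original input.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class RoundTripCheck {

    // Hypothetical helper: true if s survives an encode/decode cycle unchanged.
    public static boolean roundTrips(String s) {
        // Explicit UTF-8 instead of the platform default (Java 10+ overloads)
        String encoded = URLEncoder.encode(s, StandardCharsets.UTF_8);
        String decoded = URLDecoder.decode(encoded, StandardCharsets.UTF_8);
        // Compare with the original input, not with the encoded form
        return s.equals(decoded);
    }

    public static void main(String[] args) {
        String surrogatePair = "\uD800\uDC00"; // U+10000
        System.out.println("encoded: " + URLEncoder.encode(surrogatePair, StandardCharsets.UTF_8));
        System.out.println(roundTrips(surrogatePair) ? "Test passed" : "Test failed");
    }
}
```

      On such a runtime the encoder keeps the two surrogates together before converting to bytes, so the encoded form should be %F0%90%80%80 rather than %3F%ED%B0%80.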

      ======================================================================

          People

            mupadhyasunw Mayank Upadhyay (Inactive)
            fdasunw Fda Fda (Inactive)