- Bug
- Resolution: Fixed
- P4
- 1.4.0
- beta
- sparc
- solaris_2.6
Name: dfR10049 Date: 12/09/2000
Javadoc for java.net.URLDecoder.decode(String s) states:
Decodes a x-www-form-urlencoded string. UTF-8 is used to determine what
characters are represented by any consecutive sequences of the form "%xy".
The string "\uD800\uDC00" is a surrogate pair, and URLEncoder.encode(s)
encodes it as "%3F%ED%B0%80". (The UTF-8 form of this pair is the four bytes
f0 90 80 80, as the test output shows.) URLDecoder.decode(s) does not decode
"%3F%ED%B0%80" back to "\uD800\uDC00", so the round trip loses the original
string. This behavior is not correct.
The following example demonstrates the bug:
----------- DecoderTest.java ----------------
import java.net.*;

public class DecoderTest {
    public static void main(String args[]) {
        String surrogatePair = "\uD800\uDC00";
        // a surrogate pair is expressed as 4 bytes in UTF-8
        String encoded = URLEncoder.encode(surrogatePair);
        String decoded = URLDecoder.decode(encoded);
        System.out.println("encoded: " + encoded);
        System.out.println("decoded: " + decoded);
        System.out.print(" ");
        printBytes(decoded);
        System.out.println("surrogatePair: " + surrogatePair);
        System.out.print(" ");
        printBytes(surrogatePair);
        // the round trip succeeds only if decoding restores the original string
        if (surrogatePair.equals(decoded))
            System.out.println("Test passed");
        else
            System.out.println("Test failed");
    }

    // prints the UTF-8 bytes of s as two-digit hex values
    static void printBytes(String s) {
        try {
            byte[] arr = s.getBytes("UTF-8");
            for (int i = 0; i < arr.length; i++) {
                char ch1 = Character.forDigit((arr[i] >> 4) & 0xF, 16);
                char ch2 = Character.forDigit(arr[i] & 0xF, 16);
                System.out.print(ch1 + "" + ch2 + " ");
            }
            System.out.println();
        } catch (java.io.UnsupportedEncodingException e) {
            // UTF-8 is a required charset on every Java platform, so this is not expected
        }
    }
}
#----------------- output from the test ----------------------
encoded: %3F%ED%B0%80
decoded: ??
3f ed b0 80
surrogatePair: ?
f0 90 80 80
Test failed
#-------------------------------------------------------------
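For comparison, here is a minimal sketch of the same round trip using the two-argument overloads URLEncoder.encode(String, String) and URLDecoder.decode(String, String), which take the encoding name explicitly. It assumes a JDK on which surrogate pairs are handled correctly. The class name SurrogateRoundTrip and the helpers encode and decode are hypothetical names used only for illustration. It does not describe how the bug was actually fixed.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class SurrogateRoundTrip {
    // Hypothetical helper: percent-encodes s using UTF-8.
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical helper: decodes a percent-encoded string using UTF-8.
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String surrogatePair = "\uD800\uDC00"; // U+10000
        String encoded = encode(surrogatePair);
        String decoded = decode(encoded);
        System.out.println("encoded: " + encoded);
        // The round trip is correct only if the original string comes back.
        System.out.println(surrogatePair.equals(decoded) ? "Test passed" : "Test failed");
    }
}
```

On such a JDK the encoded form should be "%F0%90%80%80", which matches the f0 90 80 80 bytes printed for surrogatePair in the output above.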
======================================================================