- Bug
- Resolution: Unresolved
- P3
- 8u40, 8u92, 9, 11, 12
- x86_64
- linux
FULL PRODUCT VERSION :
java version "1.8.0_40"
Java(TM) SE Runtime Environment (build 1.8.0_40-b26)
Java HotSpot(TM) 64-Bit Server VM (build 25.40-b25, mixed mode)
ADDITIONAL OS VERSION INFORMATION :
Linux tharbad 3.18.9-200.fc21.x86_64 #1 SMP Mon Mar 9 15:10:50 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
A DESCRIPTION OF THE PROBLEM :
When encoding non-BMP characters, ParseUtil seems to assume UCS-2 instead of UTF-16 and to encode each surrogate of the pair as a separate character, so the resulting UTF-8 is invalid. Apparently Java 7 was able to consume this output and reconstruct the correct UTF-16 sequence when decoding it, but in Java 8 decoding fails with the following exception:
Welcome to JavaREPL version dev.build (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_40)
Type expression to evaluate, :help for more options or press tab to auto-complete.
java> import sun.net.www.ParseUtil;
Imported sun.net.www.ParseUtil
java> ParseUtil.decode(ParseUtil.encodePath(new String(Character.toChars(0x1F631))));
java.lang.IllegalArgumentException: Error decoding percent encoded characters
java> ParseUtil.encodePath(new String(Character.toChars(0x1F631)));
java.lang.String res0 = "%ed%a0%bd%ed%b8%b1"
The correct UTF-8 sequence for U+1F631 is 0xF0 0x9F 0x98 0xB1.
ParseUtil.encodePath is used by sun.misc.URLClassPath, in turn used by java.net.URLClassLoader.
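The difference between the two byte sequences can be illustrated without touching ParseUtil. The following is a minimal sketch, not the JDK's actual code: the class SurrogateEncodingSketch and its methods perCodeUnit and perCodePoint are hypothetical names introduced for illustration. perCodeUnit applies the three-byte UTF-8 form to each 16-bit code unit on its own (the UCS-2 assumption), while perCodePoint lets the standard UTF-8 encoder handle the surrogate pair as one code point.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of the JDK.
public class SurrogateEncodingSketch {

    // Percent-encode every byte in lowercase hex, e.g. 0xF0 -> "%f0".
    static String percentEncode(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append('%').append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Three-byte UTF-8 form of a single 16-bit code unit.
    // Only meaningful for values >= 0x800, which includes all surrogates.
    static byte[] threeByteForm(char c) {
        return new byte[] {
            (byte) (0xE0 | (c >> 12)),
            (byte) (0x80 | ((c >> 6) & 0x3F)),
            (byte) (0x80 | (c & 0x3F))
        };
    }

    // UCS-2 style: each code unit is encoded independently,
    // so a surrogate pair becomes two invalid three-byte sequences.
    static String perCodeUnit(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(percentEncode(threeByteForm(c)));
        }
        return sb.toString();
    }

    // UTF-16 aware: the standard encoder combines the surrogate pair
    // into one code point and emits a single four-byte sequence.
    static String perCodePoint(String s) {
        return percentEncode(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String emoji = new String(Character.toChars(0x1F631));
        System.out.println(perCodeUnit(emoji));   // %ed%a0%bd%ed%b8%b1
        System.out.println(perCodePoint(emoji));  // %f0%9f%98%b1
    }
}
```

U+1F631 is stored in a Java String as the surrogate pair 0xD83D 0xDE31. Encoding those two code units separately yields 0xED 0xA0 0xBD and 0xED 0xB8 0xB1, which matches the output reported above.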
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
ParseUtil.encodePath(new String(Character.toChars(0x1F631)));
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
"%f0%9f%98%b1"
ACTUAL -
"%ed%a0%bd%ed%b8%b1"
REPRODUCIBILITY :
This bug can be reproduced always.
---------- BEGIN SOURCE ----------
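import sun.net.www.ParseUtil;

public class Bug {
    public static void main(String[] args) throws Exception {
        // U+1F631 lies outside the BMP, so it is stored as a surrogate pair.
        final String emoji = new String(Character.toChars(0x1F631));
        final String encoded = ParseUtil.encodePath(emoji);
        // Prints "%ed%a0%bd%ed%b8%b1"; expected "%f0%9f%98%b1".
        System.out.println(encoded);
        // Throws IllegalArgumentException on Java 8.
        final String decoded = ParseUtil.decode(encoded);
    }
}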
---------- END SOURCE ----------