- Bug
- Resolution: Not an Issue
- P4
- None
- 8, 9
- generic
- generic
FULL PRODUCT VERSION :
A DESCRIPTION OF THE PROBLEM :
According to the reference docs for this URI constructor: https://docs.oracle.com/javase/8/docs/api/java/net/URI.html#URI-java.lang.String-java.lang.String-java.lang.String-java.lang.String-java.lang.String-
"If a path is given then it is appended. Any character not in the unreserved, punct, escaped, or other categories, and not equal to the slash character ('/') or the commercial-at character ('@'), is quoted."
An escaped character is defined further up on the page as:
"Escaped octets, that is, triplets consisting of the percent character ('%') followed by two hexadecimal digits ('0'-'9', 'A'-'F', and 'a'-'f')"
論 is in the other category, so it does not surprise me that it isn't encoded when I use it in the path directly. However, if I encode it as %EF%A5%81, that should be considered an 'escaped' character and, according to the docs, should not be quoted. Per the reproduction steps below, it is quoted anyway.
As it stands, I see no way to use URI with these sorts of characters in the path. If I don't encode them, they are not quoted and fail over the wire. If I do, they are double-encoded, which results in an incorrect value. Furthermore, the only way I see to pre-encode is URLEncoder, which is specifically documented as not being intended for this usage.
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
System.out.println(new URI(null, null, "論", null, null).getPath());
System.out.println(new URI(null, null, "論", null, null).getRawPath());
System.out.println(new URI(null, null, URLEncoder.encode("論", "UTF-8"), null, null).getPath());
System.out.println(new URI(null, null, URLEncoder.encode("論", "UTF-8"), null, null).getRawPath());
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
論
論
論
%EF%A5%81
ACTUAL -
論
論
%EF%A5%81
%25EF%25A5%2581
REPRODUCIBILITY :
This bug can be reproduced always.
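For anyone landing on this report: as I read the URI class documentation, the multi-argument constructors always quote the percent character ('%'), which would explain both the double encoding above and the "Not an Issue" resolution. The following is a minimal sketch, not an official recommendation, of two ways to obtain an escaped path without double encoding: parse an already-escaped string with the single-argument form, or pass the unescaped path to the five-argument constructor and call toASCIIString(). The class name UriPathEscapes, its method names, and the stand-in character é (U+00E9) are my own choices for illustration and do not come from the report.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper class for illustration; not part of the JDK or the report.
final class UriPathEscapes {

    /** Raw path produced by the five-argument constructor used in the report. */
    static String rawPathFromComponents(String path) {
        try {
            return new URI(null, null, path, null, null).getRawPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Raw path when an already-escaped string is parsed as a whole URI. */
    static String rawPathFromParsed(String escaped) {
        return URI.create(escaped).getRawPath();
    }

    /** ASCII form: build from the unescaped path, then encode non-ASCII characters. */
    static String asciiFromComponents(String path) {
        try {
            return new URI(null, null, path, null, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String unescaped = "\u00E9"; // stand-in non-ASCII character (assumption)
        String escaped = "%C3%A9";   // its UTF-8 escaped form

        // The '%' characters are quoted again, as in the ACTUAL output above.
        System.out.println(rawPathFromComponents(escaped)); // %25C3%25A9

        // Parsing keeps existing escapes; getPath() decodes them.
        System.out.println(rawPathFromParsed(escaped));     // %C3%A9
        System.out.println(URI.create(escaped).getPath().equals(unescaped)); // true

        // Unescaped input plus toASCIIString() yields the escaped form once.
        System.out.println(asciiFromComponents(unescaped)); // %C3%A9
    }
}
```

The sketch deliberately avoids URLEncoder, which targets HTML form encoding rather than URI paths.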
- relates to: JDK-8274943 URI constructor does not encode path correctly (Closed)