Although the java.net.URL class is never explicit about how it handles encoding, there is an implicit assumption that all input to the URL constructors is already in encoded form, i.e. the W3C-recommended encoding format. See http://www.w3.org/TR/html40/appendix/notes.html#non-ascii-chars. We should specify the expected encoding for URL in a future release.
Quoting the relevant paragraphs:
"B.2.1 Non-ASCII characters in URI attribute values
Although URIs do not contain non-ASCII values (see [URI], section 2.1) authors sometimes specify them in attribute values expecting URIs (i.e., defined with %URI; in the DTD). For instance, the following href value is illegal:
<A href="http://foo.org/Håkon">...</A>
We recommend that user agents adopt the following convention for handling non-ASCII characters in such cases:
1. Represent each character in UTF-8 (see [RFC2279]) as one or more bytes.
2. Escape these bytes with the URI escaping mechanism (i.e., by converting each byte to %HH, where HH is the hexadecimal notation of the byte value).
"
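The two steps quoted above can be sketched in Java as follows. This is only an illustration of the convention, not what java.net.URL does internally; the class name NonAsciiEscaper and the method escapeNonAscii are hypothetical, and the sketch assumes that ASCII characters are passed through untouched (it does not escape reserved or illegal ASCII characters).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the JDK: applies the W3C convention
// to the non-ASCII characters of a string and leaves ASCII unchanged.
class NonAsciiEscaper {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    static String escapeNonAscii(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            int len = Character.charCount(cp);
            if (cp < 0x80) {
                // ASCII: copied as-is (assumption of this sketch)
                out.append((char) cp);
            } else {
                // Step 1: represent the character in UTF-8 as one or more bytes
                byte[] bytes = s.substring(i, i + len).getBytes(StandardCharsets.UTF_8);
                // Step 2: escape each byte as %HH
                for (byte b : bytes) {
                    out.append('%')
                       .append(HEX[(b >> 4) & 0x0F])
                       .append(HEX[b & 0x0F]);
                }
            }
            i += len;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // \u00e5 is the character "a with ring above"; its UTF-8 bytes are C3 A5
        System.out.println(escapeNonAscii("http://foo.org/H\u00e5kon"));
        // prints http://foo.org/H%C3%A5kon
    }
}
```

A string escaped this way is in the form that the URL constructors implicitly expect.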
The URI class specifies the W3C-recommended practice as the expected input to its constructors etc., and is explicit about what the expected output should be: hence the distinction between the getRawXXX and getXXX methods.
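As a small illustration of the getRawXXX/getXXX distinction, the sketch below parses an already-escaped string with java.net.URI and reads the path both ways. The class name RawVersusDecoded and the sample address are assumptions made for the example.

```java
import java.net.URI;

// Hypothetical demo class: contrasts the raw and the decoded path of a URI.
class RawVersusDecoded {
    static String rawPath(String encoded) {
        // getRawPath returns the path exactly as given, escapes included
        return URI.create(encoded).getRawPath();
    }

    static String decodedPath(String encoded) {
        // getPath decodes the %HH escapes, interpreting the bytes as UTF-8
        return URI.create(encoded).getPath();
    }

    public static void main(String[] args) {
        String encoded = "http://foo.org/H%C3%A5kon";
        System.out.println(rawPath(encoded));     // /H%C3%A5kon
        System.out.println(decodedPath(encoded)); // "/H", then U+00E5, then "kon"
    }
}
```

How the decoded character is displayed depends on the console encoding; the string itself contains the single character U+00E5.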
Duplicates: JDK-4148751 URL.sameFile return false on URL's, that are equal modulo url-encoding (Resolved)