Type: Bug
Resolution: Duplicate
Priority: P4
Fix Version/s: None
Affects Version/s: 1.4.2
Component/s: core-libs
Labels: CAP
Subcomponent: java.net
CPU: generic
OS: generic

Although the java.net.URL class is never explicit about how it handles encoding, there is an implicit assumption that all input to the URL constructors is already in encoded form, i.e. the W3C-recommended encoding format. See http://www.w3.org/TR/html40/appendix/notes.html#non-ascii-chars. We should specify the expected encoding for URL in a future release.

Quoting the relevant paragraphs:

"B.2.1 Non-ASCII characters in URI attribute values

Although URIs do not contain non-ASCII values (see [URI], section 2.1) authors sometimes specify them in attribute values expecting URIs (i.e., defined with %URI; in the DTD). For instance, the following href value is illegal:

<A href="http://foo.org/Håkon">...</A>

We recommend that user agents adopt the following convention for handling non-ASCII characters in such cases:

1. Represent each character in UTF-8 (see [RFC2279]) as one or more bytes.
2. Escape these bytes with the URI escaping mechanism (i.e., by converting each byte to %HH, where HH is the hexadecimal notation of the byte value).
"

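The two quoted steps can be sketched in Java as follows. This is only an illustration of the convention, not code from the JDK: the class name NonAsciiEscaper and the helper escapeNonAscii are hypothetical, and the sketch assumes that ASCII characters are passed through unchanged (it does not escape reserved ASCII characters such as spaces).

```java
import java.nio.charset.StandardCharsets;

class NonAsciiEscaper {
    // Hypothetical helper: applies the two W3C steps to non-ASCII characters only.
    static String escapeNonAscii(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp < 0x80) {
                // ASCII is copied as-is (assumption of this sketch).
                out.append((char) cp);
            } else {
                // Step 1: represent the character in UTF-8 as one or more bytes.
                byte[] utf8 = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                // Step 2: escape each byte as %HH.
                for (byte b : utf8) {
                    out.append('%').append(String.format("%02X", b & 0xFF));
                }
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+00E5 (a with ring above) is the bytes C3 A5 in UTF-8.
        System.out.println(escapeNonAscii("http://foo.org/H\u00E5kon")); // http://foo.org/H%C3%A5kon
    }
}
```
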
The URI class specifies the W3C-recommended practice as the expected input to its constructors etc., and is explicit about what the expected output should be: see the distinction between the getRawXXX and getXXX methods.
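
A small sketch of that distinction, contrasted with URL, which returns components exactly as given. The class name RawVersusDecoded and the helper urlPath are hypothetical and exist only to keep the example self-contained; the host foo.org is taken from the quoted W3C example.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

class RawVersusDecoded {
    // Hypothetical helper: returns URL.getPath() for a spec, wrapping the checked exception.
    @SuppressWarnings("deprecation") // URL constructors are deprecated in recent JDKs
    static String urlPath(String spec) {
        try {
            return new URL(spec).getPath();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI uri = URI.create("http://foo.org/H%C3%A5kon");
        // getRawPath keeps the escaped octets: /H%C3%A5kon
        System.out.println(uri.getRawPath());
        // getPath decodes the escaped octets as UTF-8, yielding "/H" + U+00E5 + "kon"
        System.out.println(uri.getPath());
        // URL does no decoding of its own: /H%C3%A5kon
        System.out.println(urlPath("http://foo.org/H%C3%A5kon"));
    }
}
```

Because URL hands back the path unchanged, two URL objects that differ only in escaping carry different path strings, which is the situation described in the linked duplicate issue.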

duplicates

JDK-4148751 URL.sameFile return false on URL's, that are equal modulo url-encoding

Resolved

Assignee: Yingxian Wang (Inactive)
Reporter: Yingxian Wang (Inactive)
Votes: 0
Watchers: 0
Created: 2002-12-12 14:13
Updated: 2003-10-03 15:03
Resolved: 2003-10-03 15:03
Imported: 16/Sep/12 12:32 AM
Indexed: 17/Jul/12 8:41 PM
