  1. JDK
  2. JDK-4793105

The expected encoding for java.net.URL needs clarification


    • Type: Bug
    • Resolution: Duplicate
    • Priority: P4
    • None
    • 1.4.2
    • core-libs
    • generic
    • generic

      Although the java.net.URL class is never explicit about how it handles encoding, there is an implicit assumption that all input to the URL constructors is already in encoded form, i.e. the encoding format recommended by the W3C. See http://www.w3.org/TR/html40/appendix/notes.html#non-ascii-chars. We should specify the expected encoding for URL in a future release.

      Quoting the relevant paragraphs:

      "B.2.1 Non-ASCII characters in URI attribute values

      Although URIs do not contain non-ASCII values (see [URI], section 2.1) authors sometimes specify them in attribute values expecting URIs (i.e., defined with %URI; in the DTD). For instance, the following href value is illegal:

      <A href="http://foo.org/Håkon">...</A>

      We recommend that user agents adopt the following convention for handling non-ASCII characters in such cases:

         1. Represent each character in UTF-8 (see [RFC2279]) as one or more bytes.
         2. Escape these bytes with the URI escaping mechanism (i.e., by converting each byte to %HH, where HH is the hexadecimal notation of the byte value).
      "
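      The two quoted steps can be sketched in Java as follows. This is a minimal illustration only, not what java.net.URL does internally: the class name Utf8Escape and the method escapeNonAscii are hypothetical, the sample address contains the non-ASCII character å (U+00E5), and the sketch leaves every ASCII byte untouched, including reserved characters.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Escape {
    // Hypothetical helper: applies only the two steps of the quoted convention.
    static String escapeNonAscii(String s) {
        StringBuilder out = new StringBuilder();
        // Step 1: represent each character in UTF-8 as one or more bytes.
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF; // treat the byte as an unsigned value
            if (v < 0x80) {
                out.append((char) v); // ASCII bytes pass through unchanged
            } else {
                // Step 2: escape each non-ASCII byte as %HH.
                out.append('%').append(String.format("%02X", v));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // \u00e5 is the character å; its UTF-8 bytes are C3 A5.
        System.out.println(escapeNonAscii("http://foo.org/H\u00e5kon"));
        // prints http://foo.org/H%C3%A5kon
    }
}
```

      The result is the form that, per the assumption described in this report, callers are expected to pass to the URL constructors.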

      The URI class specifies the W3C-recommended practice as the expected input to its constructors etc., and is explicit about what the expected output should be: the getRawXXX methods return the escaped form, while the getXXX methods return the decoded form.
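      The difference can be seen with a small example. This is a sketch under stated assumptions: the class name RawVersusDecoded and the helper paths are hypothetical, and the input is an already-escaped address. It prints the raw and decoded path from java.net.URI, followed by the path that java.net.URL reports for the same input.

```java
import java.net.URI;
import java.net.URL;

public class RawVersusDecoded {
    // Hypothetical helper: returns { URI raw path, URI decoded path, URL path }.
    static String[] paths(String input) {
        try {
            URI uri = new URI(input);
            URL url = uri.toURL();
            return new String[] { uri.getRawPath(), uri.getPath(), url.getPath() };
        } catch (Exception e) {
            throw new IllegalArgumentException("not a usable address: " + input, e);
        }
    }

    public static void main(String[] args) {
        String[] p = paths("http://foo.org/H%C3%A5kon");
        System.out.println("URI.getRawPath(): " + p[0]); // escaped form, /H%C3%A5kon
        System.out.println("URI.getPath():    " + p[1]); // decoded form, %C3%A5 becomes one character
        System.out.println("URL.getPath():    " + p[2]); // URL does no decoding, /H%C3%A5kon
    }
}
```

      URL has no counterpart to the raw/decoded pair, which is why its expected input encoding needs to be stated explicitly.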

            ywangsunw Yingxian Wang (Inactive)