Name: nt126004 Date: 08/27/2001
java version "1.4.0-beta"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.0-beta-b65)
Java HotSpot(TM) Client VM (build 1.4.0-beta-b65, mixed mode)
DESCRIPTION OF EXPECTED VERSUS ACTUAL RESULTS
=============================================
According to the following RFC 2396 productions:
URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ]
absoluteURI = scheme ":" ( hier_part | opaque_part )
hier_part = ( net_path | abs_path ) [ "?" query ]
net_path = "//" authority [ abs_path ]
"scheme://user@host?q" is perfectly correct and should parse:
Scheme: "scheme"
Authority: "user@host"
UserInfo: "user"
Host: "host"
Port: "-1"
Path: ""
Query: "q"
Fragment: (undefined)
Instead, constructing the URI fails with an exception:
java.net.URISyntaxException:
Illegal character in authority at index 9: scheme://user@host?q
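A minimal sketch of how the expected components could be checked with the public java.net.URI accessors. The class and method names (UriComponentsSketch, describe) are hypothetical, and the sketch assumes a JDK whose URI parser accepts a query directly after the authority:

```java
import java.net.URI;

class UriComponentsSketch {
    // Lists the components in the same order as the expected result above.
    // URI.create wraps URISyntaxException in IllegalArgumentException.
    static String describe(String spec) {
        URI u = URI.create(spec);
        return String.join("\n",
            "Scheme: " + u.getScheme(),
            "Authority: " + u.getAuthority(),
            "UserInfo: " + u.getUserInfo(),
            "Host: " + u.getHost(),
            "Port: " + u.getPort(),
            "Path: \"" + u.getPath() + "\"",
            "Query: " + u.getQuery(),
            "Fragment: " + u.getFragment());
    }

    public static void main(String[] args) {
        System.out.println(describe("scheme://user@host?q"));
    }
}
```

On a build that still has the problem reported here, the call fails with an IllegalArgumentException caused by the URISyntaxException instead of printing the components.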
EXACT STEPS TO REPRODUCE THE PROBLEM.
=====================================
Compile and run the following JAVA source.
COMPLETE JAVA SOURCE CODE THAT DEMONSTRATES THE PROBLEM. (TEST PROGRAM)
=======================================================================
import java.net.*;

class URITest2 {
    public static void main(String[] arg) {
        try { new URI("scheme://user@host?q"); }
        catch (URISyntaxException e) { e.printStackTrace(); }
    }
}
PRODUCED OUTPUT.
================
java.net.URISyntaxException: Illegal character in authority at index 9:
scheme://user@host?q
at java.net.URI$Parser.fail(URI.java:2168)
at java.net.URI$Parser.parseAuthority(URI.java:2494)
at java.net.URI$Parser.parseHierarchical(URI.java:2414)
at java.net.URI$Parser.parse(URI.java:2371)
at java.net.URI.<init>(URI.java:413)
at URITest2.main(URITest2.java:5)
ADDITIONAL CONFIGURATION INFORMATION.
=====================================
None.
SOME OTHER PROBLEMATIC URLS
============================
Undefined Hosts are not Empty Hosts
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The absolute URL: "http:path" is parsed:
Protocol: "http"
Authority: (null)
UserInfo: (null)
Host: "" <=== NO. the host is undefined.
Port: -1 <=== where -1 does not mean "default"
Path: "path"
Query: (null)
Ref: (null)
Compare with the absolute URL: http:///path, which does have
an empty host.
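The distinction between an undefined and an empty authority can be sketched independently of the JDK classes. The helper below is hypothetical (it is not how URL or URI is implemented) and assumes the input is an absolute URI whose scheme ends at the first ':'; it only follows the net_path production of RFC 2396:

```java
import java.util.Optional;

class AuthoritySketch {
    // Optional.empty(): no "//" after the scheme, so the authority is undefined.
    // Optional.of(""):  "//" is present but immediately followed by the path.
    static Optional<String> authority(String spec) {
        int colon = spec.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("no scheme in: " + spec);
        }
        String rest = spec.substring(colon + 1);
        if (!rest.startsWith("//")) {
            return Optional.empty();
        }
        // The authority runs up to the next '/', '?' or '#', or to the end.
        int end = 2;
        while (end < rest.length() && "/?#".indexOf(rest.charAt(end)) < 0) {
            end++;
        }
        return Optional.of(rest.substring(2, end));
    }

    public static void main(String[] args) {
        System.out.println(authority("http:path"));        // undefined
        System.out.println(authority("http:///path"));     // empty
        System.out.println(authority("http://host/path")); // "host"
    }
}
```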
IPV6 references - incorrect host
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The absolute URL: "http://[a:b:c]" is parsed:
Protocol: "http"
Authority: "[a:b:c]"
UserInfo: (null)
Host: "a:b:c" <=== NO. the host is "[a:b:c]".
Port: -1
Path: "path"
Query: (null)
Ref: (null)
Because, according to RFC 2732:
host = hostname | IPv4address | IPv6reference
IPv6reference = "[" IPv6address "]"
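For comparison, a small sketch using java.net.URI, whose getHost() is documented to return an IPv6 address enclosed in square brackets. The class name is hypothetical, and the sketch uses the complete literal "::1" as an assumption, because "a:b:c" above is only a placeholder and not a full IPv6 address:

```java
import java.net.URI;

class Ipv6HostSketch {
    // Returns the host component exactly as java.net.URI reports it.
    static String hostOf(String spec) {
        return URI.create(spec).getHost();
    }

    public static void main(String[] args) {
        // The brackets are part of the host, as in the RFC 2732 production.
        System.out.println(hostOf("http://[::1]/path"));
    }
}
```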
IPV6 references - empty port is allowed
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The absolute URL "http://[a:b:c]:" is correct,
but generates a java.net.MalformedURLException
when parsed.
It should parse:
Protocol: "http"
Authority: "[a:b:c]"
UserInfo: (null)
Host: "[a:b:c]"
Port: -1
Path: ""
Query: (null)
Ref: (null)
Because, according to RFC 2396:
hostport = host [ ":" port ]
port = *digit
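The two productions above can be sketched as a regular expression. This is an illustration only, not the JDK's parsing code: the class and method names are hypothetical, and the host part is simplified to either a bracketed reference or a run of characters without ':', '/', '[' or ']':

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HostPortSketch {
    // hostport = host [ ":" port ], port = *digit (zero or more digits).
    private static final Pattern HOSTPORT =
        Pattern.compile("(\\[[^\\]]*\\]|[^:/\\[\\]]*)(?::(\\d*))?");

    static String host(String hostport) {
        return match(hostport).group(1);
    }

    // Returns -1 when the port is absent or empty, mirroring the report.
    static int port(String hostport) {
        String digits = match(hostport).group(2);
        return (digits == null || digits.isEmpty()) ? -1 : Integer.parseInt(digits);
    }

    private static Matcher match(String hostport) {
        Matcher m = HOSTPORT.matcher(hostport);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a hostport: " + hostport);
        }
        return m;
    }

    public static void main(String[] args) {
        System.out.println(host("[a:b:c]:") + " " + port("[a:b:c]:"));
        System.out.println(host("[a:b:c]:8080") + " " + port("[a:b:c]:8080"));
    }
}
```

Because port is defined as *digit, the trailing ':' with no digits still matches, and the sketch reports the port as undefined rather than rejecting the input.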
==============================
The URI and URL classes have many points in common. There is
even a toURL() method in URI. Roughly,
the code in URL.java, line 414:
URL(URL context, String spec, URLStreamHandler handler)
PLUS the code in URLStreamHandler.java, line 84:
protected void parseURL(URL u, String spec, int start, int limit)
EQUALS the code in URI.java:
public URI resolve(String str)
Both URL(base, spec) and new URI(base).resolve(spec) implement the
parse algorithm of RFC 2396.
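The URI side of this comparison can be sketched with URI.resolve(String). The class name and the example base and references are assumptions chosen for illustration:

```java
import java.net.URI;

class ResolveSketch {
    // Resolves a relative reference against a base, as URI.resolve does.
    static String resolve(String base, String spec) {
        return URI.create(base).resolve(spec).toString();
    }

    public static void main(String[] args) {
        // A relative path replaces the last segment of the base path.
        System.out.println(resolve("http://host/a/b", "c"));
        // An absolute path replaces the whole base path.
        System.out.println(resolve("http://host/a/b", "/x"));
    }
}
```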
The implementations, however, differ. The implementation in
URL and URLStreamHandler has a long history (read: many patches,
up to revision 1.103 and revision 1.48).
URI is a brand new implementation of the parse algorithm described
in RFC 2396.
Why keep both classes? The "theoretical" answer, discussed at length
in the URI javadoc, is that a URI is not a URL.
Yes, but it could have been done differently (a URL IS-A URI),
and the "real" answer is, I think, LEGACY.
I guess that, starting from scratch today, the URL class would
change (class URL extends URI) as follows:
1) a URL, being a URI, would be parsed entirely like a URI;
2) because it is a URL, an appropriate URLStreamHandler would be chosen
based on the parsed scheme (called the protocol in URL);
3) the URLStreamHandler.parseURL() method would be called just to
"finish the parse", doing almost nothing.
Alas, because of legacy, a URL is parsed just enough to reveal the
scheme, then the URLStreamHandler is immediately called with
basically nothing parsed so far. Hence, anyone can write their
own URL validation and have their own URL syntax that fully violates
the URI syntax (remember: a URL is a URI).
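To contrast with the legacy flow, the three-step design proposed in the numbered list above can be sketched as follows. Everything here is hypothetical: the Handler interface stands in for URLStreamHandler, the handler table and its messages are invented for illustration, and none of it reflects the actual URL class:

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class UriFirstUrlSketch {
    // Hypothetical stand-in for URLStreamHandler: it receives a parsed URI.
    interface Handler {
        String finishParse(URI uri);
    }

    private static final Map<String, Handler> HANDLERS = new HashMap<>();
    static {
        HANDLERS.put("http", uri -> "http handler, host=" + uri.getHost());
        HANDLERS.put("file", uri -> "file handler, path=" + uri.getPath());
    }

    static String open(String spec) {
        // Step 1: parse the whole string as a URI, so syntax is validated once.
        URI uri = URI.create(spec);
        String scheme = uri.getScheme();
        if (scheme == null) {
            throw new IllegalArgumentException("not an absolute URI: " + spec);
        }
        // Step 2: choose the handler from the parsed scheme (the "protocol").
        Handler handler = HANDLERS.get(scheme.toLowerCase(Locale.ROOT));
        if (handler == null) {
            throw new IllegalArgumentException("unknown protocol: " + scheme);
        }
        // Step 3: the handler only finishes the parse, with little left to do.
        return handler.finishParse(uri);
    }

    public static void main(String[] args) {
        System.out.println(open("http://example.com/index.html"));
        System.out.println(open("file:///tmp/x"));
    }
}
```

In this arrangement a handler never sees an unparsed string, so it cannot accept a syntax that the URI parser has already rejected.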
There is even a comment in URL:
// Note: we don't do validation of the URL here. Too risky to change
// right now, but worth considering for future reference. -br
I think that someday the parseURL method in URLStreamHandler will
be deprecated and replaced by parseURL(URI uri), defaulting to
URI.toURL(). This design would clean up the URL class, so you would
no longer be bothered and confused by customers like me!
Seriously: this is why I'm so picky about URIs. URLs were designed
early, at a time when it was nearly impossible to do otherwise.
URI is a new class, and we cannot afford to miss the train a
second time.
(Review ID: 130609)
======================================================================
Duplicates: JDK-4479463 hierarical URI parsing (Closed)