FULL PRODUCT VERSION :
java version "1.7.0_10"
Java(TM) SE Runtime Environment (build 1.7.0_10-b18)
Java HotSpot(TM) 64-Bit Server VM (build 23.6-b04, mixed mode)
ADDITIONAL OS VERSION INFORMATION :
Linux ip-10-69-17-45 3.2.0-67-virtual #101-Ubuntu SMP Tue Jul 15 17:58:37 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
A DESCRIPTION OF THE PROBLEM :
When I use the URI(String scheme, String ssp, String fragment) constructor, it encodes invalid characters, so for the following parameters:
URI u = new URI("custom", "a [b c] d", null)
I get URI: custom:a%20[b%20c]%20d
As you can see, the '[' character is not encoded, so when I call getSchemeSpecificPart(), it returns a value different from the one passed to the constructor:
u.getSchemeSpecificPart() gives "a [b%20c] d"
JDK documentation states that the following statement should always hold:
new URI(u.getScheme(),
u.getSchemeSpecificPart(),
u.getFragment())
.equals(u)
but it gives 'false' for our URI.
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Execute the following code:
final String ssp = "a [b c] d";
final URI u = new URI("custom", ssp, null);
System.err.println("ssp: " + ssp);
System.err.println("URI: " + u);
System.err.println("URI.host: " + u.getHost());
System.err.println("URI.path: " + u.getPath());
System.err.println("URI.ssp: " + u.getSchemeSpecificPart());
final boolean b =
    new URI(u.getScheme(),
            u.getSchemeSpecificPart(),
            u.getFragment())
        .equals(u);
System.err.println("MUST hold: " + b);
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
I expect to see one of two outputs. Either the following:
ssp: a [b c] d
URI: custom:a%20[b%20c]%20d
URI.host: null
URI.path: null
URI.ssp: a [b c] d
MUST hold: true
This is because the JDK documentation states that, for the scheme-specific part, "any character that is not a legal URI character is quoted".
Looking at RFC 2396 Appendix A (https://tools.ietf.org/html/rfc2396#appendix-A), and keeping in mind the changes from RFC 2732 (https://tools.ietf.org/html/rfc2732), we see the following:
URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ]
absoluteURI = scheme ":" ( hier_part | opaque_part )
relativeURI = ( net_path | abs_path | rel_path ) [ "?" query ]
So our URI is an <absoluteURI>.
hier_part = ( net_path | abs_path ) [ "?" query ]
opaque_part = uric_no_slash *uric
uric_no_slash = unreserved | escaped | ";" | "?" | ":" | "@" |
                "&" | "=" | "+" | "$" | ","
net_path = "//" authority [ abs_path ]
abs_path = "/" path_segments
Our absolute URI has no <hier_part> but does have an <opaque_part>, which consists of a first <uric_no_slash> character followed by any number of <uric> characters:
uric = reserved | unreserved | escaped
reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
           "$" | "," | "[" | "]"
unreserved = alphanum | mark
mark = "-" | "_" | "." | "!" | "~" | "*" | "'" |
       "(" | ")"
escaped = "%" hex hex
hex = digit | "A" | "B" | "C" | "D" | "E" | "F" |
      "a" | "b" | "c" | "d" | "e" | "f"
alphanum = alpha | digit
alpha = lowalpha | upalpha
lowalpha = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" |
           "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" |
           "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
upalpha = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
          "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
          "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
        "8" | "9"
So when we create our URI, all illegal characters in the scheme-specific part should be encoded. In our example only the spaces are illegal, so only they should be encoded, which gives the expected behaviour described above.
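The claim that only the spaces are illegal can be checked mechanically. The following is a minimal sketch, assuming the <uric> character set quoted above (RFC 2396 with the RFC 2732 brackets). The class and method names (UricCheck, isUric, illegalChars) are hypothetical and are not part of the JDK, and "%" escape sequences are deliberately not handled.

```java
final class UricCheck {
    // "mark" characters from RFC 2396.
    private static final String MARK = "-_.!~*'()";
    // "reserved" characters from RFC 2396, with "[" and "]" added by RFC 2732.
    private static final String RESERVED = ";/?:@&=+$,[]";

    /** True if c may appear literally in <uric> (escaped sequences are not considered). */
    static boolean isUric(char c) {
        boolean alphanum = (c >= 'a' && c <= 'z')
                || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9');
        return alphanum || MARK.indexOf(c) >= 0 || RESERVED.indexOf(c) >= 0;
    }

    /** Returns the distinct characters of s that are not legal <uric> characters. */
    static String illegalChars(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (!isUric(c) && sb.indexOf(String.valueOf(c)) < 0) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Brackets delimit the result so that a lone space is visible.
        System.err.println("illegal: [" + illegalChars("a [b c] d") + "]");
    }
}
```

For the string from this report, the only character collected is the space, which matches the first expected output.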
But RFC2396 also has the following rules:
path = [ abs_path | opaque_part ]
path_segments = segment *( "/" segment )
segment = *pchar *( ";" param )
param = *pchar
pchar = unreserved | escaped |
        ":" | "@" | "&" | "=" | "+" | "$" | ","
These rules tell us that <opaque_part> must consist only of <pchar> characters, which allow neither spaces nor the '[' and ']' characters.
So according to these rules we should instead get the following output:
ssp: a [b c] d
URI: custom:a%20%5bb%20c%5d%20d
URI.host: null
URI.path: null
URI.ssp: a [b c] d
MUST hold: true
Either way, getSchemeSpecificPart() should always return the same value that was passed to the constructor.
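The second expected output corresponds to quoting every character outside <pchar>. The following is a minimal sketch of such a quoter, assuming the <pchar> rule quoted above and UTF-8 for the escaped octets. PcharQuoter and its methods are hypothetical names, not JDK API. It emits uppercase hex digits (%5B, %5D); the <hex> rule accepts both cases, so this is equivalent to the lowercase form shown in the expected URI.

```java
import java.nio.charset.StandardCharsets;

final class PcharQuoter {
    // "mark" characters from RFC 2396.
    private static final String MARK = "-_.!~*'()";
    // The extra punctuation that <pchar> allows besides <unreserved>.
    private static final String PCHAR_EXTRA = ":@&=+$,";

    /** True if c may appear literally in <pchar> (escaped sequences are not considered). */
    static boolean isPchar(char c) {
        boolean alphanum = (c >= 'a' && c <= 'z')
                || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9');
        return alphanum || MARK.indexOf(c) >= 0 || PCHAR_EXTRA.indexOf(c) >= 0;
    }

    /** Percent-encodes every UTF-8 octet of s that is not a literal <pchar> character. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            // Non-negative bytes are ASCII; anything else is always escaped.
            if (b >= 0 && isPchar((char) b)) {
                sb.append((char) b);
            } else {
                sb.append(String.format("%%%02X", b & 0xFF));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.err.println("URI: custom:" + PcharQuoter.quote("a [b c] d"));
    }
}
```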
ACTUAL -
I get the following output:
ssp: a [b c] d
URI: custom:a%20[b%20c]%20d
URI.host: null
URI.path: null
URI.ssp: a [b%20c] d
MUST hold: false
The value returned by getSchemeSpecificPart() is not the same as the value passed to the constructor.
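A possible workaround, which is not part of the original report, is to percent-encode the brackets yourself and parse the finished string with URI.create(String), so that no literal '[' or ']' reaches the decoder. The sketch below assumes that percent-escapes are left undecoded only between literal brackets, as the actual output above suggests. PreEncodedSsp is a hypothetical class name.

```java
import java.net.URI;

final class PreEncodedSsp {
    /** Parses a string that the caller has already percent-encoded. */
    static URI parse(String encoded) {
        // URI.create wraps new URI(String) and throws IllegalArgumentException
        // instead of the checked URISyntaxException.
        return URI.create(encoded);
    }

    public static void main(String[] args) {
        // Spaces and brackets are escaped by hand: %20, %5B, %5D.
        final URI u = parse("custom:a%20%5Bb%20c%5D%20d");
        System.err.println("URI: " + u);
        System.err.println("URI.rawSsp: " + u.getRawSchemeSpecificPart());
        System.err.println("URI.ssp: " + u.getSchemeSpecificPart());
    }
}
```

This only sidesteps the decoding problem. It presumably does not restore the documented identity, because passing the decoded value back through the three-argument constructor goes through the same quoting as before.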
REPRODUCIBILITY :
This bug is always reproducible.
---------- BEGIN SOURCE ----------
import java.net.*;

class M
{
    public static void main (String[] args) throws java.lang.Exception
    {
        try {
            // Scheme-specific part containing spaces and square brackets.
            final String ssp = "a [b c] d";
            final URI u = new URI("custom", ssp, null);
            System.err.println("ssp: " + ssp);
            System.err.println("URI: " + u);
            System.err.println("URI.host: " + u.getHost());
            System.err.println("URI.path: " + u.getPath());
            System.err.println("URI.ssp: " + u.getSchemeSpecificPart());
            // Identity that the JDK documentation says should always hold.
            final boolean b =
                new URI(u.getScheme(),
                        u.getSchemeSpecificPart(),
                        u.getFragment())
                    .equals(u);
            System.err.println("MUST hold: " + b);
        } catch (URISyntaxException e) {
            e.printStackTrace();
        }
    }
}
---------- END SOURCE ----------
duplicates
- JDK-8184958 URI getSchemeSpecificPart() does not decode properly (Closed)
- JDK-8054024 URI constructor violates guaranteed identity for URI with [brackets] in path (Closed)
- JDK-8067436 URI (multiparam) constructor throws URISyntaxException when path contains [] (Closed)
relates to
- JDK-8054024 URI constructor violates guaranteed identity for URI with [brackets] in path (Closed)
- JDK-8067436 URI (multiparam) constructor throws URISyntaxException when path contains [] (Closed)