Issue | Fix Version | Assignee | Priority | Status | Resolution | Resolved In Build |
---|---|---|---|---|---|---|
JDK-2153482 | 7 | Joe Wang | P4 | Closed | Fixed | b15 |
JDK-2182517 | 6u18 | Santiago Pericasgeertsen | P4 | Resolved | Fixed | b02 |
FULL PRODUCT VERSION :
java version "1.6.0_01"
Java(TM) SE Runtime Environment (build 1.6.0_01-b06)
Java HotSpot(TM) Client VM (build 1.6.0_01-b06, mixed mode, sharing)
ADDITIONAL OS VERSION INFORMATION :
Microsoft Windows XP [Version 5.1.2600]
A DESCRIPTION OF THE PROBLEM :
It seems that the HTML serialization code, which is both used by the XSL/T
transformer and the org.w3c.dom.ls package, has a special handling of
attributes that contain URLs, most notably the "href" attribute in the anchor
("a") element. Ampersand characters occurring in the URLs are not escaped,
but are inserted literally. While this is probably done for backward
compatibility with (very) old browsers, it has violated the HTML
specifications for a long time (e.g. HTML 4 [1] or HTML 2 [2], which was
published in 1995). The wrong use of the ampersand causes modern,
compliant browsers to misinterpret those generated URLs, directing
the users to non-existing pages or sending broken form contents
to web servers. For other attributes (e.g. the "title" attribute in the given test case),
the ampersand gets correctly replaced by a respective character entity.
[1] http://www.w3.org/TR/html401/appendix/notes.html section B.2.2
[2] http://www.ietf.org/rfc/rfc1866.txt section 8.2.1 (page 46)
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Compile & run the test case.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
<html>
<body>
<a href="http://example.com/bla?x&amp;y" title="http://example.com/bla?x&amp;y">test
</a>
</body>
</html>
ACTUAL -
<html>
<body>
<a href="http://example.com/bla?x&y" title="http://example.com/bla?x&amp;y">test
</a>
</body>
</html>
REPRODUCIBILITY :
This bug can be reproduced always.
---------- BEGIN SOURCE ----------
File "Test.java":
import java.io.StringReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
public class Test
{
    public static void main (String[] args)
        throws Exception
    {
        // The input must be well-formed XML, so the ampersands are written as &amp;
        String xml = "<html><body><a href='http://example.com/bla?x&amp;y' " +
            "title='http://example.com/bla?x&amp;y'>test</a></body></html>";
        Source src = new StreamSource (new StringReader (xml));
        Result res = new StreamResult (System.out);
        TransformerFactory tf = TransformerFactory.newInstance ();
        // Identity transformer; only the output method is switched to HTML
        Transformer t = tf.newTransformer ();
        t.setOutputProperty (OutputKeys.METHOD, "html");
        t.transform (src, res);
    }
}
---------- END SOURCE ----------
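The test case above prints to System.out and leaves the comparison to the reader. A variant that captures the serialized output and checks it programmatically might look like the following sketch. The class name `HrefEscapeCheck` and its helper methods are made up for illustration and are not part of the original report. The check only looks for the exact href value from the test case, and what it reports depends on the JDK build it runs on.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, not from the original report.
final class HrefEscapeCheck {
    // Same document as in the test case, written as well-formed XML.
    static final String SAMPLE =
        "<html><body><a href='http://example.com/bla?x&amp;y' "
        + "title='http://example.com/bla?x&amp;y'>test</a></body></html>";

    // Runs the identity transformation with the HTML output method
    // and returns the serialized document as a string.
    static String serializeAsHtml(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, "html");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("HTML serialization failed", e);
        }
    }

    // True if the href of the sample document was written with a literal '&'.
    static boolean hrefHasBareAmpersand(String html) {
        return html.contains("href=\"http://example.com/bla?x&y\"");
    }

    public static void main(String[] args) {
        String html = serializeAsHtml(SAMPLE);
        System.out.println(html);
        System.out.println(hrefHasBareAmpersand(html)
            ? "href ampersand was NOT escaped"
            : "href ampersand was escaped");
    }
}
```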
CUSTOMER SUBMITTED WORKAROUND :
None known.
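For affected builds, one possible stopgap is to post-process the serialized output and escape the remaining bare ampersands in href values. The following is only a rough sketch under several assumptions: the class `HrefAmpersandEscaper` is hypothetical, it only handles double-quoted "href" attributes (not "src" or other URL attributes), and it treats anything that looks like a character or entity reference as already escaped. It is not the fix that was applied in the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical post-processing helper, not part of the JDK.
final class HrefAmpersandEscaper {
    // Double-quoted href attribute: prefix, value, closing quote.
    private static final Pattern HREF =
        Pattern.compile("(href=\")([^\"]*)(\")");

    // An '&' that does not start a named or numeric character reference.
    private static final Pattern BARE_AMP =
        Pattern.compile("&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)");

    // Replaces bare ampersands inside href values with "&amp;";
    // everything outside href values is copied unchanged.
    static String escapeHrefAmpersands(String html) {
        Matcher m = HREF.matcher(html);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String value = BARE_AMP.matcher(m.group(2)).replaceAll("&amp;");
            m.appendReplacement(sb,
                Matcher.quoteReplacement(m.group(1) + value + m.group(3)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String actual = "<a href=\"http://example.com/bla?x&y\" "
            + "title=\"http://example.com/bla?x&amp;y\">test</a>";
        System.out.println(escapeHrefAmpersands(actual));
    }
}
```

Because ampersands that are already escaped are left alone, applying the method to output from a fixed build should not double-escape anything.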
Backported by:
- JDK-2182517 HTML serialization has special ampersand handling of URL containing attributes (Resolved)
- JDK-2153482 HTML serialization has special ampersand handling of URL containing attributes (Closed)