JDK-6559595

HTML serialization has special ampersand handling of URL-containing attributes


    • Type: Bug
    • Resolution: Fixed
    • Priority: P4
    • 1.4.0
    • 6
    • xml
    • 1.4
    • x86
    • windows_xp
    • Verified

        FULL PRODUCT VERSION :
        java version "1.6.0_01"
        Java(TM) SE Runtime Environment (build 1.6.0_01-b06)
        Java HotSpot(TM) Client VM (build 1.6.0_01-b06, mixed mode, sharing)

        ADDITIONAL OS VERSION INFORMATION :
        Microsoft Windows XP [Version 5.1.2600]

        A DESCRIPTION OF THE PROBLEM :
        It seems that the HTML serialization code, which is used both by the XSLT
        transformer and by the org.w3c.dom.ls package, handles attributes that
        contain URLs specially, most notably the "href" attribute of the anchor
        ("a") element. Ampersand characters occurring in these URLs are not
        escaped but are written out literally. While this is probably done for
        backward compatibility with (very) old browsers, it has violated the HTML
        specifications for a long time (e.g. HTML 4 [1], or HTML 2 [2], which was
        published in 1995). The unescaped ampersand causes modern, compliant
        browsers to misinterpret the generated URLs, directing users to
        non-existent pages or sending broken form contents to web servers. For
        other attributes (e.g. the "title" attribute in the test case below), the
        ampersand is correctly replaced by the character entity reference "&amp;".

        [1] http://www.w3.org/TR/html401/appendix/notes.html section B.2.2
        [2] http://www.ietf.org/rfc/rfc1866.txt section 8.2.1 (page 46)
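For comparison, the escaping this report expects can be sketched in a few lines of Java. This is a hypothetical helper (HtmlAttrEscaper is not part of the JDK or its serializer); it applies one rule to every attribute, so a URL in "href" gets "&amp;" exactly like the text in "title" does, which is what [1] and [2] ask for.

```java
/**
 * Hypothetical helper (not part of the JDK): escapes an HTML attribute value
 * so that every '&' becomes "&amp;", whether the attribute holds a URL (href)
 * or plain text (title).
 */
final class HtmlAttrEscaper {

    private HtmlAttrEscaper() {}

    /** Escapes a value for use inside a double-quoted HTML attribute. */
    static String escapeAttr(String value) {
        StringBuilder out = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;   // escaped in URLs too
                case '"': out.append("&quot;"); break;  // would end the attribute
                case '<': out.append("&lt;"); break;    // harmless, not strictly required
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    /** Builds an anchor start tag, escaping href and title identically. */
    static String anchorStartTag(String href, String title) {
        return "<a href=\"" + escapeAttr(href) + "\" title=\"" + escapeAttr(title) + "\">";
    }

    public static void main(String[] args) {
        String url = "http://example.com/bla?x&y";
        // Prints: <a href="http://example.com/bla?x&amp;y" title="http://example.com/bla?x&amp;y">
        System.out.println(anchorStartTag(url, url));
    }
}
```

Treating URL attributes no differently from other attributes is the point: the only special handling the specifications call for here is none at all.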



        STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
        Compile & run the test case.



        EXPECTED VERSUS ACTUAL BEHAVIOR :
        EXPECTED -
        <html>
        <body>
        <a href="http://example.com/bla?x&amp;y" title="http://example.com/bla?x&amp;y">test
        </a>
        </body>
        </html>
        ACTUAL -
        <html>
        <body>
        <a href="http://example.com/bla?x&y" title="http://example.com/bla?x&amp;y">test
        </a>
        </body>
        </html>
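One way to tell the two outputs apart mechanically is to look for an ampersand that does not begin a character reference. The sketch below is hypothetical (BareAmpersandCheck is not a JDK class), and its pattern only accepts semicolon-terminated references such as "&amp;", "&#38;" or "&#x26;"; that restriction is an assumption chosen to match the form HTML 4 recommends.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical check (not part of the JDK): reports whether an attribute value,
 * as written in serialized HTML, contains an '&' that does not start a
 * semicolon-terminated named, decimal or hexadecimal character reference.
 */
final class BareAmpersandCheck {

    // '&' NOT followed by a named, decimal or hex reference ending in ';'
    private static final Pattern BARE_AMP = Pattern.compile(
            "&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)");

    private BareAmpersandCheck() {}

    static boolean hasBareAmpersand(String serializedValue) {
        return BARE_AMP.matcher(serializedValue).find();
    }

    public static void main(String[] args) {
        // href values copied from the ACTUAL and EXPECTED outputs above
        System.out.println(hasBareAmpersand("http://example.com/bla?x&y"));     // true
        System.out.println(hasBareAmpersand("http://example.com/bla?x&amp;y")); // false
    }
}
```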

        REPRODUCIBILITY :
        This bug can be reproduced always.

        ---------- BEGIN SOURCE ----------
        File "Test.java":

        import java.io.StringReader;
        import javax.xml.transform.OutputKeys;
        import javax.xml.transform.Result;
        import javax.xml.transform.Source;
        import javax.xml.transform.Transformer;
        import javax.xml.transform.TransformerFactory;
        import javax.xml.transform.stream.StreamResult;
        import javax.xml.transform.stream.StreamSource;


        public class Test
        {
          public static void main (String[] args)
            throws Exception
          {
            // XML-escaped input: "&amp;" parses to a single '&' in both attributes.
            String xml = "<html><body><a href='http://example.com/bla?x&amp;y' " +
              "title='http://example.com/bla?x&amp;y'>test</a></body></html>";

            Source src = new StreamSource (new StringReader (xml));
            Result res = new StreamResult (System.out);
            TransformerFactory tf = TransformerFactory.newInstance ();
            // Identity transformer; the "html" output method selects HTML serialization.
            Transformer t = tf.newTransformer ();

            t.setOutputProperty (OutputKeys.METHOD, "html");
            t.transform (src, res);
          }
        }
        ---------- END SOURCE ----------

        CUSTOMER SUBMITTED WORKAROUND :
        None known.

              spericas Santiago Pericasgeertsen
              ryeung Roger Yeung (Inactive)