JDK-5080652

REGRESSION: REG: HTML parser case-sens HTML 'class' attrib to lcase


    • b11
    • x86
    • windows_2000, windows_xp

      Name: js151677 Date: 07/30/2004


      FULL PRODUCT VERSION :
      java version "1.5.0-beta2"
      Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0-beta2-b51)
      Java HotSpot(TM) Client VM (build 1.5.0-beta2-b51, mixed mode, sharing)

      A DESCRIPTION OF THE PROBLEM :
      Swing's HTML parser javax.swing.text.html.parser.Parser incorrectly converts the attribute values for the HTML "class" attributes to lower case. This is new and incorrect behavior with 1.5.0.

      The HTML "class" attribute is case-sensitive, so this conversion to lower case is incorrect. The definitive reference is section 7.5.2 of the HTML 4.01 spec:

      http://www.w3.org/TR/html401/struct/global.html#adef-class

      The "[CS]" marker next to "class" in that section means the attribute value is case-sensitive.

      The code that was introduced in 1.5.0 that causes this is in javax/swing/text/html/parser/Parser.java. Look at method parseAttributeSpecificationList(Element elem). There's a new (to 1.5.0) fragment near the end of this method:

      if (attkey == HTML.Attribute.CLASS) {
          attvalue = attvalue.toLowerCase();
      }

      So, it looks like this was intentional (but why???)

      This is causing us grief in our application, which uses the HTML parser as part of some dynamic HTML generation. Much of the CSS style matching is based on class names, and the matching in the HTML renderer (Internet Explorer or whatever) is case sensitive, as it should be according to the HTML specs. We have lots of already-developed HTML and HTML fragments and associated stylesheets, which makes the workaround (using only lower case HTML class attribute values) impractical.
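To illustrate why the lower-casing matters for style matching, here is a minimal sketch of a case-sensitive class comparison of the kind the report attributes to the HTML renderer. The class `ClassSelectorMatch` and its `matches` method are hypothetical names made up for this illustration; they are not part of the JDK or of the reporter's application.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper for illustration only; not a JDK or browser API.
final class ClassSelectorMatch {

    /**
     * Returns true if the space-separated class attribute value contains
     * selectorClass, compared case-sensitively.
     */
    static boolean matches(String classAttribute, String selectorClass) {
        return Arrays.asList(classAttribute.trim().split("\\s+")).contains(selectorClass);
    }

    public static void main(String[] args) {
        String original = "bigText";                           // value written in the HTML source
        String lowered = original.toLowerCase(Locale.ROOT);    // value after the parser's conversion

        System.out.println(matches(original, "bigText"));      // true
        System.out.println(matches(lowered, "bigText"));       // false
    }
}
```

Under this assumption, a stylesheet rule such as `.bigText` stops applying once the attribute value has been rewritten to `bigtext`.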

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      To demonstrate that this behavior has changed:

      1. Compile the provided executable test case.
      2. Run it under 1.4.2_xx and you'll get the expected (and correct) result.
      3. Run it under 1.5.0-beta2 and you'll get a result indicating that the attribute value for "class" has been converted to lower case.

      The HTML being parsed by the sample code is:

      <html>
        <head>
          <title>Example HTML containing some class attributes</title>
          <style type="text/css">
            .bigText { font-size: 16pt; }
          </style>
        </head>
        <body>
          <p class="bigText">This text should be big.</p>
        </body>
      </html>

      (note in particular the element with the class="bigText" attribute definition)


      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      When you run the provided test case (class HtmlParserProblem) under 1.4.2_xx, you'll get this:

      start tag: html
      attributes:
      start tag: head
      attributes:
      start tag: title
      attributes:
      end tag: title
      start tag: style
      attributes: type=text/css
      end tag: style
      end tag: head
      start tag: body
      attributes: type=text/css
      start tag: p
      attributes: type=text/css class=bigText
      end tag: p
      end tag: body
      end tag: html

      This is correct; note the correct mixed-case value "bigText" for the class attribute.
      ACTUAL -
      When you run the provided test case (class HtmlParserProblem) under 1.5.0-beta2, you'll get this:

      start tag: html
      attributes:
      start tag: head
      attributes:
      start tag: title
      attributes:
      end tag: title
      start tag: style
      attributes: type=text/css
      end tag: style
      end tag: head
      start tag: body
      attributes: type=text/css
      start tag: p
      attributes: class=bigtext type=text/css
      end tag: p
      end tag: body
      end tag: html

      Note that the class attribute value has been converted to lower case. This is a Bad Thing.


      REPRODUCIBILITY :
      This bug can be reproduced always.

      ---------- BEGIN SOURCE ----------
      public class HtmlParserProblem
      {
        public static void main(final String[] args) throws java.io.IOException
        {
          new Html32Parser().parse(new java.io.StringReader(getExampleHtml()));
        }

        public static class Html32Parser extends javax.swing.text.html.parser.Parser
        {
          public Html32Parser() throws java.io.IOException
          {
            super(loadDtd("html32"));
            this.strict = false;
          }

          public void handleStartTag(final javax.swing.text.html.parser.TagElement tagElement)
          {
            System.out.println("start tag: " + tagElement.getHTMLTag().toString());
            final javax.swing.text.SimpleAttributeSet attributes = getAttributes();
            System.out.println("attributes: " + attributes.toString());
          }

          public void handleEndTag(final javax.swing.text.html.parser.TagElement tagElement)
          {
            System.out.println("end tag: " + tagElement.getHTMLTag().toString());
          }
        }

        private static javax.swing.text.html.parser.DTD loadDtd(final String dtdName)
          throws java.io.IOException
        {
          final String resourceName = dtdName + ".bdtd";
          final java.io.InputStream inputStream =
            javax.swing.text.html.parser.DTD.class.getResourceAsStream(resourceName);
          if (inputStream == null) throw new java.io.IOException(resourceName);
          final javax.swing.text.html.parser.DTD dtd =
            javax.swing.text.html.parser.DTD.getDTD(dtdName);
          dtd.read(new java.io.DataInputStream(inputStream));
          return dtd;
        }

        private static String getExampleHtml()
        {
          return
            "<html>\n" +
            " <head>\n" +
            " <title>Example HTML containing some class attributes</title>\n" +
            " <style type='text/css'>\n" +
            " .bigText { font-size: 16pt; }\n" +
            " </style>\n" +
            " </head>\n" +
            " <body>\n" +
            " <p class='bigText'>This text should be big.</p>\n" +
            " </body>\n" +
            "</html>";
        }
      }

      ---------- END SOURCE ----------
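The test case above subclasses Parser directly and loads html32.bdtd by hand. As a rough alternative, the same check can be sketched with the public `ParserDelegator` and `HTMLEditorKit.ParserCallback` API. This is not the reporter's test case: the class name `ClassAttributeCheck` and the method `classValues` are invented for this sketch, it is written for a current JDK (it uses generics), and what it prints depends on whether the JDK it runs on still lower-cases the value.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical check, not the reporter's code: collects the "class"
// attribute values that the Swing HTML parser reports for start tags.
final class ClassAttributeCheck {

    static List<String> classValues(String html) {
        final List<String> values = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attributes, int pos) {
                Object value = attributes.getAttribute(HTML.Attribute.CLASS);
                if (value != null) {
                    values.add(value.toString());
                }
            }
        };
        try {
            // The third argument (ignoreCharSet = true) skips charset handling.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return values;
    }

    public static void main(String[] args) {
        String html = "<html><body><p class='bigText'>This text should be big.</p></body></html>";
        for (String value : classValues(html)) {
            // Compare case-sensitively against the value written in the source.
            String verdict = value.equals("bigText") ? "case preserved" : "case changed";
            System.out.println("class=" + value + " (" + verdict + ")");
        }
    }
}
```

A result of "case changed" would indicate the behavior described in this report.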

      CUSTOMER SUBMITTED WORKAROUND :
      Only use lower-case values for "class" attributes.

      Release Regression From : 1.4.2_04
      The above release value was the last known release where this
      bug was known to work. Since then there has been a regression.

      (Incident Review ID: 290362)
      ======================================================================
      ###@###.### 10/18/04 23:22 GMT

            idk Igor Kushnirskiy (Inactive)
            jssunw Jitender S (Inactive)