JDK-4806463

Self-closing tags incorrectly parsed by javax.swing.text.html.parser.Parser

    • b53
    • generic, x86
    • generic, windows_xp

      Name: jk109818 Date: 01/22/2003


      FULL PRODUCT VERSION :
      java version "1.4.1"
      Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.1-b21)
      Java HotSpot(TM) Client VM (build 1.4.1-b21, mixed mode)

      FULL OPERATING SYSTEM VERSION :

      Microsoft Windows XP [Version 5.1.2600]
      Service Pack 1 installed

      ADDITIONAL OPERATING SYSTEMS :

      Occurs on all platforms, as problem is in the JFC code.

      A DESCRIPTION OF THE PROBLEM :
      The HTML parser included in Swing's text classes contains
      code to handle XML-style self-closing tags (e.g. "<br/>"). A
      defect in the parser causes the slash ("/") in the tag to be
      treated as the closing bracket (">"), and the closing
      bracket is subsequently parsed as part of the document
      content following the tag.

      E.g. HTML Code:

      <html><body><p>This is a<br/>test</p></body></html>

      Display in JEditorPane:

      This is a
      >test


      From my analysis, the code at fault appears in
      javax.swing.text.html.parser.Parser in the parseTag()
      method. The following is the erroneous code fragment, as it
      appears in J2SE 1.3.1-05 and J2SE 1.4.1-01:

      switch (ch) {
        case '/':
          net = true;
        case '>':
          ch = readCh();
        case '<':
          break;

        default:
          error("expected", "'>'");
          break;
      }

      The first case statement, which sets the 'net' flag to true,
      should also advance the parser by one character by calling
      readCh() before control falls through into the next case
      statement, which calls it again. The current code calls
      readCh() only once, whether the tag is terminated with '>'
      or '/>'. For '/>', that single call steps past the '/' and
      leaves the '>' as the current character, so the '>' is
      parsed as document content, resulting in the behaviour
      described above.

      The corrected code fragment, according to my analysis, would
      be as follows:

      switch (ch) {
        case '/':
          net = true;
          ch = readCh();
        case '>':
          ch = readCh();
        case '<':
          break;

        default:
          error("expected", "'>'");
          break;
      }
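
      The difference between the two fragments can be seen in
      isolation with a toy model. The sketch below is not the
      real javax.swing parser: the class TagEndDemo, its string
      cursor, and the 'fixed' switch are assumptions made for
      illustration. It reproduces only the fall-through switch and
      then returns whatever would be left over as document text.

```java
/** Toy model of the tag-end switch; NOT the real javax.swing.text.html.parser.Parser. */
final class TagEndDemo {
    private final String input; // everything that follows the tag name
    private int pos;            // index of the next unread character
    private int ch;             // current character, or -1 at end of input

    TagEndDemo(String afterTagName) {
        input = afterTagName;
        ch = readCh();          // position on the first character after the tag name
    }

    /** Returns the next character, or -1 when the input is exhausted. */
    private int readCh() {
        return pos < input.length() ? input.charAt(pos++) : -1;
    }

    /** Consumes the end of the tag and returns the remaining input as document text. */
    String textAfterTag(boolean fixed) {
        boolean net = false;    // mirrors the flag in the report; unused in this model
        switch (ch) {
            case '/':
                net = true;
                if (fixed) {
                    ch = readCh(); // proposed fix: step past '/' so that ch is '>'
                }
                // falls through
            case '>':
                ch = readCh();
                // falls through
            case '<':
                break;
            default:
                throw new IllegalStateException("expected '>'");
        }
        StringBuilder text = new StringBuilder();
        while (ch != -1) {
            text.append((char) ch);
            ch = readCh();
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(new TagEndDemo("/>test").textAfterTag(false)); // prints ">test"
        System.out.println(new TagEndDemo("/>test").textAfterTag(true));  // prints "test"
        System.out.println(new TagEndDemo(">test").textAfterTag(false));  // prints "test"
    }
}
```

      With the original switch, the '>' of '/>' survives as the
      first character of the text. With the extra readCh() it is
      consumed, and tags that end in a plain '>' are unaffected.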



      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      1. Instantiate and initialize a JTextPane as follows:

      JTextPane myJTextPane = new JTextPane();
      myJTextPane.setEditorKit(new HTMLEditorKit());
      myJTextPane.setText("<p>This is a<br/>test.</p>");

      2. Show the JTextPane in a JFrame or JApplet.

      3. Note the spurious '>' that appears before the word 'test'.
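
      The same check can be made without showing a window, by
      driving ParserDelegator directly and collecting the text it
      reports. This is a minimal sketch that assumes a Java 8 or
      later compiler; the class SelfClosingTagCheck and its
      methods are hypothetical names, not part of the original
      report. Whether a stray '>' is reported depends on the JDK
      in use.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Hypothetical helper: parses markup and inspects the text chunks the parser reports. */
final class SelfClosingTagCheck {

    /** Returns every text chunk passed to ParserCallback.handleText for the given markup. */
    static List<String> parseText(String html) {
        final List<String> chunks = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                chunks.add(new String(data));
            }
        };
        try {
            // ignoreCharSet = true: do not abort on a charset declaration in the markup
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return chunks;
    }

    /** True if any reported text chunk contains a '>' character. */
    static boolean hasStrayBracket(String html) {
        for (String chunk : parseText(html)) {
            if (chunk.indexOf('>') >= 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String html = "<p>This is a<br/>test.</p>";
        System.out.println("text chunks: " + parseText(html));
        System.out.println("stray '>' reported: " + hasStrayBracket(html));
    }
}
```

      On an affected runtime the second line of output should
      report true for the '<br/>' input, matching the spurious '>'
      seen in the JTextPane.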

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      Expected the "/>" at the end of an HTML tag to be treated
      just like a ">" (since the Swing HTML parser is not advanced
      enough to enforce self-closing tags). Instead, the "/" is
      treated like the ">" and the ">" is displayed as part of the
      document content.

      REPRODUCIBILITY :
      This bug can be reproduced always.

      ---------- BEGIN SOURCE ----------
      import java.awt.*;
      import javax.swing.*;
      import javax.swing.text.html.*;

      public class HTMLParserDemo
      {
          public static void main(String[] args)
          {
              JFrame f = new JFrame("Test");
              Container c = f.getContentPane();

              // Render HTML that contains an XML-style self-closing tag.
              JTextPane tp = new JTextPane();
              tp.setEditorKit(new HTMLEditorKit());
              tp.setText("<p>This is a <br/>test.</p>");

              c.add(tp, BorderLayout.CENTER);
              f.pack();
              f.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
              // Show the frame; a stray '>' before "test." indicates the bug.
              f.setVisible(true);
          }
      }
      ---------- END SOURCE ----------

      CUSTOMER WORKAROUND :
      The erroneous code is contained in a package-private method
      (parseTag()), which is called by another package-private
      method (parseContent()) and which calls other
      package-private methods. In addition, most practical
      applications rely on
      javax.swing.text.html.parser.ParserDelegator to instantiate
      the Parser subclass
      javax.swing.text.html.parser.DocumentParser. Working around
      this bug would therefore involve:

      - Reimplementing most of the Parser class, which is rather
      large, in a subclass of DocumentParser.

      - Subclassing ParserDelegator to use the new subclass
      instead of DocumentParser.

      - Subclassing HTMLEditorKit to use the new ParserDelegator
      subclass instead of ParserDelegator.

      This is a code-heavy workaround that is impractical in
      size-constrained projects such as applets that need to
      correctly parse arbitrary HTML.
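
      The HTMLEditorKit step of this workaround can be sketched on
      its own. Note that the sketch swaps in a different, much
      cruder technique for the parser itself: instead of
      reimplementing Parser, it rewrites "/>" to ">" in the input
      and then delegates to the stock ParserDelegator. All class
      names below are hypothetical, a Java 8 or later compiler is
      assumed, and the rewrite is naive: it also changes "/>"
      inside text or attribute values, and the offsets passed to
      the callback refer to the rewritten input.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Hypothetical wiring sketch; not the full reimplementation described in the report. */
final class SlashTolerantKitDemo {

    /** Parser wrapper that rewrites "/>" to ">" before delegating to ParserDelegator. */
    static final class SlashTolerantParser extends HTMLEditorKit.Parser {
        private final HTMLEditorKit.Parser delegate = new ParserDelegator();

        @Override
        public void parse(Reader r, HTMLEditorKit.ParserCallback cb, boolean ignoreCharSet)
                throws IOException {
            StringBuilder html = new StringBuilder();
            char[] buf = new char[4096];
            int n;
            while ((n = r.read(buf)) != -1) {
                html.append(buf, 0, n);
            }
            // Naive rewrite (assumption): every "/>" is treated as the end of a tag.
            String rewritten = html.toString().replace("/>", ">");
            delegate.parse(new StringReader(rewritten), cb, ignoreCharSet);
        }
    }

    /** Editor kit that hands out the wrapper instead of a plain ParserDelegator. */
    static final class SlashTolerantEditorKit extends HTMLEditorKit {
        @Override
        protected HTMLEditorKit.Parser getParser() {
            return new SlashTolerantParser();
        }
    }

    /** Runs the wrapper on the markup and returns the concatenated text chunks. */
    static String parsedText(String html) {
        final StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback cb = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                text.append(data);
            }
        };
        try {
            new SlashTolerantParser().parse(new StringReader(html), cb, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(parsedText("<p>This is a<br/>test.</p>"));
    }
}
```

      A text component would pick this up through
      setEditorKit(new SlashTolerantKitDemo.SlashTolerantEditorKit())
      in place of new HTMLEditorKit(). Because the input is
      buffered and rewritten as a whole, this is only suitable
      where the side effects listed above are acceptable.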
      (Review ID: 166899)
      ======================================================================

            peterz Peter Zhelezniakov
            jkimsunw Jeffrey Kim (Inactive)