-
Bug
-
Resolution: Fixed
-
P3
-
1.4.1, 6
-
b53
-
generic, x86
-
generic, windows_xp
Name: jk109818 Date: 01/22/2003
FULL PRODUCT VERSION :
java version "1.4.1"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.1-b21)
Java HotSpot(TM) Client VM (build 1.4.1-b21, mixed mode)
FULL OPERATING SYSTEM VERSION :
Microsoft Windows XP [Version 5.1.2600]
Service Pack 1 installed
ADDITIONAL OPERATING SYSTEMS :
Occurs on all platforms, as problem is in the JFC code.
A DESCRIPTION OF THE PROBLEM :
The HTML parser included in Swing's text classes contains
code to handle XML-style self-closing tags (e.g. "<br/>"). A
defect in the parser causes the slash ("/") in the tag to be
treated as the closing bracket (">"), and the closing
bracket is subsequently parsed as part of the document
content following the tag.
E.g. HTML Code:
<html><body><p>This is a<br/>test</p></body></html>
Display in JEditorPane:
This is a
>test
From my analysis, the code at fault appears in
javax.swing.text.html.parser.Parser in the parseTag()
method. The following is the erroneous code fragment, as it
appears in J2SE 1.3.1-05 and J2SE 1.4.1-01:
    switch (ch) {
      case '/':
        net = true;
      case '>':
        ch = readCh();
      case '<':
        break;
      default:
        error("expected", "'>'");
        break;
    }
The first case statement, which sets the 'net' flag true,
should also advance the parser by one character by calling
readCh() before allowing control to flow into the next case
statement, which will call it again. The current code only
calls readCh() once, whether the tag is terminated with '>'
or '/>', resulting in the behaviour described above.
The corrected code fragment, according to my analysis, would
be as follows:
    switch (ch) {
      case '/':
        net = true;
        ch = readCh();
      case '>':
        ch = readCh();
      case '<':
        break;
      default:
        error("expected", "'>'");
        break;
    }
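The difference between the two fragments can be seen in isolation with a small stand-alone model. This is only a sketch and not the JDK source: the class TagEndDemo, its textOf() method and the 'fixed' flag are hypothetical names invented for illustration, and the tag-name scanning is greatly simplified. Only the switch statement mirrors the fragments quoted above.

```java
// Hypothetical, simplified model of the tag-end handling discussed above.
// It is NOT javax.swing.text.html.parser.Parser; only the switch mirrors it.
final class TagEndDemo {
    private final String in;
    private final boolean fixed;
    private int pos = 0;
    private int ch;

    private TagEndDemo(String in, boolean fixed) {
        this.in = in;
        this.fixed = fixed;
        this.ch = readCh();
    }

    // Returns the next character, or -1 at end of input.
    private int readCh() {
        return pos < in.length() ? in.charAt(pos++) : -1;
    }

    // Returns everything that ends up as document text (tags removed).
    static String textOf(String html, boolean fixed) {
        TagEndDemo p = new TagEndDemo(html, fixed);
        StringBuilder text = new StringBuilder();
        while (p.ch != -1) {
            if (p.ch == '<') {
                p.parseTag();
            } else {
                text.append((char) p.ch);
                p.ch = p.readCh();
            }
        }
        return text.toString();
    }

    private void parseTag() {
        ch = readCh();                      // skip '<'
        if (ch == '/') {
            ch = readCh();                  // end tag such as </p>
        }
        while (ch != -1 && ch != '/' && ch != '>' && ch != '<') {
            ch = readCh();                  // skip the tag name
        }
        switch (ch) {
            case '/':
                if (fixed) {
                    ch = readCh();          // proposed fix: consume the '/'
                }
                // falls through
            case '>':
                ch = readCh();              // consume one more character
                // falls through
            case '<':
                break;
            default:
                break;                      // the real parser reports an error here
        }
    }

    public static void main(String[] args) {
        String html = "<p>This is a<br/>test</p>";
        System.out.println(textOf(html, false)); // prints "This is a>test"
        System.out.println(textOf(html, true));  // prints "This is atest"
    }
}
```

With fixed set to false, only the '/' is consumed and the '>' leaks into the text, which matches the behaviour described in this report. The model drops the line break that a real <br> would produce, since it only collects text.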
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1. Instantiate and initialize a JTextPane as follows:
     JTextPane myJTextPane = new JTextPane();
     myJTextPane.setEditorKit(new HTMLEditorKit());
     myJTextPane.setText("<p>This is a<br/>test.</p>");
2. Show the JTextPane in a JFrame or JApplet.
3. Note the spurious '>' that appears before the word 'test'.
EXPECTED VERSUS ACTUAL BEHAVIOR :
Expected the "/>" at the end of an HTML tag to be treated
just like a ">" (since the Swing HTML parser is not advanced
enough to enforce self-closing tags). Instead, the "/" is
treated like the ">" and the ">" is displayed as part of the
document content.
REPRODUCIBILITY :
This bug can be reproduced always.
---------- BEGIN SOURCE ----------
import java.awt.*;
import javax.swing.*;
import javax.swing.text.html.*;

public class HTMLParserDemo
{
    public static void main(String[] args)
    {
        JFrame f = new JFrame("Test");
        Container c = f.getContentPane();
        JTextPane tp = new JTextPane();
        tp.setEditorKit(new HTMLEditorKit());
        // On affected versions a spurious '>' is shown before "test."
        tp.setText("<p>This is a <br/>test.</p>");
        c.add(tp, BorderLayout.CENTER);
        f.pack();
        f.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        f.setVisible(true);
    }
}
---------- END SOURCE ----------
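The same check can probably be made without showing a window, by feeding the markup to ParserDelegator and collecting the text passed to the callback. The following is a sketch under assumptions: the class name ParserTextDump and the method textOf() are made up for illustration, and the code uses StringBuilder and @Override, so it needs a newer JDK than the 1.4.1 release named in this report.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects the text content reported by the Swing HTML parser.
final class ParserTextDump {
    static String textOf(String html) throws IOException {
        final StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                text.append(data);          // every text run the parser reports
            }
        };
        // 'true' tells the parser to ignore any charset declaration in the markup.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = textOf("<p>This is a<br/>test.</p>");
        // A '>' in the collected text indicates the defect described in this report.
        System.out.println((text.indexOf('>') >= 0 ? "affected: " : "not affected: ") + text);
    }
}
```

No expected output is given here because the result depends on whether the runtime contains the fix.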
CUSTOMER WORKAROUND :
The erroneous code is contained in a package-private method
(parseTag()), which is called by another package-private
method (parseContent()) and itself calls further
package-private methods. In addition, most practical
applications rely on
javax.swing.text.html.parser.ParserDelegator to instantiate
the Parser subclass
javax.swing.text.html.parser.DocumentParser. Working around
this bug would therefore involve:
- Reimplementing most of the Parser class, a rather large
class, in a subclass of DocumentParser.
- Subclassing ParserDelegator to use the new subclass
instead of DocumentParser.
- Subclassing HTMLEditorKit to use the new ParserDelegator
subclass instead of ParserDelegator.
This is a code-heavy workaround that is impractical in
size-constrained projects such as applets that need to
correctly parse arbitrary HTML.
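Where the markup can be modified before it reaches the parser, a much smaller alternative is to rewrite "/>" to ">" in the input string. This is a different technique from the subclassing approach described above, and it is only a sketch: the class SelfClosingTagFilter and its method are hypothetical names, and the replacement is deliberately naive, so it also rewrites any "/>" that occurs in text content or attribute values.

```java
// Hypothetical pre-processing step: removes the XML-style slash before the
// markup is handed to the Swing HTML parser.
final class SelfClosingTagFilter {
    static String stripSelfClosingSlash(String html) {
        // String.replaceAll is available since J2SE 1.4; the pattern "/>"
        // contains no regular-expression metacharacters.
        return html.replaceAll("/>", ">");
    }

    public static void main(String[] args) {
        // prints "<p>This is a<br>test.</p>"
        System.out.println(stripSelfClosingSlash("<p>This is a<br/>test.</p>"));
    }
}
```

The filtered string can then be passed to setText() as in the steps to reproduce, for example myJTextPane.setText(SelfClosingTagFilter.stripSelfClosingSlash(html)).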
(Review ID: 166899)
======================================================================