JDK-6857913

HTMLEditorKit parses HTML files incorrectly


        FULL PRODUCT VERSION :
        java version "1.6.0_13"
        Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
        Java HotSpot(TM) Client VM (build 11.3-b02, mixed mode, sharing)

        ADDITIONAL OS VERSION INFORMATION :
        Microsoft Windows [Version 5.2.3790]

        EXTRA RELEVANT SYSTEM CONFIGURATION :
        Not Required

        A DESCRIPTION OF THE PROBLEM :
        HTMLEditorKit no longer parses HTML files the way it did in JDK 1.4.

        Whenever I parse an HTML file that contains a <script> </script> block, the parser does not report the content of the block as normal text. It reports the entire script body as a comment.

        In JDK 1.4 the same input produced different parser output than it does in JDK 1.6.

        STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
        Create a simple HTML file with a <script> block in it, for example:
        <html>
        <head> <title> Testing </title> </head>
        <body>
        <script>
        var i=0;
        alert("hello");
        </script>
        </body>
        </html>

        Then run the following Java code in a JRE 1.6 environment.

        import java.io.FileReader;
        import java.io.Reader;
        import javax.swing.text.*;
        import javax.swing.text.html.*;
        import javax.swing.text.html.parser.*;

        /**
         * This small demo program shows how to use the
         * HTMLEditorKit.Parser and its implementing class
         * ParserDelegator in the Swing system.
         */
        public class HtmlParseDemo {
            public static void main(String[] args) {
                if (args.length == 0) {
                    System.err.println("Usage: java HtmlParseDemo <file>");
                    System.exit(1);
                }
                String spec = args[0];
                try {
                    // Open the HTML file named on the command line.
                    Reader r = new FileReader(spec);
                    System.out.println("About to parse " + spec);
                    HTMLEditorKit.Parser parser = new ParserDelegator();
                    // Report every parse event to the listing callback;
                    // 'true' tells the parser to ignore charset directives.
                    parser.parse(r, new HTMLParseLister(), true);
                    r.close();
                } catch (Exception e) {
                    System.err.println("Error: " + e);
                    e.printStackTrace(System.err);
                }
            }
        }

        /**
         * HTML parsing proceeds by calling a callback for
         * each and every piece of the HTML document. This
         * simple callback class simply prints an indented
         * structural listing of the HTML data.
         */
        class HTMLParseLister extends HTMLEditorKit.ParserCallback
        {
            int indentSize = 0;

            protected void indent() {
                indentSize += 3;
            }
            protected void unIndent() {
                indentSize -= 3;
                if (indentSize < 0) indentSize = 0;
            }

            protected void pIndent() {
                for(int i = 0; i < indentSize; i++) System.out.print(" ");
            }

            public void handleText(char[] data, int pos) {
                pIndent();
                System.out.println("Text(" + data.length + " chars)");
            }

            public void handleComment(char[] data, int pos) {
                pIndent();
                System.out.println("Comment(" + data.length + " chars)");
            }

            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                pIndent();
                System.out.println("Tag start(<" + t.toString() + ">, " +
                                   a.getAttributeCount() + " attrs)");
                indent();
            }

            public void handleEndTag(HTML.Tag t, int pos) {
                unIndent();
                pIndent();
                System.out.println("Tag end(</" + t.toString() + ">)");
            }

            public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                pIndent();
                System.out.println("Tag(<" + t.toString() + ">, " +
                                   a.getAttributeCount() + " attrs)");
            }

            public void handleError(String errorMsg, int pos){
                //System.out.println("Parsing error: " + errorMsg + " at " + pos);
            }
        }

        EXPECTED VERSUS ACTUAL BEHAVIOR :
        EXPECTED -
        About to parse html.txt
        Tag start(<html>, 0 attrs)
           Tag start(<head>, 0 attrs)
              Tag start(<title>, 0 attrs)
                 Text(7 chars)
              Tag end(</title>)
           Tag end(</head>)
           Tag start(<body>, 0 attrs)
              Tag start(<script>, 0 attrs)
                 Text(9 chars)
              Tag end(</script>)
           Tag end(</body>)
        Tag end(</html>)
        ACTUAL -
        Tag start(<html>, 0 attrs)
           Tag start(<head>, 0 attrs)
              Tag start(<title>, 0 attrs)
                 Text(7 chars)
              Tag end(</title>)
           Tag end(</head>)
           Tag start(<body>, 0 attrs)
              Tag start(<script>, 0 attrs)
                 Comment(24 chars)
              Tag end(</script>)
              Text(1 chars)
           Tag end(</body>)
        Tag end(</html>)

        ERROR MESSAGES/STACK TRACES THAT OCCUR :
        No error messages

        REPRODUCIBILITY :
        This bug is always reproducible.

        CUSTOMER SUBMITTED WORKAROUND :
        None found. Please suggest a workaround.
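
        One possible workaround, offered as a sketch rather than a confirmed fix: since the ACTUAL output shows the script body arriving through handleComment between the <script> start and end tag events, a callback can track whether it is inside a <script> element and treat anything delivered there as script text. The names ScriptTextDemo, ScriptCollector and extractScriptText are hypothetical and used only for illustration. The collector also accepts handleText inside <script>, on the assumption that older releases report the body that way.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical demo class; not part of the original report.
class ScriptTextDemo {

    /** Collects the body of every <script> element, whichever callback delivers it. */
    static class ScriptCollector extends HTMLEditorKit.ParserCallback {
        private boolean inScript = false;
        private final StringBuilder scriptText = new StringBuilder();

        @Override
        public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
            if (t == HTML.Tag.SCRIPT) inScript = true;
        }

        @Override
        public void handleEndTag(HTML.Tag t, int pos) {
            if (t == HTML.Tag.SCRIPT) inScript = false;
        }

        // JDK 1.6 reports the script body here (see the ACTUAL output in this report).
        @Override
        public void handleComment(char[] data, int pos) {
            if (inScript) scriptText.append(data);
        }

        // Assumption: releases that report the body as text deliver it here instead.
        @Override
        public void handleText(char[] data, int pos) {
            if (inScript) scriptText.append(data);
        }

        String getScriptText() {
            return scriptText.toString();
        }
    }

    /** Parses the given HTML string and returns the concatenated script bodies. */
    static String extractScriptText(String html) {
        ScriptCollector collector = new ScriptCollector();
        try {
            new ParserDelegator().parse(new StringReader(html), collector, true);
        } catch (IOException e) {
            throw new IllegalStateException("Unexpected I/O error on in-memory input", e);
        }
        return collector.getScriptText();
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Testing</title></head><body>"
                + "<script>\nvar i=0;\nalert(\"hello\");\n</script>"
                + "</body></html>";
        System.out.println(extractScriptText(html));
    }
}
```

        This does not change what the parser emits; it only makes the calling code indifferent to whether the script body is reported as text or as a comment. Real HTML comments placed inside a <script> element would be collected as script text as well.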

        Release Regression From : 1.4
        The above release value was the last known release where this
        bug was not reproducible. Since then there has been a regression.

              dmeetry Dmeetry Degrave (Inactive)
              ndcosta Nelson Dcosta (Inactive)