JDK-4741147

HTML Parser does not throw exception on non-HTML code but takes forever


    • Type: Bug
    • Resolution: Won't Fix
    • Priority: P4
    • Fix Version: None
    • Affects Version: 1.4.1
    • Component: client-libs



      Name: jk109818 Date: 09/03/2002


      FULL PRODUCT VERSION :
      java version "1.4.1-rc"
      Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.1-rc-b19)
      Java HotSpot(TM) Client VM (build 1.4.1-rc-b19, mixed mode)


      FULL OPERATING SYSTEM VERSION : Windows 2000


      ADDITIONAL OPERATING SYSTEMS : Probably all



      A DESCRIPTION OF THE PROBLEM :
      If the HTML parser (HTMLEditorKit.Parser) is given a .doc
      file to parse (by mistake, because the same URL works in IE),
      it takes an extremely long time to finish and never throws
      an exception.

      I imagine a .doc file is sufficiently horrible to look at
      for even the most obtuse parser to say that this is not an
      HTML file and throw an exception.

      Instead of that, it ran for 3516 seconds before finishing
      without an error.

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      1. Run the program
      2. Wait half an hour (400MHz PII)
      3. Read the result

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      Expected: the parser throws an Exception saying that the
      format is unknown.
      Actual: the parser runs to completion without reporting any error.
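The kind of early rejection asked for here can be approximated on the caller's side. The following is only a minimal sketch, not anything the parser itself does: it assumes the input is a legacy Word .doc file, which is an OLE2 compound file and begins with the 8-byte signature D0 CF 11 E0 A1 B1 1A E1. The class name `Ole2Sniffer` and both method names are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper; not part of the JDK or of the original report.
class Ole2Sniffer {

    // Signature that opens OLE2 compound files such as legacy Word .doc files.
    private static final int[] OLE2_MAGIC =
            {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};

    /** Returns true if the stream starts with the OLE2 signature; the position is restored. */
    static boolean startsWithOle2Magic(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(OLE2_MAGIC.length);
        try {
            for (int expected : OLE2_MAGIC) {
                if (in.read() != expected) {
                    return false;
                }
            }
            return true;
        } finally {
            // Rewind so the caller can still read the stream from the start.
            in.reset();
        }
    }

    /** Buffers the stream and fails fast if it looks like a .doc file rather than HTML. */
    static InputStream rejectOle2(InputStream raw) throws IOException {
        InputStream in = new BufferedInputStream(raw);
        if (startsWithOle2Magic(in)) {
            throw new IOException("Input looks like an OLE2 (.doc) file, not HTML");
        }
        return in;
    }
}
```

In the reproducer, the stream returned by `rejectOle2(httpInputStream)` would be passed to `new InputStreamReader(...)` in place of the raw stream. This only catches one binary format; other non-HTML input would still reach the parser.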

      ERROR MESSAGES/STACK TRACES THAT OCCUR :
      None - that's the problem

      REPRODUCIBILITY :
      This bug can be reproduced always.

      ---------- BEGIN SOURCE ----------
      package monitor;

      import java.io.InputStream;
      import java.io.InputStreamReader;

      import java.net.HttpURLConnection;
      import java.net.URL;

      import javax.swing.text.html.HTMLDocument;
      //import javax.swing.text.html.HTMLDocument.HTMLReader;
      import javax.swing.text.html.HTMLEditorKit;
      import javax.swing.text.html.HTMLEditorKit.Parser;
      import javax.swing.text.html.HTMLEditorKit.ParserCallback;

      public class HTMLParserBug {

          private InputStreamReader inputStreamReader;

          public HTMLParserBug() {

              try {
                  URL url = new URL("http://www.yellow-b.com/docs/jet/JET_description_30.doc");
                  HttpURLConnection connection = (HttpURLConnection) url.openConnection();
                  InputStream httpInputStream = (InputStream)connection.getContent();
                  inputStreamReader = new InputStreamReader(httpInputStream);
              } catch (Exception e) {
                  throw new Error("Problems accessing file",e);
              }

              try {
                  HTMLEditorKit htmlEditorKit = new HTMLEditorKit();
                  HTMLDocument htmlDocument = (HTMLDocument) htmlEditorKit.createDefaultDocument();
                  Parser parser = htmlDocument.getParser();
                  ParserCallback htmlReader = htmlDocument.getReader(0);
                  long startTime = System.currentTimeMillis();
                  parser.parse(inputStreamReader,htmlReader, true);
                  long endTime = System.currentTimeMillis();
                  System.out.println("Finished without error after "
                          + ((endTime - startTime) / 1000) + " seconds");
              } catch (Exception e) {
                  throw new Error("Parser has signalled an error",e);
              }
          }

          public static void main (String[] args) {
              HTMLParserBug bug = new HTMLParserBug();
          }
      }
      ---------- END SOURCE ----------

      CUSTOMER WORKAROUND :
      Write your own parser
      (Review ID: 163473)
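A lighter workaround than writing a parser may be to check the Content-Type that the server reports before handing the stream to the parser. The sketch below assumes the server labels the .doc file with something other than text/html, which the report does not confirm; `ContentTypeGuard`, `isHtml` and `requireHtml` are hypothetical names.

```java
import java.io.IOException;
import java.util.Locale;

// Hypothetical helper; not part of the JDK or of the original report.
class ContentTypeGuard {

    /** True for HTML media types, ignoring case and parameters such as "; charset=UTF-8". */
    static boolean isHtml(String contentType) {
        if (contentType == null) {
            return false; // the server sent no Content-Type header
        }
        // Keep only the media type in front of any ";" parameters.
        String mediaType = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return mediaType.equals("text/html") || mediaType.equals("application/xhtml+xml");
    }

    /** Throws before any parsing starts if the reported type is not HTML. */
    static void requireHtml(String contentType) throws IOException {
        if (!isHtml(contentType)) {
            throw new IOException("Refusing to parse non-HTML content: " + contentType);
        }
    }
}
```

In the reproducer this would be called as `ContentTypeGuard.requireHtml(connection.getContentType())` before the `InputStreamReader` is created. A server that mislabels the file as text/html would defeat the check, so it is a guard for the common case only.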
      ======================================================================

            Assignee: Peter Zhelezniakov (peterz)
            Reporter: Jeffrey Kim (jkimsunw, Inactive)