HTMLDocument does not load when Parser Callbacks defined


    • Type: Bug
    • Resolution: Unresolved
    • Priority: P4
    • Affects Version/s: 5.0
    • Component/s: client-libs

      FULL PRODUCT VERSION :
      Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_09-b03)
      Java HotSpot(TM) Client VM (build 1.5.0_09-b03, mixed mode, sharing)

      ADDITIONAL OS VERSION INFORMATION :
      Microsoft Windows XP [Version 5.1.2600]

      EXTRA RELEVANT SYSTEM CONFIGURATION :
      CPU AMD Athlon 64 3500
      ATI X700 Pro Graphics card
      Belkin PCI Network LAN card (Realtek RTL 8139 family)

      A DESCRIPTION OF THE PROBLEM :
      I am developing code that parses a web page looking for certain tags. I had difficulty getting the web page to load into an HTMLDocument until I found the reference to Bug ID: 4783472.

      I used the customer workaround from that report, which creates a custom HTMLEditorKit, and the web page then loaded.

      With the page loading successfully, I next wanted to drive some ParserCallback methods as the document was parsed, so I replaced the default HTMLDocument with a customized one whose getReader() returns my own ParserCallback. Result: the ParserCallback methods are now called, but the web page no longer loads into the HTMLDocument. It is an either/or situation: I get the loaded document or the callbacks, never both. The documentation does not describe or justify this restriction, which is why I believe this is a bug, just like Bug ID: 4783472.


      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      1) Run TestCase 1: Code supplied below.

      2) See TestCase 1: info under "Expected Result:" box below. Fine so far.

      3) Run TestCase 2: Code supplied below.

      4) See TestCase 2: info under "Actual Result:" box below. As the last output shows, the ParserCallback methods were entered, but the web page was not loaded into the HTMLDocument. Only the default empty document is written out.

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      TestCase 1: Expected output is as follows (without the <pre> </pre> tags). Only the first part of the web page listing is shown, for brevity:

      <pre>

      My parse called
      Tag text length: 0


      <html>
        <head>
          <title>Java Technology </title>
          <meta name="keywords" content="Java, platform">
          <meta name="collection" content="reference">
          <meta name="description" content="Java technology is a portfolio of products that are based on the power of networks and the idea that the same software should run on many different kinds of systems and devices.">
          <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
          <meta name="date" content="2006-12-21">
          <link title="rss" href="http://developers.sun.com/rss/java.xml" rel="alternate" type="application/rss+xml">
          
        </head>
        <body>
          &gt;<!--stopindex-->
      <link href="/css/default_developer.css" rel="stylesheet">&gt;<!-- END METADATA -->
       <script language="JavaScript" type="text/javascript" src="/js/popUp.js"></script><script language="javascript1.2" type="text/javascript" src="/js/sniff.js"></script><script language="javascript1.2" type="text/javascript" src="/js/menucontent.js"></script><script language="javascript1.2" type="text/javascript" src="/js/menucode.js"></script><script language="javascript1.2" type="text/javascript" src="/js/homepage.js"></script><script language="javascript1.2" type="text/javascript" src="/js/developer.js"></script><!--stopindex-->
      <script onload="prepmenus();prephome();done = true" bgcolor="#ffffff" rightmargin="0" topmargin="0" marginwidth="0" marginheight="0" leftmargin="0" class="a0v0" language="JavaScript"><!--
      var s_pageName="home page"
      //-->
      </script><script language="JavaScript"><!--

      ....

      </pre>


      ACTUAL -
      <strong>TestCase 2:</strong> Actual output, showing the ParserCallback methods were entered. Only a blank html document was returned, not the parsed web page;

      <pre>

      My parse called
      handleStartTag: HTML.Attribute found: href="/global/mh/java/"
      <html>
        <head>

        </head>
        <body>
          <p style="margin-top: 0">
            
          </p>
        </body>
      </html>

      </pre>


      REPRODUCIBILITY :
      This bug can be reproduced always.

      ---------- BEGIN SOURCE ----------
      <strong>TestCase 1:</strong> Code that loads the web page successfully.
      <pre>

      package app.ecom.err;

      import javax.swing.*;
      import javax.swing.text.*;
      import javax.swing.text.html.*;

      import javax.swing.text.html.HTMLEditorKit.*;
      import javax.swing.text.html.parser.ParserDelegator;

      import java.io.*;
      import java.net.*;
      import java.util.*;

      class ErrorKit {
          String searchAtt = "href";
          String searchLink = "/global/mh/java/";

          HTMLDocument doc = null;

          int startPos = 0;
          int endPos = 0;
          boolean checkTag = false;

          public static void main(String args[]) {
              ErrorKit newStuff = new ErrorKit();
              newStuff.runErrorStuff();
          }

          /**
           * runErrorStuff() : Do everything in this method for simplicity.
           */
          public void runErrorStuff() {

              try {
                  URL url = new URL("http://java.sun.com/");
                  URLConnection con = url.openConnection();

                  Reader rd = new InputStreamReader(con.getInputStream());
                  Writer wr = new BufferedWriter(new OutputStreamWriter(System.out));

                  // Default HTMLDocument: its own reader builds the document.
                  doc = new HTMLDocument();

                  /* EditorKit kit = new HTMLEditorKit(); */
                  // Workaround from Bug ID 4783472: a custom kit that supplies
                  // a publicly usable parser.
                  HTMLEditorKit kit = new HTMLEditorKit() {
                      public HTMLEditorKit.Parser getParser() {
                          return new ErrorKit.PublicParser();
                      }
                  };

                  doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);

                  kit.read(rd, doc, 0);
                  rd.close();

                  /* Try writing out the document */
                  HTMLWriter writer = new HTMLWriter(wr, doc);
                  writer.write();
                  wr.flush();
                  wr.close();

              } catch (BadLocationException e) {
                  e.printStackTrace();
              } catch (IOException e) {
                  e.printStackTrace();
              }
          }

          /**
           * Inner (non-static) class that provides a publicly usable
           * HTMLEditorKit.Parser by extending ParserDelegator.
           */
          class PublicParser extends ParserDelegator {

              public void parse(
                      Reader r,
                      HTMLEditorKit.ParserCallback cb,
                      boolean ignoreCharSet)
                      throws IOException {
                  System.out.println("My parse called");
                  super.parse(r, cb, ignoreCharSet);
              }
          }
      }

      </pre>


      <strong>TestCase 2:</strong> Code with the addition of the ParserCallback methods. We do this by redefining the HTMLDocument:

      <pre>

      package app.ecom.err;

      import javax.swing.*;
      import javax.swing.text.*;
      import javax.swing.text.html.*;

      import javax.swing.text.html.HTMLEditorKit.*;
      import javax.swing.text.html.parser.ParserDelegator;

      import java.io.*;
      import java.net.*;
      import java.util.*;

      class ErrorKit {
          String searchAtt = "href";
          String searchLink = "/global/mh/java/";

          HTMLDocument doc = null;

          int startPos = 0;
          int endPos = 0;
          boolean checkTag = false;

          public static void main(String args[]) {
              ErrorKit newStuff = new ErrorKit();
              newStuff.runErrorStuff();
          }

          /**
           * runErrorStuff() : Do everything in this method for simplicity.
           */
          public void runErrorStuff() {

              try {
                  URL url = new URL("http://java.sun.com/");
                  URLConnection con = url.openConnection();

                  Reader rd = new InputStreamReader(con.getInputStream());
                  Writer wr = new BufferedWriter(new OutputStreamWriter(System.out));

                  // Customized HTMLDocument: getReader() returns our own
                  // ParserCallback in place of the default reader.
                  doc = new HTMLDocument() {
                      public HTMLEditorKit.ParserCallback getReader(int pos) {
                          return new HTMLEditorKit.ParserCallback() {

                              public void handleStartTag(
                                      HTML.Tag tag,
                                      MutableAttributeSet att,
                                      int pos) {

                                  // Look for <a href="/global/mh/java/">.
                                  if (tag == HTML.Tag.A) {
                                      if (att.containsAttribute(
                                              HTML.getAttributeKey(searchAtt),
                                              searchLink)) {
                                          startPos = pos;
                                          checkTag = true;

                                          System.out.println(
                                              "handleStartTag: HTML.Attribute found: "
                                                  + HTML.getAttributeKey("href")
                                                  + "=\""
                                                  + att.getAttribute(
                                                      HTML.getAttributeKey("href"))
                                                  + "\"");
                                      }
                                  }
                              }

                              public void handleEndTag(HTML.Tag tag, int pos) {
                                  if ((tag == HTML.Tag.A) && (checkTag)) {
                                      endPos = pos;
                                      checkTag = false;
                                  }

                                  try {
                                      if (tag == HTML.Tag.HTML)
                                          this.flush();
                                  } catch (BadLocationException bl_e) {
                                      bl_e.printStackTrace();
                                  }
                              }

                          };
                      }
                  };

                  /* EditorKit kit = new HTMLEditorKit(); */
                  // Workaround from Bug ID 4783472: a custom kit that supplies
                  // a publicly usable parser.
                  HTMLEditorKit kit = new HTMLEditorKit() {
                      public HTMLEditorKit.Parser getParser() {
                          return new ErrorKit.PublicParser();
                      }
                  };

                  doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);

                  kit.read(rd, doc, 0);
                  rd.close();

                  /* Try writing out the document */
                  HTMLWriter writer = new HTMLWriter(wr, doc);
                  writer.write();
                  wr.flush();
                  wr.close();

              } catch (BadLocationException e) {
                  e.printStackTrace();
              } catch (IOException e) {
                  e.printStackTrace();
              }
          }

          /**
           * Inner (non-static) class that provides a publicly usable
           * HTMLEditorKit.Parser by extending ParserDelegator.
           */
          class PublicParser extends ParserDelegator {

              public void parse(
                      Reader r,
                      HTMLEditorKit.ParserCallback cb,
                      boolean ignoreCharSet)
                      throws IOException {
                  System.out.println("My parse called");
                  super.parse(r, cb, ignoreCharSet);
              }
          }
      }

      </pre>
      ---------- END SOURCE ----------
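
      A possible reading of TestCase 2, not confirmed anywhere in this report: HTMLEditorKit.read() passes the parser whatever callback HTMLDocument.getReader() returns, and the stock callback (HTMLDocument.HTMLReader) is what inserts content into the document. A bare ParserCallback receives the events but builds nothing. The sketch below is one way to get both behaviors, by extending HTMLReader and delegating to super. The class name DelegatingReaderSketch, the method loadText, and the inline HTML string are hypothetical, and a StringReader stands in for the URL connection so the sketch runs offline.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.BadLocationException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical sketch, not part of the original test cases.
class DelegatingReaderSketch {

    // Parses html into an HTMLDocument, records every <a href> value seen,
    // and returns the plain text of the resulting document.
    static String loadText(String html, final List<String> hrefs) {
        HTMLDocument doc = new HTMLDocument() {
            @Override
            public HTMLEditorKit.ParserCallback getReader(int pos) {
                // Extend the stock HTMLReader rather than a bare ParserCallback.
                return new HTMLReader(pos) {
                    @Override
                    public void handleStartTag(HTML.Tag tag,
                                               MutableAttributeSet att,
                                               int p) {
                        // Custom inspection, as in TestCase 2.
                        if (tag == HTML.Tag.A) {
                            Object href = att.getAttribute(HTML.Attribute.HREF);
                            if (href != null) {
                                hrefs.add(href.toString());
                            }
                        }
                        // Delegate so the default reader still builds the document.
                        super.handleStartTag(tag, att, p);
                    }
                };
            }
        };
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);

        try (Reader in = new StringReader(html)) {
            new HTMLEditorKit().read(in, doc, 0);
            return doc.getText(0, doc.getLength());
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<String> hrefs = new ArrayList<>();
        String text = loadText(
            "<html><body><p><a href=\"/global/mh/java/\">Java</a> page</p></body></html>",
            hrefs);
        System.out.println("hrefs seen: " + hrefs);
        System.out.println("document text: " + text.trim());
    }
}
```

      Only handleStartTag is overridden here; flush() and the other handlers are inherited from HTMLReader, so the editor kit's normal call to flush() after parsing still commits the parsed content.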

            Assignee:
            Peter Zhelezniakov
            Reporter:
            Roger Yeung (Inactive)