Type: Bug
Resolution: Fixed
Priority: P3
Fix Version/s: 6
Affects Version/s: 5.0
Component/s: client-libs
Labels:
- text

Subcomponent:
javax.swing
Resolved In Build:
beta
CPU:

x86
OS:

linux_redhat_8.0, windows_nt

Issue	Fix Version	Assignee	Priority	Status	Resolution	Resolved In Build
JDK-2116943	5.0u4	Peter Zhelezniakov	P3	Resolved	Fixed	b03

I wrote a small program (below) to parse an HTML document. I observe different results with JDK 1.4.2_04-b05 and 1.5.0-beta2-b51.
To me, the result given by 1.4.2 seems to be the correct one.

--------------------------------------------------------------------------------------------------
import java.io.*;

import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class HtmlParser {

    // Text captured from inside the first DIV element.
    private static String translation = null;

    public static void main(String[] unused) throws Exception {

        BufferedReader in =
            new BufferedReader(
                new InputStreamReader(
                    new FileInputStream("/root/test_programs/saaj/temp.html")));

        ParserDelegator parser = new ParserDelegator();

        HTMLEditorKit.ParserCallback callback =
            new HTMLEditorKit.ParserCallback() {

                // The translation will be in the first div tag.
                private boolean end_search = false;            // set once the first DIV has closed
                private boolean found_first_textarea = false;  // true while inside the first DIV

                public void handleText(char[] data, int pos) {
                    if (found_first_textarea) {
                        // Each call overwrites the previous value, so only the
                        // last text chunk reported inside the DIV is kept.
                        translation = new String(data);
                    }
                }

                public void handleStartTag(HTML.Tag tag,
                                           MutableAttributeSet attrSet, int pos) {
                    if (tag == HTML.Tag.DIV && !end_search) {
                        found_first_textarea = true;
                    }
                }

                public void handleEndTag(HTML.Tag t, int pos) {
                    if (t == HTML.Tag.DIV && !end_search) {
                        end_search = true;
                        found_first_textarea = false;
                    }
                }
            };

        // Third argument (ignoreCharSet = true): ignore any charset declared
        // in a META tag instead of throwing ChangedCharSetException.
        parser.parse(in, callback, true);
        in.close();

        System.out.println("Result: " + translation);

    }
}

--------------------------------------------------------------------------------------------------

The HTML document used is attached.
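Because `handleText` can be called several times inside one DIV (for example, when inline tags split the text), the program above keeps only the last text chunk it sees, so any change in how the parser splits or restructures text changes its output. As a diagnostic aid for comparing JDK builds, here is a minimal sketch, not the reporter's code, that accumulates every text chunk inside the first DIV and tracks nested DIVs. The class name `FirstDivText` and the inline sample markup are hypothetical; it reads from a `StringReader` instead of the attached file.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class FirstDivText {

    /** Returns all text inside the first DIV (including nested DIVs), chunks joined by spaces. */
    public static String firstDivText(Reader in) throws IOException {
        final StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private int depth = 0;        // DIV nesting depth while inside the first DIV
            private boolean done = false; // true once the first DIV has closed

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.DIV && !done) {
                    depth++;
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.DIV && depth > 0) {
                    depth--;
                    if (depth == 0) {
                        done = true; // first DIV closed: stop capturing
                    }
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (depth > 0) {
                    if (text.length() > 0) {
                        text.append(' ');
                    }
                    text.append(data); // keep every chunk, not just the last one
                }
            }
        };
        new ParserDelegator().parse(in, callback, true);
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample input; the <b> tag splits the DIV text into several chunks.
        String html = "<html><body><div>hello <b>big</b> world</div>"
                    + "<div>other</div></body></html>";
        System.out.println("Result: " + firstDivText(new StringReader(html)));
    }
}
```

Running the same input through two JDK builds and diffing the results shows whether the text itself or only its chunking changed.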

Here's a portion of an email conversation relevant to this issue:

--------------------------------------------------------------------------------------------------
Subject:
[Fwd: [Fwd: Re: Fwd: Regression in HTML Parsing?]]
From:
Rakesh Menon <###@###.###>
Date:
Wed, 26 May 2004 16:21:37 +0530
To:
Sreejith A K <###@###.###>
CC:
Anita Jindal <###@###.###>, ###@###.###, Rakesh Menon <###@###.###>

Hi,

I confirmed it. It's a regression bug.
Igor (###@###.###) and Scott (###@###.###) are investigating this further.

Thanks,
Rakesh

-------- Original Message --------
Subject: Re: Fwd: Regression in HTML Parsing?
Resent-Date: Wed, 26 May 2004 01:00:34 -0700
Resent-From: ###@###.###
Date: Tue, 25 May 2004 11:24:13 -0700
From: Scott Violet <###@###.###>
To: Anton Nashatyrev <###@###.###>
CC: ###@###.###
References: <20040525172747.GI12475@zaz>
<###@###.###>

Yuck. Any idea what fix caused this regression?
Thanks,

    -Scott

On Tue, May 25, 2004 at 10:18:16PM +0400, Anton Nashatyrev wrote:
Hello Scott,

   here is a simplified HTML case:

<table border=1>
  <tr><td>
aaa
<style> </style> bbb
  </tr></td>
</table>

This HTML is displayed as follows:
-------
| aaa |
-------
| bbb |
-------

However, changing the <style></style> tag to any invalid tag fixes this:
-----------
| aaa bbb |
-----------

It looks like we use an incorrect error-recovery policy in the Parser. The
<STYLE></style> tag shouldn't appear in <BODY> context, so in this case we
should just ignore it and not make any tag adjustments.

I think a bug should be filed on this.

Thank you.
Anton.
--------------------------------------------------------------------------------------------------
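To compare parser behaviour across JDK builds without relying on rendering, one could feed Anton's simplified snippet through `ParserDelegator` and log every callback it produces. The sketch below assumes a hypothetical class `StyleInBodyRepro`; it prints the callback trace for the snippet as written and for a variant in which `<style> </style>` is replaced by an unknown tag `<foo> </foo>` (the tag name is an arbitrary assumption; the thread says any invalid tag behaves the same). The traces depend on the JDK build, so no expected output is given; on an affected build one would presumably look for differences in where the table-cell and row tags open and close around `bbb`.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class StyleInBodyRepro {

    // Simplified test case from the email thread (mis-nested </tr></td> kept as-is).
    static final String WITH_STYLE =
        "<table border=1>\n  <tr><td>\naaa\n<style> </style> bbb\n  </tr></td>\n</table>\n";

    // Same markup with <style> replaced by a hypothetical unknown tag.
    static final String WITH_UNKNOWN =
        WITH_STYLE.replace("<style> </style>", "<foo> </foo>");

    /** Parses html and returns a flat trace of the callbacks the parser emits. */
    static String trace(String html) throws IOException {
        final StringBuilder sb = new StringBuilder();
        HTMLEditorKit.ParserCallback cb = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                sb.append('<').append(t).append('>');   // includes parser-implied tags
            }

            @Override
            public void handleEndTag(HTML.Tag t, int pos) {
                sb.append("</").append(t).append('>');
            }

            @Override
            public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                sb.append('<').append(t).append("/>");  // empty and unknown tags arrive here
            }

            @Override
            public void handleText(char[] data, int pos) {
                sb.append('[').append(data).append(']');
            }
        };
        new ParserDelegator().parse(new StringReader(html), cb, true);
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("with <style>: " + trace(WITH_STYLE));
        System.out.println("with <foo>:   " + trace(WITH_UNKNOWN));
    }
}
```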

###@###.### 2004-05-26

backported by

JDK-2116943 Regression in html parsing in tiger beta 2

Resolved

duplicates

JDK-5053319 REGRESSION: <style> tag in body content breaks HTML structure

Closed

Assignee:: Igor Kushnirskiy (Inactive)

Reporter:: J. Duke

Votes:: 0

Watchers:: 0

Created:: 2004-05-26 08:15

Updated:: 2004-10-13 11:05

Resolved:: 2004-09-13 17:27

Imported:: 16/Sep/12 6:32 AM

Indexed:: 18/Jul/12 2:23 AM
