JDK-4369360

HTML Converter file reading error and broken japanese string

    • 02
    • generic
    • generic
    • Verified

        Run HTMLConverter on Solaris in the ja_JP.UTF-8 locale.
        Select a simple HTML source file that contains Japanese kanji strings (encoded in EUC or SJIS).
        Start the conversion. Progress stops almost immediately and the log shown below is output.
        An additional problem:
        Japanese strings are corrupted when the Solaris locale in which HTMLConverter runs does not match the encoding of the HTML file.
        If a "charset" meta tag is written in the HTML head section, conversion succeeds in the "ja" and "ja_JP.PCK" locales.
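
The "charset" meta workaround can be illustrated with a small sketch. The report does not describe how HTMLConverter reads the meta tag, so the class name, method name, and regular expression below are hypothetical; the sketch only shows one way to find a declared charset before decoding, so that the platform default locale is not used.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of HTMLConverter.
class MetaCharsetSniffer {
    // Matches charset=NAME inside a <meta ...> tag, with or without quotes.
    private static final Pattern CHARSET = Pattern.compile(
            "<meta[^>]*charset\\s*=\\s*[\"']?([A-Za-z0-9_.:-]+)",
            Pattern.CASE_INSENSITIVE);

    // Returns the charset name declared in a meta tag, or null if none is found.
    static String sniff(byte[] head) {
        // ISO-8859-1 maps every byte to exactly one char, so this decode
        // never fails and the ASCII markup stays readable.
        String ascii = new String(head, StandardCharsets.ISO_8859_1);
        Matcher m = CHARSET.matcher(ascii);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        byte[] html = ("<html><head><meta http-equiv=\"Content-Type\" "
                + "content=\"text/html; charset=Shift_JIS\"></head></html>")
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(sniff(html));
    }
}
```

A caller would pass the returned name to `Charset.forName` and fall back to the platform default only when `sniff` returns null.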

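        Result matrix of the HTML conversion test:

        HTML charset \ Solaris locale | ja   | ja_JP.PCK | ja_JP.UTF-8
        ------------------------------+------+-----------+------------
        eucJP                         | pass | *1        | *2
        shift_jis                     | *1   | pass      | *2
        utf-8                         | *1   | *1        | pass

        *1...Conversion finishes, but Japanese characters are corrupted.
        *2...Conversion does not finish; a log is output.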

        Sample HTML files are attached.
        (Files containing Japanese strings in several encodings:
         euc/sjis/utf-8, each with and without a charset meta tag in the header.)

        Output log (on Solaris/ja_JP.UTF-8):
        sun.io.MalformedInputException
                at sun.io.ByteToCharUTF8.convert(ByteToCharUTF8.java:152)
                at java.io.InputStreamReader.convertInto(InputStreamReader.java:137)
                at java.io.InputStreamReader.fill(InputStreamReader.java:186)
                at java.io.InputStreamReader.read(InputStreamReader.java:249)
                at java.io.BufferedReader.fill(BufferedReader.java:139)
                at java.io.BufferedReader.read(BufferedReader.java:157)
                at java.io.StreamTokenizer.read(StreamTokenizer.java:472)
                at java.io.StreamTokenizer.nextToken(StreamTokenizer.java:516)
                at sun.plugin.converter.util.StdUtils.countWords(StdUtils.java:109)
                at sun.plugin.converter.engine.PluginConverter.runConversion(PluginConverter.java:314)
                at sun.plugin.converter.engine.PluginConverter.run(PluginConverter.java:250)
                at java.lang.Thread.run(Thread.java:484)
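
The trace above is consistent with EUC-JP or Shift_JIS bytes being decoded as UTF-8 because the reader used the ja_JP.UTF-8 platform default. The following is a minimal sketch of that failure mode, not HTMLConverter's code: it uses the public `java.nio.charset` API rather than the internal `sun.io` converters seen in the trace, the class and method names are hypothetical, and the sample bytes are assumed to be the string "日本語" encoded as EUC-JP.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for illustration; not part of HTMLConverter.
class StrictUtf8Demo {
    // Assumed sample input: "日本語" encoded as EUC-JP.
    static final byte[] EUC_JP_SAMPLE = {
        (byte) 0xC6, (byte) 0xFC, (byte) 0xCB, (byte) 0xDC, (byte) 0xB8, (byte) 0xEC
    };

    // Strict decoding: malformed input is reported as an exception,
    // which corresponds to result *2 (conversion does not finish).
    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    // Lenient decoding: malformed input is replaced with U+FFFD,
    // which corresponds to result *1 (corrupted characters).
    static String decodeLenient(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        try {
            decodeStrict(EUC_JP_SAMPLE);
            System.out.println("decoded without error");
        } catch (CharacterCodingException e) {
            System.out.println("strict decode failed: " + e);
        }
        System.out.println("lenient result contains U+FFFD: "
                + decodeLenient(EUC_JP_SAMPLE).contains("\uFFFD"));
    }
}
```

Decoding the same bytes with the charset that actually matches the file avoids both outcomes.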

        Converter version used:
        -rw-rw-r-- 1 on113181 staff 187237 Sep 8 06:24 htmlconv1-3.jar

        Target environment:
        Solaris 8 and 7, on both Intel and SPARC.


        osamu.numayama@Japan 2000-09-08

        ---------------------------------------------------------------------------

        HTML documents that specify their character encoding in a META tag are
        converted properly, except when the system locale is UTF-8 on Solaris and
        the document is encoded in eucJP or shift_jis.

        System            | Locale | HTML charset
                          |        | eucJP | shift_jis | UTF-8
        ------------------+--------+-------+-----------+-------
                          | euc    | OK    | OK        | OK
        Solaris8-sparc    | pck    | OK    | OK        | OK
                          | utf-8  | NG*   | NG*       | OK
        ------------------+--------+-------+-----------+-------
                          | euc    | OK    | OK        | OK
        Solaris7-IA       | pck    | OK    | OK        | OK
                          | utf-8  | NG*   | NG*       | OK
        ------------------+--------+-------+-----------+-------
        Windows98         | sjis   | OK    | OK        | OK
        ------------------+--------+-------+-----------+-------
        WindowsNT         | sjis   | OK    | OK        | OK
        ------------------+--------+-------+-----------+-------
        Redhat Linux 6.2J | euc    | OK    | OK        | OK
        ------------------+--------+-------+-----------+-------

        (*) The MalformedInputException shown above occurs; the HTML files are
            left unchanged and are not converted.

        kenichi.kurosaki@Japan 2000-11-09

              billyh William Harnois
              duke J. Duke