JDK-4847584

What to do about hard-coded charset META tags in J2SE docs


    • Bug
    • Resolution: Fixed
    • P4
    • 1.4.2
    • 1.4.2
    • docs
    • rc
    • generic
    • other

      Why do we bother to remove hard-coded charset attributes from the <META>
      tags in the J2SE guide docs, as reported below?

      If the translators change the charset appropriately,
      is leaving the charset explicit really a problem?

      Here are the results of a charset check:

        Date: Sun, 6 Apr 2003 10:04
        To: ###@###.###
        Subject: charset META tags in J2SE 1.4.2 Docs

        Results of search for 1.4.2 .html files that contain <META...charset=...
        tags in:
        /usr/web/work/j2se/1.4.2/docs

        RATIONALE
        ---------
        META...charset=iso-8859-1 (American ANSI) docs cannot be viewed
        in Japanese browsers. META...charset=JA.. docs cannot be viewed in
        English browsers. Such docs specify a character set that only works
        in a single locale. And since the tags are buried in the HTML
        header, they are hard for translators to see.
       
        The files listed below have that tag.
        ---------------------------------------
        guide/2d/spec/j2d-awt.html
        guide/2d/spec/j2d-bookTOC.html
        guide/2d/spec/j2d-color.html
        guide/jdbc/getstart/table8.7.html
        guide/jni/spec/acknowledge.html
        guide/jws/relnotes.html
        guide/net/relnotes.html
        guide/net/ja/relnotes.html
        guide/security/jgss/jgss-features.html
        guide/serialization/spec/version.html
        guide/versioning/spec/versioningTOC.html
        install-notes/disk-space.html
        install-notes/SCCS-ORIG/s.disk-space.html
        relnotes/license.html
        ------------------------------------------
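
      A search like the one reported above could be sketched as follows. This is only an illustration, not the script that produced the list: the function name `find_charset_tags`, the regular expression, and the 4096-byte header window are all assumptions.

```python
import re
from pathlib import Path

# Assumed pattern: a <META ...> tag that carries a charset=... value,
# e.g. <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
META_CHARSET = re.compile(
    rb"<meta\b[^>]*charset\s*=\s*[\"']?([A-Za-z0-9._:-]+)",
    re.IGNORECASE,
)


def find_charset_tags(root):
    """Return {relative path: declared charset} for .html files under root."""
    root = Path(root)
    found = {}
    for path in sorted(root.rglob("*.html")):
        # The tag sits in the HTML header, so only the start of the file is read.
        head = path.read_bytes()[:4096]
        match = META_CHARSET.search(head)
        if match:
            name = match.group(1).decode("ascii").lower()
            found[path.relative_to(root).as_posix()] = name
    return found
```

      Calling `find_charset_tags("/usr/web/work/j2se/1.4.2/docs")` would then return a mapping from each matching file to the charset it declares.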

      ###@###.### wrote:
      Once again:

      A correct charset tag is best: it tells the browser how to interpret the page correctly and requires no user intervention.
      No charset tag is OK: this lets the user guess at and select the character encoding.
      A wrong charset tag is bad, because it causes the browser to display garbage and prevents the user from correcting the situation.
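
      The three cases above can be sketched as a small check, assuming the person running it already knows which encoding the file was actually saved in. The function name `classify_charset_tag` and the regular expression are hypothetical, and Python codec names are used as a stand-in for charset names, which is only an approximation.

```python
import codecs
import re

# Assumed pattern: a <META ...> tag that carries a charset=... value.
META_CHARSET = re.compile(
    rb"<meta\b[^>]*charset\s*=\s*[\"']?([A-Za-z0-9._:-]+)",
    re.IGNORECASE,
)


def classify_charset_tag(page, actual_encoding):
    """Return 'correct', 'missing', or 'wrong' for the page's charset tag.

    page is the raw bytes of the HTML file; actual_encoding is the encoding
    the file is known to be saved in (supplied by the caller, not detected).
    """
    actual = codecs.lookup(actual_encoding).name
    match = META_CHARSET.search(page)
    if match is None:
        return "missing"  # the user can still pick an encoding in the browser
    declared = match.group(1).decode("ascii")
    try:
        # Normalize spelling differences such as ISO-8859-1 vs iso-8859-1.
        declared_name = codecs.lookup(declared).name
    except LookupError:
        return "wrong"  # the declared name is not a known encoding at all
    return "correct" if declared_name == actual else "wrong"
```

      A translator who converts a page to euc-jp but leaves `charset=iso-8859-1` in place would get `"wrong"` from this check, which is the case the comment warns about.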

      The old rule, to not have a charset tag, was based on the assumption that the translators would not be able to adjust the tag when they translate the text. Last year we found out that they are able to do this. So, I think we should change the rule and require a correct charset tag.

      A few other corrections:

      - 8859 is not a charset, but the number of an ISO standard that defines a series of character encodings. Correct charset names are iso-8859-1, iso-8859-2, and so on through iso-8859-10, as well as iso-8859-13 and iso-8859-15.

      - iso-8859-1 is not American ANSI. ANSI is the American National Standards Institute, which is a member of ISO and which, among other things, defined ASCII, the American Standard Code for Information Interchange. ISO is the International Organization for Standardization, which, among other things, defined the ISO 8859 series of character encodings, all of which are extensions of ASCII. iso-8859-1 is the charset name of the first character encoding in the ISO 8859 series.

      - There is no charset JA. There are a number of Japanese character encodings, with charset names such as euc-jp and shift_jis.

      - Japanese browsers can display pages encoded in iso-8859-1, and most English browsers nowadays can display pages encoded in Japanese encodings. What they cannot do is correctly display pages that are encoded in a character encoding other than the one indicated by their charset tag.
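
      The last point can be illustrated with a short sketch. It assumes a browser simply decodes the page bytes with whatever charset the tag declares; the helper name `render_as` and the sample text are made up for the example.

```python
def render_as(page_bytes, declared_charset):
    """Decode page bytes the way a browser trusting the charset tag would."""
    return page_bytes.decode(declared_charset, errors="replace")


# A Japanese page saved in euc-jp (sample text is illustrative only).
text = "日本語のページ"
page_bytes = text.encode("euc-jp")

# Tag matches the real encoding: the text comes back intact.
good = render_as(page_bytes, "euc-jp")

# Tag says iso-8859-1 but the bytes are euc-jp: every byte still maps to
# some character, so no error is raised, but the result is garbage.
bad = render_as(page_bytes, "iso-8859-1")

# Pure ASCII content is unaffected, since both encodings extend ASCII.
ascii_bytes = "J2SE".encode("euc-jp")
same = render_as(ascii_bytes, "iso-8859-1")
```

      Here `good` equals the original text, `bad` does not, and `same` is still `"J2SE"`, which is why a wrong tag often goes unnoticed until a page is translated.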

            dkramersunw Douglas Kramer (Inactive)