Why do we bother to remove hard-coded charset attributes from the <META>
tags in the J2SE guide docs, as reported below?
If the translators change the charset appropriately,
is leaving the charset explicit really a problem?
Here are the results of a charset check:
Date: Sun, 6 Apr 2003 10:04
To: ###@###.###
Subject: charset META tags in J2SE 1.4.2 Docs
Results of search for 1.4.2 .html files that contain <META...charset=...
tags in:
/usr/web/work/j2se/1.4.2/docs
RATIONALE
---------
META...charset=iso-8859-1 (American ANSI) docs cannot be viewed
in Japanese browsers. META...charset=JA.. docs cannot be viewed in
English browsers. Such docs specify a character set that only works
in a single locale. And since the tags are buried in the HTML
header, they are hard for translators to see.
The files listed below have that tag.
---------------------------------------
guide/2d/spec/j2d-awt.html
guide/2d/spec/j2d-bookTOC.html
guide/2d/spec/j2d-color.html
guide/jdbc/getstart/table8.7.html
guide/jni/spec/acknowledge.html
guide/jws/relnotes.html
guide/net/relnotes.html
guide/net/ja/relnotes.html
guide/security/jgss/jgss-features.html
guide/serialization/spec/version.html
guide/versioning/spec/versioningTOC.html
install-notes/disk-space.html
install-notes/SCCS-ORIG/s.disk-space.html
relnotes/license.html
------------------------------------------
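A search like the one that produced the list above could be sketched in Java roughly as follows. This is not the tool that was actually used: the class name MetaCharsetScan, its methods, and the regular expression are assumptions made for illustration. The pattern only looks for charset= inside a META tag and does not parse HTML properly. It assumes Java 8 or later.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of the J2SE docs build.
class MetaCharsetScan {
    // Matches <META ... charset=NAME, case-insensitively; NAME is captured.
    static final Pattern META_CHARSET = Pattern.compile(
        "<meta\\b[^>]*charset\\s*=\\s*[\"']?([A-Za-z0-9._:-]+)",
        Pattern.CASE_INSENSITIVE);

    // Returns the declared charset name, or null if the text has no such tag.
    static String declaredCharset(String html) {
        Matcher m = META_CHARSET.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    // True if the file can be read and contains a META charset declaration.
    static boolean hasCharsetTag(Path file) {
        try {
            // ISO-8859-1 maps every byte to a character, so this read cannot
            // fail on encoding grounds; only the ASCII tag text matters here.
            String text = new String(Files.readAllBytes(file),
                                     StandardCharsets.ISO_8859_1);
            return declaredCharset(text) != null;
        } catch (IOException e) {
            return false; // unreadable files are skipped in this sketch
        }
    }

    // Lists .html files under root that carry a META charset declaration.
    static List<Path> filesWithCharsetTag(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.toString().endsWith(".html"))
                .filter(MetaCharsetScan::hasCharsetTag)
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // The docs root is passed as the first argument; "." is a fallback.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : filesWithCharsetTag(root)) {
            System.out.println(root.relativize(p));
        }
    }
}
```

Run with the docs directory as the argument, it prints one relative path per matching file, in the same shape as the list above.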
###@###.### wrote:
Once again:
A correct charset tag is best: it tells the browser how to interpret the page correctly and requires no user intervention.
No charset tag is OK: this lets the user guess at and select the character encoding.
A wrong charset tag is bad, because it causes the browser to display garbage and prevents the user from correcting the situation.
The old rule, to not have a charset tag, was based on the assumption that the translators would not be able to adjust the tag when they translate the text. Last year we found out that they are able to do this. So, I think we should change the rule and require a correct charset tag.
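If the rule does change, a rough automated check for the three cases above (correct tag, no tag, wrong tag) might look like the following sketch. The class CharsetTagCheck and its Verdict names are hypothetical. The check is also only a heuristic: it can prove a tag wrong when the page bytes are not valid in the declared encoding, but it cannot prove a tag right, because iso-8859-1, for example, accepts any byte sequence.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical checker; names and verdicts are illustrative only.
class CharsetTagCheck {
    enum Verdict { NO_TAG, WRONG_TAG, PLAUSIBLE_TAG }

    // Same simplistic pattern idea as a plain text search for charset=.
    static final Pattern META_CHARSET = Pattern.compile(
        "<meta\\b[^>]*charset\\s*=\\s*[\"']?([A-Za-z0-9._:-]+)",
        Pattern.CASE_INSENSITIVE);

    static Verdict check(byte[] page) {
        // Assumes an ASCII-compatible encoding, so the tag itself is
        // readable when the bytes are viewed as ISO-8859-1.
        String text = new String(page, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(text);
        if (!m.find()) {
            return Verdict.NO_TAG;
        }
        Charset declared;
        try {
            declared = Charset.forName(m.group(1));
        } catch (IllegalArgumentException e) {
            // Illegal or unsupported name on this Java runtime.
            return Verdict.WRONG_TAG;
        }
        try {
            // Strict decoding: report bad bytes instead of replacing them.
            declared.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(page));
            return Verdict.PLAUSIBLE_TAG;
        } catch (CharacterCodingException e) {
            return Verdict.WRONG_TAG;
        }
    }

    public static void main(String[] args) {
        byte[] latin1Page = "<meta charset=utf-8>caf\u00e9 au lait"
            .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(check(latin1Page)); // tag says utf-8, bytes are not
    }
}
```

A name that the Java runtime does not recognize is reported as WRONG_TAG here, which is stricter than a browser would be; that choice is part of the sketch, not a statement about browser behavior.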
A few other corrections:
- 8859 is not a charset, but the number of an ISO standard that defines a series of character encodings. Correct charset names are iso-8859-1, iso-8859-2, and so on through iso-8859-10, then iso-8859-13 and iso-8859-15.
- iso-8859-1 is not American ANSI. ANSI is the American National Standards Institute, which is a member of ISO and which, among other things, defined ASCII, the American Standard Code for Information Interchange. ISO is the International Organization for Standardization, which, among other things, defined the ISO 8859 series of character encodings, all of which are extensions of ASCII. iso-8859-1 is the charset name of the first character encoding in the ISO 8859 series.
- There is no charset JA. There are a number of Japanese character encodings, with charset names such as euc-jp and shift_jis.
- Japanese browsers can display pages encoded in iso-8859-1, and most English browsers nowadays can display pages encoded in Japanese encodings. What they cannot do is display pages that are encoded in a different character encoding than indicated by their charset tag.
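The last point can be illustrated with a small Java sketch. The class CharsetMismatchDemo and its render method are hypothetical, and the example assumes a Java runtime that includes the euc-jp charset (a full JDK normally does). It encodes text in one encoding and decodes it under another, which is roughly what a browser does when the charset tag does not match the file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not taken from the J2SE docs tooling.
class CharsetMismatchDemo {
    // Encodes text in `actual`, then decodes those bytes as `declared`,
    // imitating a page whose tag names `declared` but whose bytes are `actual`.
    static String render(String text, Charset actual, Charset declared) {
        return new String(text.getBytes(actual), declared);
    }

    public static void main(String[] args) {
        String japanese = "\u65e5\u672c\u8a9e"; // the word "Nihongo" in kanji
        Charset eucJp = Charset.forName("euc-jp"); // assumed to be available

        // Tag matches the bytes: the text survives intact.
        System.out.println(japanese.equals(render(japanese, eucJp, eucJp)));

        // Tag says iso-8859-1 but the bytes are euc-jp: the text is garbled.
        System.out.println(
            japanese.equals(render(japanese, eucJp, StandardCharsets.ISO_8859_1)));

        // Pure ASCII text reads the same under both, since both extend ASCII.
        System.out.println(
            "plain ASCII".equals(render("plain ASCII", eucJp, StandardCharsets.ISO_8859_1)));
    }
}
```

The ASCII case is why a wrong tag can go unnoticed in English pages and only shows up once translated text is added.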