-
Enhancement
-
Resolution: Won't Fix
-
P4
-
None
-
6u21
-
x86
-
linux
A DESCRIPTION OF THE REQUEST :
Re-open bug #4950409, which requests that GB2312 be made an alias of GBK instead of EUC_CN.
This RFE was (in our opinion) erroneously marked as a duplicate of #4914869 when in fact it describes a different issue.
JUSTIFICATION :
GBK is an extension of the GB2312 character encoding and is fully backwards compatible. It allows encoding of additional Chinese characters in comparison to GB2312 and its derivatives.
As mentioned in the original RFE, many programs (mail clients in particular) use gb2312 as the encoding name when they mean GBK.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
That "gb2312" (at the very least) be added as an alias to the GBK character set.
This would result in Charset.forName("gb2312") returning an instance of the GBK Charset.
This would allow Java programs to correctly decode GBK-encoded data that lists "gb2312" as its encoding (which, as the original submitter remarked, appears to be common practice).
ACTUAL -
Charset.forName("gb2312") returns the EUC_CN charset. This leads to "unmappable" error characters when decoding Chinese text that is marked as gb2312 but in fact contains GBK-encoded characters.
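The behaviour described above can be checked with a small program. This is a minimal sketch, not part of the original report: the class name Gb2312AliasDemo, the helper decodesStrictly, and the sample character U+7DB2 (a traditional character that GBK covers and GB2312 does not) are illustrative choices, and it assumes a JDK that ships the GBK charset.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

class Gb2312AliasDemo {

    // Returns true if the bytes decode cleanly with the given charset,
    // false if the decoder reports malformed or unmappable input.
    static boolean decodesStrictly(Charset charset, byte[] data) {
        try {
            charset.newDecoder()
                   .onMalformedInput(CodingErrorAction.REPORT)
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Charset labelled = Charset.forName("gb2312"); // what the JRE resolves the label to
        Charset gbk = Charset.forName("GBK");         // assumed to be available

        // Hypothetical sample: a character outside GB2312, encoded as GBK.
        byte[] data = "\u7db2".getBytes(gbk);

        System.out.println("\"gb2312\" resolves to: " + labelled.name());
        System.out.println("decodes as GBK:      " + decodesStrictly(gbk, data));
        System.out.println("decodes as \"gb2312\": " + decodesStrictly(labelled, data));
    }
}
```

If "gb2312" were an alias of GBK, both strict decodes would be expected to succeed.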
CUSTOMER SUBMITTED WORKAROUND :
Only one, which is to manually modify the charsets.jar located in the JRE's /lib directory, replacing the existing EUC_CN.class with a modified one that contains the following code:
public class EUC_CN extends GBK {
}
This has the effect of replacing the EUC_CN charset with the GBK charset, which, as the latter is backwards compatible with the former, should not be a problem.
This is a very ugly hack, but seems to be the only workaround that works, as these mappings are very much hardcoded into the JRE.
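Where an application performs the charset lookup itself, rather than relying on a library that calls Charset.forName internally, the same substitution could be made at the call site without touching charsets.jar. The following is a sketch under that assumption only: LenientCharsets and forLabel are hypothetical names, not JDK APIs, and the approach does nothing for code paths the application does not control.

```java
import java.nio.charset.Charset;
import java.util.Locale;

class LenientCharsets {

    // Resolves an encoding label, treating "gb2312" as GBK when GBK is available.
    // All other labels are passed to Charset.forName unchanged.
    static Charset forLabel(String label) {
        String trimmed = label.trim();
        if (trimmed.toLowerCase(Locale.ROOT).equals("gb2312") && Charset.isSupported("GBK")) {
            return Charset.forName("GBK");
        }
        return Charset.forName(trimmed);
    }

    public static void main(String[] args) {
        // Hypothetical sample: GBK bytes that arrive labelled as "gb2312".
        byte[] data = "\u7db2".getBytes(Charset.forName("GBK"));
        String decoded = new String(data, forLabel("gb2312"));
        System.out.println(decoded.equals("\u7db2"));
    }
}
```

Because GBK is a superset of GB2312, text that is genuinely GB2312-encoded should decode the same way through this helper.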
- relates to
-
JDK-4950409 GB2312 should an alias of GBK, instead of EUC_CN
-
- Closed
-