Type: Bug
Resolution: Not an Issue
Priority: P4
Fix Version/s: None
Affects Version/s: 8, 11, 12, 13
CPU: x86_64
OS: linux
A DESCRIPTION OF THE PROBLEM :
Charset.defaultCharset() is written so that, if it cannot find file.encoding among the VM's system properties, it initialises defaultCharset to UTF-8. However, if you consider the VM as a whole, this else branch is effectively dead code. When file.encoding is not passed to the VM, the VM infers its value from LC_ALL, LANG and LC_CTYPE, and even when none of these are set, file.encoding is initialised to US-ASCII. The two processes therefore contradict each other: the code in Charset.defaultCharset() signals that the default encoding should be UTF-8, while the initialisation of file.encoding makes it US-ASCII.
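To make the contrast concrete, here is a simplified, hypothetical model of the two processes described above; it is not the JDK source, and the class and method names are illustrative only. `effectiveCtypeLocale` assumes the standard POSIX precedence (LC_ALL, then LC_CTYPE, then LANG, with the "C" locale when none is set) and stops at the locale name: mapping that locale to a codeset (ASCII for "C") is done by the C library and is not modelled here. `resolveDefault` assumes, as the report describes, that Charset.defaultCharset() uses the charset named by file.encoding when it can be resolved and otherwise falls back to UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.Map;

// Hypothetical model of the two code paths contrasted in the report (not JDK source).
public class DefaultCharsetModel {

    // Path 1: which locale governs character encoding. POSIX precedence is
    // LC_ALL, then LC_CTYPE, then LANG; empty values count as unset. With none
    // of them set, the "C" locale applies, whose codeset is ASCII.
    static String effectiveCtypeLocale(Map<String, String> env) {
        for (String name : Arrays.asList("LC_ALL", "LC_CTYPE", "LANG")) {
            String value = env.get(name);
            if (value != null && !value.isEmpty()) {
                return value;
            }
        }
        return "C";
    }

    // Path 2: use the charset named by file.encoding if the JVM can resolve it,
    // otherwise fall back to UTF-8 (the branch the report calls dead code).
    static Charset resolveDefault(String fileEncoding) {
        if (fileEncoding != null) {
            try {
                if (Charset.isSupported(fileEncoding)) {
                    return Charset.forName(fileEncoding);
                }
            } catch (IllegalArgumentException e) {
                // Malformed names (IllegalCharsetNameException) also take the fallback.
            }
        }
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        System.out.println(effectiveCtypeLocale(Collections.<String, String>emptyMap())); // C
        System.out.println(effectiveCtypeLocale(Collections.singletonMap("LANG", "en_US.UTF-8"))); // en_US.UTF-8
        System.out.println(resolveDefault("US-ASCII")); // US-ASCII
        System.out.println(resolveDefault(null)); // UTF-8
    }
}
```

Under this model the UTF-8 branch is reached only when file.encoding is missing or names a charset the JVM cannot resolve, which is the situation the report argues never arises once the VM has initialised file.encoding itself.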
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1. Unset the environment variables LC_ALL, LANG and LC_CTYPE in your shell.
2. Write a Java program that calls Charset.defaultCharset() and prints the result.
3. Run the program without specifying the file.encoding system property.
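A minimal program for step 2 might look like the following sketch; the class name DefaultCharsetRepro is hypothetical. It prints the file.encoding property alongside the name of Charset.defaultCharset() so the two values can be compared directly.

```java
import java.nio.charset.Charset;

// Hypothetical reproduction: show file.encoding next to Charset.defaultCharset().
public class DefaultCharsetRepro {

    static String report() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

Compile it with `javac DefaultCharsetRepro.java`, then run it with the locale variables removed, for example `env -u LC_ALL -u LANG -u LC_CTYPE java DefaultCharsetRepro`. Adding `-Dfile.encoding=UTF-8` after `java` reproduces the workaround described below.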
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
The result should be UTF-8, or the fallback in Charset.defaultCharset() should be changed to US-ASCII so that the two are consistent.
ACTUAL -
The result is US-ASCII.
CUSTOMER SUBMITTED WORKAROUND :
The workaround is to pass -Dfile.encoding=UTF-8 so that the result matches the UTF-8 default in Charset.defaultCharset().