FULL PRODUCT VERSION :
java version "1.8.0_102"
Java(TM) SE Runtime Environment (build 1.8.0_102-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.102-b14, mixed mode)
ADDITIONAL OS VERSION INFORMATION :
ubuntu 14.04
mac
Linux kittingj-covdesktop-lnx 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
Does not occur on Windows
EXTRA RELEVANT SYSTEM CONFIGURATION :
ubuntu locale/LANG = en_US.UTF8
JVM is launched with -Dfile.encoding=UTF-16
A DESCRIPTION OF THE PROBLEM :
On Linux and Mac, but not on Windows, when the JVM is launched with -Dfile.encoding=UTF-16
(or UTF-16BE or UTF-16LE),
System.getenv() returns garbage strings.
Their bytes seem to be a combination of a BOM (byte order mark) followed by UTF-8.
They aren't usable Java Unicode strings.
There seems to be some kind of mistranslation between reading these values from the OS and handing them to Java.
The expectation here is that I should be able to print these.
I don't see this problem on Windows, but do see it on Linux/Mac.
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Try printing strings that come from System.getenv() when...
OS = Linux/Mac, english UTF-8
JVM is launched with -Dfile.encoding=UTF-16
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.Map;

public class Main {
    public static void main(String[] args) {
        Map<String, String> env = System.getenv();
        String defaultCharsetName = Charset.defaultCharset().toString();
        System.out.printf("The default charset = %s%n", defaultCharsetName);
        for (Map.Entry<String, String> entry : env.entrySet()) {
            System.out.printf("%s = %s%n", entry.getKey(), entry.getValue());
            // getBytes() with no argument encodes using the default charset
            byte[] keyBytes = entry.getKey().getBytes();
            byte[] valueBytes = entry.getValue().getBytes();
            System.out.printf("%s%n", Arrays.toString(keyBytes));
            System.out.printf("%s%n", Arrays.toString(valueBytes));
        }
        System.out.println("Done");
    }
}
On Linux/Mac
copy the above code to a file -- Main.java
javac Main.java
java -Dfile.encoding=UTF-16 Main
Compare the environment variables with what you see in a terminal window
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
I would expect that printing the values returned by System.getenv() would look equivalent to
looking at the env variables in a terminal window. (e.g. type set in a terminal window)
ACTUAL -
Observe that the env variables all print as garbage.
Also, the bytes returned by getBytes() aren't what UTF-16 would produce for the real values.
For English characters, you would expect to see '00' in every other byte.
Instead, it looks like UTF-8 with a BOM in front.
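One reading consistent with the bytes described here is that the UTF-8 bytes from the OS are decoded as UTF-16 when the strings are created, and getBytes() then re-encodes them as UTF-16. The sketch below only simulates that suspected round trip on ordinary strings; it does not call System.getenv(), and the class and method names (MisdecodeDemo, roundTrip) are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class MisdecodeDemo {
    // Simulates the suspected mistranslation (an assumption, not confirmed JDK
    // behavior): UTF-8 bytes are decoded as UTF-16, then re-encoded as UTF-16.
    static byte[] roundTrip(String real) {
        byte[] osBytes = real.getBytes(StandardCharsets.UTF_8);
        // Each pair of bytes becomes one unrelated UTF-16 code unit.
        String mangled = new String(osBytes, StandardCharsets.UTF_16);
        // Java's UTF-16 encoder writes a big-endian BOM (FE FF) first.
        return mangled.getBytes(StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        // Even number of bytes: BOM followed by the original UTF-8 bytes.
        System.out.println(Arrays.toString(roundTrip("HOME")));
        // Odd number of bytes: the last byte has no partner to pair with.
        System.out.println(Arrays.toString(roundTrip("abc")));
    }
}
```

For "HOME" this prints [-2, -1, 72, 79, 77, 69], which is the BOM followed by the plain UTF-8 bytes. For "abc" it prints [-2, -1, 97, 98, -1, -3]: the unpaired final byte is replaced by U+FFFD, the Unicode replacement character, whose UTF-16 bytes FF FD show up as -1, -3 in a signed byte array.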
ERROR MESSAGES/STACK TRACES THAT OCCUR :
The code compiles and runs, but displays incorrectly on Linux/Mac.
It does display correctly on Windows (Win 7 Pro - 64Bit/English).
REPRODUCIBILITY :
This bug can be reproduced always.
---------- BEGIN SOURCE ----------
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.Map;

public class Main {
    public static void main(String[] args) {
        Map<String, String> env = System.getenv();
        String defaultCharsetName = Charset.defaultCharset().toString();
        System.out.printf("The default charset = %s%n", defaultCharsetName);
        for (Map.Entry<String, String> entry : env.entrySet()) {
            System.out.printf("%s = %s%n", entry.getKey(), entry.getValue());
            // getBytes() with no argument encodes using the default charset
            byte[] keyBytes = entry.getKey().getBytes();
            byte[] valueBytes = entry.getValue().getBytes();
            System.out.printf("%s%n", Arrays.toString(keyBytes));
            System.out.printf("%s%n", Arrays.toString(valueBytes));
        }
        System.out.println("Done");
    }
}
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
There isn't a good way to work around this.
The best I've come up with is:
call getBytes() on the string from System.getenv(),
remove the BOM (create a new byte array that doesn't contain it),
create a new string from these bytes, specifying a charset, e.g.
str = new String(badBytes, "UTF-8");
This works for the KEYS of getenv().
It doesn't seem to work for the VALUES of getenv().
I'm not really clear how the VALUES are encoded. The end of the string seems to be missing its last character and/or contains the byte sequence "-1", "-3" (I don't know what that sequence represents).
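The workaround steps can be sketched as follows, under the assumption that the strings were produced by decoding UTF-8 bytes as UTF-16. EnvRecover and recover are hypothetical names, not a JDK API. Encoding with UTF-16BE or UTF-16LE avoids the BOM, so no manual stripping is needed; pass whichever byte order the mis-decoding used (big-endian for plain UTF-16, on the assumption that there is no BOM in the input).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EnvRecover {
    // Hypothetical helper: undo a "UTF-8 read as UTF-16" mistranslation.
    // byteOrder should be UTF_16BE or UTF_16LE; neither writes a BOM.
    static String recover(String mangled, Charset byteOrder) {
        // A trailing U+FFFD marks an odd-length value whose last byte was
        // dropped during decoding; that byte cannot be recovered.
        if (mangled.endsWith("\uFFFD")) {
            mangled = mangled.substring(0, mangled.length() - 1);
        }
        byte[] original = mangled.getBytes(byteOrder);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Simulate a mangled value instead of reading the real environment.
        String mangled = new String(
                "/usr/bin".getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_16);
        System.out.println(recover(mangled, StandardCharsets.UTF_16BE)); // prints /usr/bin
    }
}
```

This only recovers what survived the first decoding. A value with an odd number of bytes comes back one character short ("abc" becomes "ab"), which would explain why it appears to work for some entries and not others. Non-ASCII values may also lose data if a byte pair happens to form an unpaired surrogate.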
RELATES TO :
JDK-8285517 System.getenv() returns unexpected value if environment variable has non ASCII character (Closed)