See http://stackoverflow.com/questions/11927518/java-unicode-utf-8-and-windows-command-prompt
as well as this code:
=====
class Main {
    // Echoes the command-line arguments to stdout, separated by single spaces.
    public static void main(String[] args) throws Exception {
        for (int i = 0; i < args.length; ++i) {
            if (i > 0) {
                System.out.print(' ');
            }
            System.out.print(args[i]);
        }
        System.out.println();
    }
}
=====
Save the following lines as a batch file encoded as UTF-8 without a BOM, then run the batch file from the command prompt. The resulting f.txt does not contain the same characters as those passed on the command line.
=========
chcp 65001
java Main 﨨狝 﨨狝 > f.txt
==========
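To tell whether the characters are lost when the arguments reach the JVM or only later when they are written out, a variant of Main can print the Unicode code points of each argument instead of the characters themselves. This is a minimal diagnostic sketch; the class name ArgDump and its output format are hypothetical and not part of the JDK.

```java
import java.util.StringJoiner;

// Hypothetical diagnostic variant of Main: prints each argument as a list of
// Unicode code points, so the result does not depend on the console font or
// on the encoding used when stdout is redirected to a file.
class ArgDump {

    // Formats every code point in s as U+XXXX, separated by spaces.
    static String codePoints(String s) {
        StringJoiner joiner = new StringJoiner(" ");
        s.codePoints().forEach(cp -> joiner.add(String.format("U+%04X", cp)));
        return joiner.toString();
    }

    public static void main(String[] args) {
        for (String arg : args) {
            System.out.println(codePoints(arg));
        }
    }
}
```

Because the output is pure ASCII, it survives redirection unchanged. An argument that the launcher has already replaced with question marks shows up as U+003F, which separates an argument-passing problem from an output or display problem.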
A good introduction to language issues in the Windows console is this post, along with others on the same blog: http://www.siao2.com/2010/10/07/10072032.aspx
There are multiple areas involved in this problem.
First is how the command arguments are passed to an app. PowerShell appears to pass them differently from cmd.exe. With cmd.exe, after running chcp 65001, I see that the arguments are kept in wchar_t as UCS-2. With PowerShell, with [Console].OutputEncoding set to 437, 1252, or UTF-8, they appeared to be in char as UTF-8.
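The two representations can be compared directly for a given string: a Windows wchar_t string holds UTF-16 code units (identical to UCS-2 for characters in the Basic Multilingual Plane), while a char string in UTF-8 holds a variable number of bytes per character. The sketch below dumps both forms; the class and method names are illustrative only.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: shows the same text as UTF-16 code units (wchar_t form)
// and as UTF-8 bytes (char form), to compare the two argument encodings.
class ArgEncodings {

    // UTF-16 code units, as stored in a Windows wchar_t string.
    static String utf16Units(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%04X", (int) c));
        }
        return sb.toString();
    }

    // UTF-8 bytes, as they appear in a char string holding UTF-8 text.
    static String utf8Bytes(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Default sample: e-acute (U+00E9) followed by U+4E2D.
        String sample = args.length > 0 ? args[0] : "\u00E9\u4E2D";
        System.out.println("UTF-16: " + utf16Units(sample));
        System.out.println("UTF-8:  " + utf8Bytes(sample));
    }
}
```

For the default sample this gives two 16-bit units (00E9 4E2D) but five bytes (C3 A9 E4 B8 AD), which is why code that expects one form cannot simply reinterpret the other.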
NOTE: chcp is a command-line tool that calls SetConsoleOutputCP(). As far as I can tell, a process should not call SetConsoleOutputCP() itself.
Second is how an app retrieves its command arguments:
int main(int argc, char**argv)
vs
int wmain(int argc, wchar_t**argv)
vs
char* GetCommandLineA()
vs
wchar_t* GetCommandLineW()
The JDK uses GetCommandLineA; it should use GetCommandLineW to support Unicode arguments. To ensure compatibility, this change should be controlled from the java command line.
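Why the "A" variant loses characters can be modeled roughly in Java: the UTF-16 command line is converted to the ANSI code page, and characters that code page cannot represent are replaced with '?'. This is only a model, not the Windows implementation. It assumes windows-1252 as the ANSI code page, and the real conversion may additionally apply "best fit" substitutions for some characters. The class name AnsiLoss is hypothetical.

```java
import java.nio.charset.Charset;

// Rough model (not the Windows code) of an ANSI "A" API: text is encoded to
// the ANSI code page and decoded back, losing anything outside that page.
class AnsiLoss {

    // Round-trips s through the given code page. String.getBytes(Charset)
    // replaces unmappable characters with the charset's replacement byte,
    // which is '?' for windows-1252.
    static String throughCodePage(String s, String codePage) {
        Charset cs = Charset.forName(codePage);
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        // What a wide (W) API would see: "caf" + e-acute, a space, two CJK characters.
        String wide = "caf\u00E9 \u4E2D\u6587";
        // The accented e survives in windows-1252; the CJK characters become '?'.
        System.out.println(throughCodePage(wide, "windows-1252"));
    }
}
```

Once the launcher has produced '?' in this way, the original characters cannot be recovered later in the JVM, which is why the fix has to happen where the command line is read.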
Another area is the output streams (stdout, stderr). These are involved both when using > or | to send the results to a file or another program and when writing to the console. This turns out to involve complex logic: WriteConsoleW for console output, WriteFile for > and |, and a final fallback to writing ASCII in the GetConsoleOutputCP() code page.
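For the redirected case (> f.txt), one Java-side workaround is to stop relying on the default encoding of System.out and encode stdout explicitly as UTF-8. This is a sketch under stated assumptions: it needs Java 10 or later for the PrintStream(OutputStream, boolean, Charset) constructor, the class name Utf8Out is hypothetical, and it cannot restore arguments that were already mangled before main() was called.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

// Sketch of a workaround for redirected output: wrap the raw stdout file
// descriptor in a PrintStream that always encodes text as UTF-8.
class Utf8Out {

    // Wraps any byte stream in an auto-flushing UTF-8 PrintStream.
    static PrintStream utf8(OutputStream out) {
        return new PrintStream(out, true, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        PrintStream out = utf8(new FileOutputStream(FileDescriptor.out));
        // Same behavior as Main: arguments separated by single spaces.
        out.println(String.join(" ", args));
        out.flush();
    }
}
```

When stdout is redirected to a file, the file then receives UTF-8 bytes. When stdout is an actual console, the console still interprets those bytes in its output code page, so they presumably display correctly only after chcp 65001.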
The last area is getting the consoles to display glyphs for the Unicode characters being tested. The font selected in the cmd and PowerShell windows must be Lucida Console or Consolas, and additional language packs must be installed to get fallback fonts for the characters needed. Finally, using a console host application (ConEmu, Console+, ...) should enable proper display of Unicode glyphs in the cmd and PowerShell windows that it starts. PowerShell ISE worked when [Console].OutputEncoding was set to UTF-8.
Relates to:
- JDK-4488646 Java executable and System properties need to support Unicode on Windows (Open)
- JDK-8272352 Java launcher can not parse Chinese character when system locale is set to UTF-8 (Resolved)
- JDK-6584897 Cannot invoke class from command line with args containing non-ASCII characters (Closed)
- JDK-6727466 java.exe/JRE1.6.0_10-b25/b27 doesn't seem to handle international characters (Closed)
- JDK-8029584 Allow \uxxxx unicode-escaping on the jvm command-line arguments (Open)
- JDK-4900150 Use Unicode API on in the area other than AWT (Open)
- JDK-6589705 JDK should provide support for charset detection. (Closed)