ADDITIONAL SYSTEM INFORMATION :
openjdk 25 2025-09-16 LTS
OpenJDK Runtime Environment Temurin-25+36 (build 25+36-LTS)
OpenJDK 64-Bit Server VM Temurin-25+36 (build 25+36-LTS, mixed mode, sharing)
openjdk 26-ea 2026-03-17
OpenJDK Runtime Environment (build 26-ea+17-1764)
OpenJDK 64-Bit Server VM (build 26-ea+17-1764, mixed mode, sharing)
A DESCRIPTION OF THE PROBLEM :
java.exe (and possibly javaw.exe) cannot handle command-line arguments that contain Unicode characters outside the ANSI code page. On Windows, src/java.base/share/native/launcher/main.c obtains the arguments as UTF-16 via GetCommandLineW but then converts them to the ANSI code page rather than to UTF-8. Every character that the code page cannot represent is replaced with ? (?? for a supplementary character).
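The loss described above can be reproduced in pure Java, independently of the launcher. The following is a minimal sketch, not the launcher's code: the class name LossyRoundTrip and the helper roundTrip are hypothetical, and ISO-8859-1 is used only as a stand-in for an ANSI code page because it is guaranteed to be available on every JVM. Note that Java's encoder emits a single ? for an unmappable surrogate pair, so this sketch uses BMP characters only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it does not call any launcher code.
class LossyRoundTrip {
    // Encode the text into the given charset and decode it again.
    // String.getBytes(Charset) replaces unmappable characters with the
    // charset's replacement byte, which is '?' for ISO-8859-1.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // U+00BD and U+00F7 exist in ISO-8859-1; U+9EBA (a CJK ideograph) does not.
        String arg = "\u00BD\u00F7\u9EBA";

        String viaLegacy = roundTrip(arg, StandardCharsets.ISO_8859_1);
        String viaUtf8 = roundTrip(arg, StandardCharsets.UTF_8);

        // Compare code points rather than printing the characters, so the
        // result does not depend on the console encoding.
        System.out.println("legacy round trip lossless: " + viaLegacy.equals(arg));
        System.out.println("UTF-8 round trip lossless:  " + viaUtf8.equals(arg));
        System.out.println("last char after legacy trip: U+"
                + Integer.toHexString(viaLegacy.charAt(2)).toUpperCase());
    }
}
```

A round trip through UTF-8 preserves every character, which is why converting the UTF-16 command line to UTF-8 instead of the ANSI code page would avoid the corruption.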
In the examples below, ๐ถ๏ธ, ๐ฐป, and ๐ are supplementary characters, each of which occupies two Java chars.
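To illustrate why a supplementary character counts as two Java chars, here is a small sketch. The class name SurrogateDemo and the code point U+1F600 are chosen only for illustration and are not taken from the report.

```java
// Hypothetical demo class showing the UTF-16 representation of a supplementary character.
class SurrogateDemo {
    // Build a String holding exactly one code point.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        // U+1F600 lies outside the Basic Multilingual Plane (an assumed example value).
        String s = fromCodePoint(0x1F600);

        // One code point, but two UTF-16 code units (a surrogate pair).
        System.out.println("chars:       " + s.length());
        System.out.println("code points: " + s.codePointCount(0, s.length()));
        System.out.println("surrogates:  "
                + Character.isSurrogatePair(s.charAt(0), s.charAt(1)));
    }
}
```

Because each half of the surrogate pair is converted separately by the lossy conversion, a single supplementary character shows up as ?? in the corrupted output.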
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1. Save the source code as Args.java
2. Switch the console to UTF-8: run chcp 65001 (CMD) or [Console]::OutputEncoding = [Console]::InputEncoding = [System.Text.Encoding]::UTF8 (PowerShell)
3. java Args.java ๐ถ๏ธ๐ฐป๐ฐป้บบ๐ ½÷¼×โ =โ
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
[0]: ๐ถ๏ธ๐ฐป๐ฐป้บบ๐
[1]: ½÷¼×โ =โ
ACTUAL -
Japanese locale:
[0]: ??????้บบ??
[1]: ?÷?×?=?
US English locale:
[0]: ?????????
[1]: ½÷¼×?=?
---------- BEGIN SOURCE ----------
// Args.java
// To test this on a release before Java 25, add --enable-preview or convert this to the traditional class form
void main(String[] args) {
    for (int i = 0; i < args.length; i++) {
        IO.println("[" + i + "]: " + args[i]);
    }
}
---------- END SOURCE ----------