-
Bug
-
Resolution: Unresolved
-
P4
-
None
-
8, 25
-
generic
-
windows
ADDITIONAL SYSTEM INFORMATION :
openjdk 25-ea 2025-09-16
OpenJDK Runtime Environment (build 25-ea+22-2667)
OpenJDK 64-Bit Server VM (build 25-ea+22-2667, mixed mode, sharing)
openjdk version "1.8.0_362"
OpenJDK Runtime Environment (Temurin)(build 1.8.0_362-b09)
OpenJDK 64-Bit Server VM (Temurin)(build 25.362-b09, mixed mode)
C:\Pleiades\2023-03\java\11\bin\java -version
openjdk version "11.0.18" 2023-01-17
OpenJDK Runtime Environment Temurin-11.0.18+10 (build 11.0.18+10)
OpenJDK 64-Bit Server VM Temurin-11.0.18+10 (build 11.0.18+10, mixed mode)
openjdk version "17.0.6" 2023-01-17
OpenJDK Runtime Environment Temurin-17.0.6+10 (build 17.0.6+10)
OpenJDK 64-Bit Server VM Temurin-17.0.6+10 (build 17.0.6+10, mixed mode, sharing)
openjdk 21.0.3 2024-04-16 LTS
OpenJDK Runtime Environment Temurin-21.0.3+9 (build 21.0.3+9-LTS)
OpenJDK 64-Bit Server VM Temurin-21.0.3+9 (build 21.0.3+9-LTS, mixed mode, sharing)
Windows 23H2
A DESCRIPTION OF THE PROBLEM :
Windows console terminals can switch their I/O encoding from the OEM code page (which depends on the current OS language) to UTF-8. However, System.in in Java yields accurate console input bytes only in Windows Terminal or with a non-UTF-8 console character set.
On Windows with a UTF-8 I/O character set, in any terminal other than Windows Terminal (e.g. ConHost, VS Code, and Git Bash) that uses ConHost instead of OpenConsole (bundled with Windows Terminal), the input bytes of non-ASCII characters are corrupted. If you turn on "Beta: Use Unicode UTF-8 for worldwide language support", mentioned in a comment in JDK-8356149, you will hit this bug unless you stick to Windows Terminal.
Each non-ASCII character appears to be garbled into a single byte, i.e. one byte per code unit instead of a multi-byte UTF-8 sequence.
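To make the "one byte per code unit" observation concrete, the following sketch compares the number of Java chars (code units) in a string with the number of bytes a correct UTF-8 encoding of it occupies. The class and method names are hypothetical and not part of the original report.

```java
import java.nio.charset.StandardCharsets;

class CodeUnitVsUtf8 {
    /** Number of Java chars (UTF-16 code units) in the text. */
    public static int codeUnits(String text) {
        return text.length();
    }

    /** Number of bytes a correct UTF-8 encoding of the text occupies. */
    public static int utf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Unicode escapes keep this file independent of the source encoding:
        // U+221A is the square root sign, U+3042.. are hiragana.
        String[] samples = { "\u221A2", "\u3042\u3044\u3046", "ABC" };
        for (String s : samples) {
            System.out.println(codeUnits(s) + " code units, " + utf8Bytes(s) + " UTF-8 bytes");
        }
    }
}
```

For the first input line of the report, a correct read delivers 18 bytes before the CR LF, while the corrupted read delivers 13, which matches the 13 chars of that line.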
Workarounds:
1. Stick to the legacy code page I/O encoding (this includes turning off "Beta: Use Unicode UTF-8 for worldwide language support")
2. Use Windows Terminal (inconvenient if you would rather not leave your IDE or VS Code)
3. Use a different OS
4. Use `System.console().readLine()` in JShell in the latest early-access (EA) build
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1. Launch a terminal other than Windows Terminal (e.g. VS Code, Git Bash, or ConHost) on Windows
2. Change the I/O charset to UTF-8 with `[Console]::InputEncoding = [Console]::OutputEncoding = [System.Text.Encoding]::UTF-8` (PowerShell) or `chcp 65001` (others)
3. (Compile and) run the test case code (`java path\to\TestCaseCode.java` or `javac path\to\TestCaseCode.java && java SystemInBug`)
4. Enter a string that includes some non-ASCII characters, then press Enter followed by Ctrl + Z
Notes:
1. In Java 8, the String line of the expected result is still corrupted (but the Array line is correct)
2. The corrupted bytes look random (they change if you run the test again)
3. The expected result was captured in Windows Terminal, the only terminal on Windows where this bug doesn't occur
4. Git Bash (and other pseudo-Unix environments) require `chcp.com` instead of `chcp`
5. Java 8 requires the additional option `-Dsun.stdout.encoding=UTF-8` to output non-ASCII characters correctly.
6. `&&` is not available in PowerShell 5, but is available in Bash (e.g. Git Bash), PowerShell 7, and CMD.
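Before reproducing, it may help to confirm which encodings the JVM actually picked up after step 2. The sketch below prints the three properties the test case consults (`stdin.encoding`, `stdout.encoding`, `sun.stdout.encoding`). As an addition not taken from the report, it also prints `native.encoding` and `file.encoding`. The class name is hypothetical, and which properties are set depends on the Java version.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

class EncodingProps {
    // The first three names come from the test case in this report;
    // the last two are extra properties added here for context.
    static final String[] NAMES = {
        "stdin.encoding", "stdout.encoding", "sun.stdout.encoding",
        "native.encoding", "file.encoding"
    };

    /** Returns each property name mapped to its value, or "(unset)". */
    public static Map<String, String> snapshot() {
        Map<String, String> result = new LinkedHashMap<>();
        for (String name : NAMES) {
            result.put(name, System.getProperty(name, "(unset)"));
        }
        return result;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> entry : snapshot().entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
        System.out.println("default charset = " + Charset.defaultCharset());
    }
}
```

Properties that a given Java version does not define simply show up as "(unset)", so the same file can be run on each of the JDKs listed above.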
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Input: √2 ÷ 2 = 1/√2
あいうえおアイウエオ漢字
０１２３４５６７８９ＡＢＣＤＥＦ
^Z
Array: E2 88 9A 32 20 C3 B7 20 32 20 3D 20 31 2F E2 88 9A 32 0D 0A E3 81 82 E3 81 84 E3 81 86 E3 81 88 E3 81 8A E3 82 A2 E3 82 A4 E3 82 A6 E3 82 A8 E3 82 AA E6 BC A2 E5 AD 97 0D 0A EF BC 90 EF BC 91 EF BC 92 EF BC 93 EF BC 94 EF BC 95 EF BC 96 EF BC 97 EF BC 98 EF BC 99 EF BC A1 EF BC A2 EF BC A3 EF BC A4 EF BC A5 EF BC A6 0D 0A
String: √2 ÷ 2 = 1/√2
あいうえおアイウエオ漢字
０１２３４５６７８９ＡＢＣＤＥＦ
ACTUAL -
Input: √2 ÷ 2 = 1/√2
あいうえおアイウエオ漢字
０１２３４５６７８９ＡＢＣＤＥＦ
^Z
Array: 80 32 20 C3 20 32 20 3D 20 31 2F C3 32 0D 0A 50 50 50 50 50 50 50 50 50 50 50 50 0D 0A 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 0D 0A
String: �2 � 2 = 1/�2
PPPPPPPPPPPP
����������������
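One way to tell the two Array dumps apart mechanically is to run them through a strict UTF-8 decoder. The sketch below is an illustration only, with hypothetical class and method names. It parses a space-separated hex dump and reports whether the bytes are well-formed UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8Check {
    /** Parses a dump such as "E2 88 9A 32" into bytes. */
    public static byte[] parseHex(String dump) {
        String[] parts = dump.trim().split("\\s+");
        byte[] out = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        return out;
    }

    /** True if a strict decoder accepts the bytes as UTF-8. */
    public static boolean isWellFormedUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Prefixes of the EXPECTED and ACTUAL dumps from this report.
        System.out.println(isWellFormedUtf8(parseHex("E2 88 9A 32 20 C3 B7 20 32")));
        System.out.println(isWellFormedUtf8(parseHex("80 32 20 C3 20 32")));
    }
}
```

This check is only a partial detector. The second ACTUAL line consists of 0x50 bytes, which are valid ASCII and therefore pass validation even though they are wrong.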
---------- BEGIN SOURCE ----------
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.ArrayList;

class SystemInBug {
    public static void main(String[] args) {
        // stdin.encoding is for Java 25, stdout.encoding is for Java 21,
        // and sun.stdout.encoding is for Java 8, 11, and 17
        final String encodingName = System.getProperty("stdin.encoding",
                System.getProperty("stdout.encoding", System.getProperty("sun.stdout.encoding", "UTF-8")));
        // Java 8 has a bug: it doesn't recognize "cp65001" as an alias for "UTF-8"
        final Charset encoding = Charset.forName(encodingName.equals("cp65001") ? "UTF-8" : encodingName);
        System.err.print("Input: ");
        try {
            // Read byte by byte because Java 8 doesn't support System.in.readAllBytes()
            final ArrayList<Byte> bytesList = new ArrayList<>();
            int b;
            while ((b = System.in.read()) != -1) {
                bytesList.add((byte) b);
            }
            final byte[] bytes = new byte[bytesList.size()];
            for (int i = 0; i < bytesList.size(); i++) {
                bytes[i] = bytesList.get(i);
            }
            // Decode the raw bytes with the console encoding
            final String str = new String(bytes, encoding);
            // Format each byte as two uppercase hex digits
            final String[] hexBytes = new String[bytes.length];
            for (int i = 0; i < bytes.length; i++) {
                hexBytes[i] = String.format("%02X", bytes[i]);
            }
            System.out.println("Array: " + String.join(" ", hexBytes));
            System.out.println("String: " + str);
        } catch (IOException e) {
            e.printStackTrace(System.err);
            System.exit(1);
        }
    }
}
---------- END SOURCE ----------
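As a sanity check on the test case itself, the sketch below runs a read loop of the same shape against an in-memory stream instead of the console. The class and method names are hypothetical. If known-good UTF-8 bytes come back unchanged here, that is consistent with (though it does not prove) the corruption occurring in console input rather than in the read loop.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ReadLoopCheck {
    /** Reads the stream byte by byte until end of input. */
    public static byte[] readAll(InputStream in) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            int b;
            while ((b = in.read()) != -1) {
                buffer.write(b);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Formats bytes as space-separated uppercase hex pairs. */
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte value : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", value & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+221A (square root sign) followed by '2' and CR LF, encoded as UTF-8.
        byte[] known = "\u221A2\r\n".getBytes(StandardCharsets.UTF_8);
        byte[] read = readAll(new ByteArrayInputStream(known));
        System.out.println("Array: " + toHex(read));
    }
}
```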