-
Bug
-
Resolution: Unresolved
-
P3
-
21, 25
ADDITIONAL SYSTEM INFORMATION :
openjdk 25-ea 2025-09-16
OpenJDK Runtime Environment (build 25-ea+21-2530)
OpenJDK 64-Bit Server VM (build 25-ea+21-2530, mixed mode, sharing)
Windows 11 24H2
-----
openjdk 24 2025-03-18
OpenJDK Runtime Environment Temurin-24+36 (build 24+36)
OpenJDK 64-Bit Server VM Temurin-24+36 (build 24+36, mixed mode, sharing)
Ubuntu 24.04 WSL
A DESCRIPTION OF THE PROBLEM :
In JShell, System.in replaces a supplementary character (a Unicode character outside the BMP) read from a UTF-8 terminal with two `?` (ASCII question marks).
In the results below, 63 is the ASCII code of `?`.
BMP characters appear to be preserved as is, e.g. "¥" in the results below, and "あ1" (あ = 3 bytes in UTF-8).
Possibly each code unit of the supplementary character's surrogate pair is being converted to UTF-8 separately.
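The hypothesis above can be illustrated outside JShell. The following is only a sketch of the suspected failure mode, not JShell's actual code: the class name `SurrogateEncodingSketch` and both helper methods are hypothetical. It encodes a string one UTF-16 code unit at a time, which is an assumption about what may be happening internally. A lone surrogate cannot be encoded to UTF-8, so `String.getBytes` substitutes `?` (63) for each half of the pair, while BMP characters pass through unchanged.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; this is not code taken from JShell.
final class SurrogateEncodingSketch {

    // Suspected faulty behavior: each UTF-16 code unit is encoded on its own,
    // so the two halves of a surrogate pair are never seen together.
    static byte[] encodePerCodeUnit(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (char c : s.toCharArray()) {
            // A lone surrogate is unencodable; getBytes replaces it with '?'.
            byte[] b = String.valueOf(c).getBytes(StandardCharsets.UTF_8);
            out.write(b, 0, b.length);
        }
        return out.toByteArray();
    }

    // Expected behavior: the whole string is encoded at once,
    // so a surrogate pair becomes one 4-byte UTF-8 sequence.
    static byte[] encodeWhole(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String thumbsUp = "\uD83D\uDC4D"; // U+1F44D as a surrogate pair

        // Prints [63, 63, 49, 49]
        System.out.println(Arrays.toString(encodePerCodeUnit(thumbsUp + "11")));
        // Prints [63, 63, -62, -91]
        System.out.println(Arrays.toString(encodePerCodeUnit(thumbsUp + "\u00A5")));
        // Prints [-16, -97, -111, -115, 49, 49]
        System.out.println(Arrays.toString(encodeWhole(thumbsUp + "11")));
    }
}
```

The per-code-unit results match the bytes reported under ACTUAL, which is consistent with the hypothesis but does not prove it.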
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1. On Windows, first run `chcp 65001` (CMD) or `[Console]::InputEncoding = [Console]::OutputEncoding = [System.Text.Encoding]::UTF8` (PowerShell)
2. Launch `jshell`
3. Type `new String(System.in.readNBytes(4))` and press Enter to run
4. Input `👍11` and press Enter
Note: 👍 can be replaced with any other supplementary character. 11 can be replaced with any two ASCII characters, or with any single character that encodes to 2 bytes in UTF-8 (e.g. ¥ or π).
5. Repeat steps 3 and 4, this time with `System.in.readNBytes(4)`
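The byte counts that the steps above rely on can be checked with a short sketch. The class name `Utf8LengthSketch` and its helper are hypothetical and exist only for illustration. A supplementary character takes 4 bytes in UTF-8, so a correct `readNBytes(4)` should return exactly 👍. When it is replaced by two `?` bytes, the remaining 2 bytes come from the characters typed after it.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the original report.
final class Utf8LengthSketch {

    // Number of bytes the given string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // U+1F44D (thumbs up), U+00A5 (yen), U+03C0 (pi), U+3042 (hiragana a), '1'
        String[] samples = {"\uD83D\uDC4D", "\u00A5", "\u03C0", "\u3042", "1"};
        for (String s : samples) {
            System.out.printf("U+%04X -> %d byte(s)%n", s.codePointAt(0), utf8Length(s));
        }
    }
}
```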
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
jshell> new String(System.in.readNBytes(4))
👍11
$1 ==> "👍"
jshell> System.in.readNBytes(4)
👍11
$2 ==> byte[4] { -16, -97, -111, -115 }
jshell> new String(System.in.readNBytes(4))
👍¥
$3 ==> "👍"
jshell> System.in.readNBytes(4)
👍¥
$4 ==> byte[4] { -16, -97, -111, -115 }
ACTUAL -
jshell> new String(System.in.readNBytes(4))
👍11
$1 ==> "??11"
jshell> System.in.readNBytes(4)
👍11
$2 ==> byte[4] { 63, 63, 49, 49 }
jshell> new String(System.in.readNBytes(4))
👍¥
$3 ==> "??¥"
jshell> System.in.readNBytes(4)
👍¥
$4 ==> byte[4] { 63, 63, -62, -91 }
---------- BEGIN SOURCE ----------
// JShell only.
---------- END SOURCE ----------
- links to
-
Review(master) openjdk/jdk/25079