JDK-8356165

System.in in jshell replaces supplementary characters with ??


    • b17
    • 21
    • generic
    • generic

      ADDITIONAL SYSTEM INFORMATION :
      openjdk 25-ea 2025-09-16
      OpenJDK Runtime Environment (build 25-ea+21-2530)
      OpenJDK 64-Bit Server VM (build 25-ea+21-2530, mixed mode, sharing)

      Windows 11 24H2

      -----

      openjdk 24 2025-03-18
      OpenJDK Runtime Environment Temurin-24+36 (build 24+36)
      OpenJDK 64-Bit Server VM Temurin-24+36 (build 24+36, mixed mode, sharing)

      Ubuntu 24.04 WSL

      A DESCRIPTION OF THE PROBLEM :
      System.in in JShell replaces each supplementary character (a Unicode character outside the BMP) read from a UTF-8 terminal with two `?` characters (ASCII question marks).
      In the results below, 63 is the ASCII code of `?`.
      BMP characters appear to be preserved as is, e.g. "¥" in the results below, and "あ1" (あ = 3 bytes in UTF-8).
      Possibly each surrogate code unit of the supplementary character's surrogate pair is being converted to UTF-8 separately, which cannot succeed because a lone surrogate has no UTF-8 encoding.
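
The hypothesis above can be illustrated outside JShell. The following is a minimal sketch, not the actual JShell code path: the class `SurrogateEncodingSketch` and its methods are hypothetical names introduced only for illustration. It relies on the documented behavior of `String.getBytes(Charset)`, which replaces unencodable input (such as a lone surrogate) with the charset's replacement bytes, `?` for UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; this is NOT the JShell implementation.
class SurrogateEncodingSketch {

    // Encodes the whole string at once, so surrogate pairs stay together.
    static byte[] encodeWhole(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Assumed faulty path: encodes each UTF-16 code unit on its own.
    // A lone surrogate cannot be encoded, so getBytes substitutes '?' (63).
    static byte[] encodePerCodeUnit(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (char c : s.toCharArray()) {
            byte[] b = String.valueOf(c).getBytes(StandardCharsets.UTF_8);
            out.write(b, 0, b.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String input = "\uD83D\uDC4D11"; // U+1F44D (thumbs up) followed by "11"
        // prints [-16, -97, -111, -115, 49, 49]
        System.out.println(Arrays.toString(encodeWhole(input)));
        // prints [63, 63, 49, 49]
        System.out.println(Arrays.toString(encodePerCodeUnit(input)));
    }
}
```

The per-code-unit output matches the bytes reported under ACTUAL, which is consistent with the hypothesis but does not by itself confirm where the conversion happens in JShell.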

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      1. On Windows, first run `chcp 65001` (CMD) or `[Console]::InputEncoding = [Console]::OutputEncoding = [System.Text.Encoding]::UTF8` (PowerShell)
      2. Launch `jshell`
      3. Type `new String(System.in.readNBytes(4))` and press Enter to run
      4. Input `👍11` and press Enter

      Note: 👍 can be replaced with any other supplementary character. 11 can be replaced with any 2 other ASCII characters, or with any single character encoded as 2 bytes in UTF-8 (e.g. ¥ or π).

      5. Repeat steps 3 and 4 with `System.in.readNBytes(4)`
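
The choice of inputs in the steps above depends on UTF-8 byte lengths: 👍 occupies 4 bytes, so a correct `readNBytes(4)` returns exactly that character, while two `?` bytes plus a 2-byte remainder also add up to 4 bytes in the faulty case. The following sketch, using the hypothetical helper class `Utf8LengthSketch`, shows one way to check whether a candidate replacement character fits the note.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for choosing test inputs; not part of the bug report.
class Utf8LengthSketch {

    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if the string contains at least one code point outside the BMP.
    static boolean hasSupplementary(String s) {
        return s.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    public static void main(String[] args) {
        String[] samples = {
            "\uD83D\uDC4D", // U+1F44D thumbs up, supplementary
            "\u00A5",       // yen sign
            "\u03C0",       // Greek small letter pi
            "\u3042",       // Hiragana letter a
            "11"            // two ASCII digits
        };
        for (String s : samples) {
            System.out.println(utf8Length(s) + " bytes, supplementary="
                    + hasSupplementary(s));
        }
    }
}
```

Unicode escapes are used instead of literal characters so the sketch compiles regardless of the source file encoding.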

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      jshell> new String(System.in.readNBytes(4))
      👍11
      $1 ==> "👍"

      jshell> System.in.readNBytes(4)
      👍11
      $2 ==> byte[4] { -16, -97, -111, -115 }

      jshell> new String(System.in.readNBytes(4))
      👍¥
      $3 ==> "👍"

      jshell> System.in.readNBytes(4)
      👍¥
      $4 ==> byte[4] { -16, -97, -111, -115 }
      ACTUAL -
      jshell> new String(System.in.readNBytes(4))
      👍11
      $1 ==> "??11"

      jshell> System.in.readNBytes(4)
      👍11
      $2 ==> byte[4] { 63, 63, 49, 49 }

      jshell> new String(System.in.readNBytes(4))
      👍¥
      $3 ==> "??¥"

      jshell> System.in.readNBytes(4)
      👍¥
      $4 ==> byte[4] { 63, 63, -62, -91 }

      ---------- BEGIN SOURCE ----------
      // JShell only.
      ---------- END SOURCE ----------

            jlahoda Jan Lahoda
            webbuggrp Webbug Group