JDK-8355357

Add standard system property stdin.encoding

    • Type: CSR
    • Resolution: Approved
    • Priority: P4
    • 25
    • core-libs
    • None
    • behavioral
    • minimal
    • It is possible that the property name `stdin.encoding` collides with a property already in use by an application, but this seems unlikely: a Sourcegraph code search found no hits for the string `stdin.encoding`.
    • System or security property
    • SE

      Summary

      Add and specify a new stdin.encoding system property that recommends an encoding that applications should use when reading character data from the standard input.

      Problem

      The JDK is missing a means to recommend a character encoding that applications and libraries should use when reading from the standard input. The existing encoding properties are insufficient for this. The file.encoding property is the default charset and is usually UTF-8 per JEP 400. The native.encoding property specifies the user's or system's preferred encoding. However, when standard input is connected to a console, it is possible that the console has an encoding that differs from both of these. The current recommendation (introduced in JDK 17 by JDK-8264209) is for applications to use Console.charset(). However, this is correct only when the standard input is connected to a console and a Console object is available. It is unclear what applications are expected to do in other cases.

      A new property is thus warranted that provides an encoding recommendation for whatever the standard input is connected to. In addition, the forthcoming JEP 512 proposes a new method IO.readln which reads from standard input. This method needs a standardized way to establish what encoding it should use.
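
      The gap in the existing recommendation can be sketched as follows. This is only an illustration: the class name ConsoleCharsetGap and the helper consoleCharsetOrNull are hypothetical, and the fallback to the default charset is an assumption made for the sketch, not something the specification prescribes. Whether System.console() returns null when input is redirected also depends on the JDK release and its configuration.

```java
import java.io.Console;
import java.nio.charset.Charset;

public class ConsoleCharsetGap {
    // Hypothetical helper: returns the console charset when a Console
    // object is available, or null when there is none.
    static Charset consoleCharsetOrNull() {
        Console console = System.console();
        return (console != null) ? console.charset() : null;
    }

    public static void main(String[] args) {
        Charset cs = consoleCharsetOrNull();
        if (cs != null) {
            System.out.println("Console charset: " + cs);
        } else {
            // No Console object: the existing guidance does not say what to
            // use here. Falling back to the default charset is an assumption.
            System.out.println("No Console; falling back to: " + Charset.defaultCharset());
        }
    }
}
```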

      Solution

      Create and specify a new system property stdin.encoding. It is broadly similar to the existing stdout.encoding and stderr.encoding properties. It is set by platform-specific code to a value that indicates the encoding that is appropriate to use for whatever the standard input is connected to. The exact behavior is platform-specific.

      It is possible to set the property to UTF-8 on the command line to override the system's chosen value. The result of using other values is unspecified. Other property specifications' wording is adjusted to use "unspecified" instead of "undefined" to bring them into alignment.
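
      An application might resolve the property along the following lines. This is a minimal sketch, not the JDK's implementation: the class StdinCharset and its methods are hypothetical names, and falling back to the default charset when the property is absent or names an unavailable charset is an assumption chosen for the example.

```java
import java.nio.charset.Charset;

public class StdinCharset {
    // Hypothetical helper: maps a charset name to a Charset, returning the
    // fallback when the name is null, empty, malformed, or unsupported.
    static Charset resolve(String name, Charset fallback) {
        if (name == null || name.isEmpty()) {
            return fallback;
        }
        try {
            return Charset.forName(name);
        } catch (IllegalArgumentException e) {
            // Covers IllegalCharsetNameException and UnsupportedCharsetException,
            // both of which extend IllegalArgumentException.
            return fallback;
        }
    }

    // Reads stdin.encoding; the default-charset fallback is an assumption
    // for runtimes that do not set the property.
    static Charset stdinCharset() {
        return resolve(System.getProperty("stdin.encoding"), Charset.defaultCharset());
    }

    public static void main(String[] args) {
        System.out.println("Charset for standard input: " + stdinCharset());
    }
}
```

      Starting this program with -Dstdin.encoding=UTF-8 on the java command line is the one override described above; any other value leaves the behavior unspecified.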

      The specification for System.in is modified to include a recommendation for applications to use the stdin.encoding property to determine the encoding to use. Additional specification text warns against mixing byte input with character input.
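
      The recommended usage pattern might look like the sketch below: System.in is wrapped once, and all subsequent reads go through the wrapper. The class StdinLines and the method readLines are hypothetical names, and the fallback to the default charset when the property is not set is an assumption for older runtimes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

public class StdinLines {
    // Hypothetical helper: decodes the stream with the given charset and
    // returns its lines. Only the wrapper is read from after it is created;
    // the underlying stream is never touched directly.
    static List<String> readLines(InputStream in, Charset cs) {
        List<String> lines = new ArrayList<>();
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, cs));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Assumption: fall back to the default charset if stdin.encoding is unset.
        String name = System.getProperty("stdin.encoding");
        Charset cs = (name != null) ? Charset.forName(name) : Charset.defaultCharset();
        for (String line : readLines(System.in, cs)) {
            System.out.println(line);
        }
    }
}
```

      Taking the stream and charset as parameters keeps the decoding logic independent of System.in, so it can be exercised with an in-memory stream.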

      The specification is also clarified to state that the native.encoding system property cannot be overridden on the command line.

      Specification

      The specification for System.in is modified as follows:

           /**
            * The "standard" input stream. This stream is already
      -     * open and ready to supply input data. Typically this stream
      +     * open and ready to supply input data. This stream
            * corresponds to keyboard input or another input source specified by
      -     * the host environment or user. In case this stream is wrapped
      -     * in a {@link java.io.InputStreamReader}, {@link Console#charset()}
      -     * should be used for the charset, or consider using
      -     * {@link Console#reader()}.
      +     * the host environment or user. Applications should use the encoding
      +     * specified by the {@link ##stdin.encoding stdin.encoding} property
      +     * to convert input bytes to character data.
            *
      -     * @see Console#charset()
      -     * @see Console#reader()
      +     * @apiNote
      +     * The typical approach to read character data is to wrap {@code System.in}
      +     * within an {@link java.io.InputStreamReader InputStreamReader} or other object
      +     * that handles character encoding. After this is done, subsequent reading should
      +     * use only the wrapper object; operating directly on {@code System.in} results
      +     * in unspecified behavior.
      +     * <p>
      +     * For handling interactive input, consider using {@link Console}.
      +     *
      +     * @see Console
      +     * @see ##stdin.encoding stdin.encoding
            */

      The table of system properties in the System.getProperties method specification is modified as follows:

            * <tr><th scope="row">{@systemProperty user.dir}</th>
            *     <td>User's current working directory</td></tr>
            * <tr><th scope="row">{@systemProperty native.encoding}</th>
      -     *     <td>Character encoding name derived from the host environment and/or
      -     *     the user's settings. Setting this system property has no effect.</td></tr>
      +     *     <td>Character encoding name derived from the host environment and
      +     *     the user's settings. Setting this system property on the command line
      +     *     has no effect.</td></tr>
      +     * <tr><th scope="row">{@systemProperty stdin.encoding}</th>
      +     *     <td>Character encoding name for {@link System#in System.in}.
      +     *     The Java runtime can be started with the system property set to {@code UTF-8}.
      +     *     Starting it with the property set to another value results in unspecified behavior.
            * <tr><th scope="row">{@systemProperty stdout.encoding}</th>
            *     <td>Character encoding name for {@link System#out System.out} and
            *     {@link System#console() System.console()}.
      -     *     The Java runtime can be started with the system property set to {@code UTF-8},
      -     *     starting it with the property set to another value leads to undefined behavior.
      +     *     The Java runtime can be started with the system property set to {@code UTF-8}.
      +     *     Starting it with the property set to another value results in unspecified behavior.
            * <tr><th scope="row">{@systemProperty stderr.encoding}</th>
            *     <td>Character encoding name for {@link System#err System.err}.
      -     *     The Java runtime can be started with the system property set to {@code UTF-8},
      -     *     starting it with the property set to another value leads to undefined behavior.
      +     *     The Java runtime can be started with the system property set to {@code UTF-8}.
      +     *     Starting it with the property set to another value results in unspecified behavior.
            * </tbody>
            * </table>
            * <p>

      And further down in the same table:

            * <tr><th scope="row">{@systemProperty file.encoding}</th>
            *     <td>The name of the default charset, defaults to {@code UTF-8}.
            *     The property may be set on the command line to the value
            *     {@code UTF-8} or {@code COMPAT}. If set on the command line to
            *     the value {@code COMPAT} then the value is replaced with the
            *     value of the {@code native.encoding} property during startup.
            *     Setting the property to a value other than {@code UTF-8} or
      -     *     {@code COMPAT} leads to unspecified behavior.
      +     *     {@code COMPAT} results in unspecified behavior.
            *     </td></tr>
            * </tbody>
            * </table>

            smarks Stuart Marks
            smarks Stuart Marks
            Alan Bateman, Naoto Sato