-
CSR
-
Resolution: Approved
-
P4
-
None
-
behavioral
-
minimal
-
It is possible that the property name `stdin.encoding` collides with a property already in use by an application. This seems fairly unlikely though. A SourceGraph code search revealed no hits for the string `stdin.encoding`.
-
System or security property
-
SE
Summary
Add and specify a new `stdin.encoding` system property that recommends an encoding that applications should use when reading character data from the standard input.
Problem
The JDK is missing a means to recommend a character encoding that applications and libraries should use when reading from the standard input. The existing encoding properties are insufficient for this. The `file.encoding` property is the default charset and is usually UTF-8 per JEP 400. The `native.encoding` property specifies the user's or system's preferred encoding. However, when standard input is connected to a console, it is possible that the console has an encoding that differs from both of these. The current recommendation (introduced in JDK 17 by JDK-8264209) is for applications to use `Console.charset()`. However, this is correct only when the standard input is connected to a console and a `Console` object is available. It is unclear what applications are expected to do in other cases.
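The gap described above can be illustrated with a minimal sketch of the `Console.charset()` approach. The class name `ConsoleCharsetExample` and the helper `charsetFrom` are hypothetical and exist only for this illustration. The sketch assumes that `System.console()` may return `null` when no console is available, in which case the helper has nothing to recommend.

```java
import java.io.Console;
import java.nio.charset.Charset;

public class ConsoleCharsetExample {

    // Hypothetical helper: returns the console's charset, or null when
    // no Console object is available to ask.
    static Charset charsetFrom(Console console) {
        return console != null ? console.charset() : null;
    }

    public static void main(String[] args) {
        Charset cs = charsetFrom(System.console());
        if (cs == null) {
            // Standard input may be a pipe or a file; the existing
            // recommendation gives no guidance for this case.
            System.out.println("No console available; charset for System.in is unclear");
        } else {
            System.out.println("Console charset: " + cs);
        }
    }
}
```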
A new property is thus warranted that provides an encoding recommendation for whatever the standard input is connected to. In addition, the forthcoming JEP 512 proposes a new method `IO.readln` which reads from standard input. This method needs a standardized way to establish what encoding it should use.
Solution
Create and specify a new system property `stdin.encoding`. It is broadly similar to the existing `stdout.encoding` and `stderr.encoding` properties.
It is set by platform-specific code to a value that indicates the encoding
that is appropriate to use for whatever the standard input is connected to.
The exact behavior is platform-specific.
It is possible to set the property to UTF-8 on the command line to override the system's chosen value. The result of using other values is unspecified. Other property specifications' wording is adjusted to use "unspecified" instead of "undefined" to bring them into alignment.
The specification for `System.in` is modified to include a recommendation for applications to use the `stdin.encoding` property to determine the encoding to use. Additional specification text warns against mixing byte input with character input.
The specification also clarifies that the `native.encoding` system property cannot be overridden on the command line.
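As an illustration of how an application might follow this recommendation, the sketch below reads the `stdin.encoding` property and wraps `System.in` in an `InputStreamReader`. The class name `StdinEncodingExample` and the helper `charsetFor` are hypothetical. Falling back to the default charset when the property is absent (for example on an older JDK) or names an unsupported charset is an assumption of this sketch, not something this CSR specifies.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

public class StdinEncodingExample {

    // Hypothetical helper: maps a property value to a Charset. The fallback
    // to the default charset is an assumption made for this sketch.
    static Charset charsetFor(String name) {
        if (name == null || name.isEmpty()) {
            return Charset.defaultCharset();
        }
        try {
            return Charset.forName(name);
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            return Charset.defaultCharset();
        }
    }

    public static void main(String[] args) throws IOException {
        Charset cs = charsetFor(System.getProperty("stdin.encoding"));

        // Wrap System.in once, then read only through the wrapper;
        // mixing byte reads on System.in with character reads is discouraged.
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in, cs));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line);
        }
    }
}
```

To override the system's chosen value, the runtime can be started with `java -Dstdin.encoding=UTF-8 StdinEncodingExample`; as noted above, the result of using other values is unspecified.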
Specification
The specification for `System.in` is modified as follows:
/**
* The "standard" input stream. This stream is already
- * open and ready to supply input data. Typically this stream
+ * open and ready to supply input data. This stream
* corresponds to keyboard input or another input source specified by
- * the host environment or user. In case this stream is wrapped
- * in a {@link java.io.InputStreamReader}, {@link Console#charset()}
- * should be used for the charset, or consider using
- * {@link Console#reader()}.
+ * the host environment or user. Applications should use the encoding
+ * specified by the {@link ##stdin.encoding stdin.encoding} property
+ * to convert input bytes to character data.
*
- * @see Console#charset()
- * @see Console#reader()
+ * @apiNote
+ * The typical approach to read character data is to wrap {@code System.in}
+ * within an {@link java.io.InputStreamReader InputStreamReader} or other object
+ * that handles character encoding. After this is done, subsequent reading should
+ * use only the wrapper object; operating directly on {@code System.in} results
+ * in unspecified behavior.
+ * <p>
+ * For handling interactive input, consider using {@link Console}.
+ *
+ * @see Console
+ * @see ##stdin.encoding stdin.encoding
*/
The table of system properties in the `System.getProperties` method specification is modified as follows:
* <tr><th scope="row">{@systemProperty user.dir}</th>
* <td>User's current working directory</td></tr>
* <tr><th scope="row">{@systemProperty native.encoding}</th>
- * <td>Character encoding name derived from the host environment and/or
- * the user's settings. Setting this system property has no effect.</td></tr>
+ * <td>Character encoding name derived from the host environment and
+ * the user's settings. Setting this system property on the command line
+ * has no effect.</td></tr>
+ * <tr><th scope="row">{@systemProperty stdin.encoding}</th>
+ * <td>Character encoding name for {@link System#in System.in}.
+ * The Java runtime can be started with the system property set to {@code UTF-8}.
+ * Starting it with the property set to another value results in unspecified behavior.
* <tr><th scope="row">{@systemProperty stdout.encoding}</th>
* <td>Character encoding name for {@link System#out System.out} and
* {@link System#console() System.console()}.
- * The Java runtime can be started with the system property set to {@code UTF-8},
- * starting it with the property set to another value leads to undefined behavior.
+ * The Java runtime can be started with the system property set to {@code UTF-8}.
+ * Starting it with the property set to another value results in unspecified behavior.
* <tr><th scope="row">{@systemProperty stderr.encoding}</th>
* <td>Character encoding name for {@link System#err System.err}.
- * The Java runtime can be started with the system property set to {@code UTF-8},
- * starting it with the property set to another value leads to undefined behavior.
+ * The Java runtime can be started with the system property set to {@code UTF-8}.
+ * Starting it with the property set to another value results in unspecified behavior.
* </tbody>
* </table>
* <p>
And further down in the same table:
* <tr><th scope="row">{@systemProperty file.encoding}</th>
* <td>The name of the default charset, defaults to {@code UTF-8}.
* The property may be set on the command line to the value
* {@code UTF-8} or {@code COMPAT}. If set on the command line to
* the value {@code COMPAT} then the value is replaced with the
* value of the {@code native.encoding} property during startup.
* Setting the property to a value other than {@code UTF-8} or
- * {@code COMPAT} leads to unspecified behavior.
+ * {@code COMPAT} results in unspecified behavior.
* </td></tr>
* </tbody>
* </table>
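To see what the encoding-related properties in this table resolve to on a given runtime, a small diagnostic along the following lines may help. The class name `EncodingProperties` and the `"(not set)"` placeholder are hypothetical choices for this sketch; on a JDK that predates this change, `stdin.encoding` is expected to be absent.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EncodingProperties {

    // The encoding-related system properties discussed in this CSR.
    static final List<String> NAMES = List.of(
            "file.encoding",
            "native.encoding",
            "stdin.encoding",
            "stdout.encoding",
            "stderr.encoding");

    // Collects the current value of each property, in declaration order.
    static Map<String, String> snapshot() {
        Map<String, String> values = new LinkedHashMap<>();
        for (String name : NAMES) {
            values.put(name, System.getProperty(name, "(not set)"));
        }
        return values;
    }

    public static void main(String[] args) {
        snapshot().forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```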
csr of: JDK-8350703 Add standard system property stdin.encoding (Resolved)