
Type: CSR
Resolution: Unresolved
Priority: P3
Fix Version/s: None
Component/s: core-libs
Labels:
None

Subcomponent:
java.lang
Compatibility Kind:

source
Compatibility Risk:
minimal
Compatibility Risk Description:
Little risk for a new method.
Interface Kind:

Java API
Scope:
SE

Summary

Introduce a new method String::encodeEscapes which replaces non-ASCII and non-printing characters in the string with escape sequences or Unicode escapes.

Problem

If a string contains non-printing or non-ASCII characters, its content may be unparsable, unreadable or insecure when it is used as:

part of source for a programming language
application input
debug reporting

There is also the serious issue of raw input from an external source being injected into vulnerability-sensitive source (e.g. SQL).

Solution

Convert the string content to use escape sequences for non-ASCII and non-printing characters. This would allow developers to construct parsable/readable strings.

Note that this method is the near reciprocal of String::translateEscapes. That is, string.encodeEscapes().translateEscapes().equals(string) will always be true. However, string.translateEscapes().encodeEscapes().equals(string) will not always be true, since the source string may represent the same character either escaped or unescaped.
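The asymmetry described in this note can be observed with the existing String::translateEscapes alone (Java 15 or later). The sketch below is only an illustration: the class name and the helper sameAfterTranslation are hypothetical and not part of this proposal. It shows two different source strings that translate to the same character, so no encoder could recover which one was originally written.

```java
public class TranslateEscapesAsymmetry {

    // Hypothetical helper: true if both inputs yield the same string
    // once their escape sequences have been translated.
    public static boolean sameAfterTranslation(String a, String b) {
        return a.translateEscapes().equals(b.translateEscapes());
    }

    public static void main(String[] args) {
        // Backslash followed by 't' translates to a horizontal tab (U+0009).
        String viaEscape = "\\t";
        // Backslash followed by octal digits 11 also translates to U+0009.
        String viaOctal = "\\11";

        // The inputs differ, but their translations are identical.
        System.out.println(viaEscape.equals(viaOctal));
        System.out.println(sameAfterTranslation(viaEscape, viaOctal));
    }
}
```

Because both inputs collapse to a single tab character, encoding that tab can produce at most one of them, which is why only the encode-then-translate direction is stated to round-trip.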

Specification

Update translateEscapes to reference encodeEscapes:

diff --git a/src/java.base/share/classes/java/lang/String.java b/src/java.base/share/classes/java/lang/String.java
index 67ee641fba1..ab4ebcfb4b9 100644
--- a/src/java.base/share/classes/java/lang/String.java
+++ b/src/java.base/share/classes/java/lang/String.java
@@ -4238,6 +4238,8 @@ private static int outdent(List<String> lines) {
      * @jls 3.10.7 Escape Sequences
      * @jls 3.3 Unicode Escapes
      *
+     * @see String#encodeEscapes()
+     *
      * @since 15
      */
     public String translateEscapes() {
@@ -4324,6 +4326,77 @@ public String translateEscapes() {
         return new String(chars, 0, to);
     }

encodeEscapes:

+    /**
+     * Translate characters to their escaped equivalents, if necessary, such that the
+     * resulting string, when embedded in double quotes, can be parsed by the compiler
+     * to reproduce this string.
+     * <p>
+     * Characters are translated as follows:
+     * <table class="striped">
+     *   <caption style="display:none">Translation</caption>
+     *   <thead>
+     *   <tr>
+     *     <th scope="col">Character</th>
+     *     <th scope="col">Name</th>
+     *     <th scope="col">Translation</th>
+     *   </tr>
+     *   </thead>
+     *   <tbody>
+     *   <tr>
+     *     <th scope="row">{@code U+0008}</th>
+     *     <td>backspace {@code (\u005Cb)}</td>
+     *     <td>{@code \u005C\u005Cb}</td>
+     *   </tr>
+     *   <tr>
+     *     <th scope="row">{@code U+0009}</th>
+     *     <td>horizontal tab {@code (\u005Ct)}</td>
+     *     <td>{@code \u005C\u005Ct}</td>
+     *   </tr>
+     *   <tr>
+     *     <th scope="row">{@code U+000A}</th>
+     *     <td>line feed {@code (\u005Cn)}</td>
+     *     <td>{@code \u005C\u005Cn}</td>
+     *   </tr>
+     *   <tr>
+     *     <th scope="row">{@code U+000C}</th>
+     *     <td>form feed {@code (\u005Cf)}</td>
+     *     <td>{@code \u005C\u005Cf}</td>
+     *   </tr>
+     *   <tr>
+     *     <th scope="row">{@code U+000D}</th>
+     *     <td>carriage return {@code (\u005Cr)}</td>
+     *     <td>{@code \u005C\u005Cr}</td>
+     *   </tr>
+     *   <tr>
+     *     <th scope="row">{@code U+0022}</th>
+     *     <td>double quote {@code (\u005C")}</td>
+     *     <td>{@code \u005C\u005C"}</td>
+     *   </tr>
+     *   <tr>
+     *     <th scope="row">{@code U+0027}</th>
+     *     <td>single quote {@code (\u005C')}</td>
+     *     <td>{@code \u005C\u005C'}</td>
+     *   </tr>
+     *   <tr>
+     *     <th scope="row">{@code U+005C}</th>
+     *     <td>backslash {@code (\u005C\u005C)}</td>
+     *     <td>{@code \u005C\u005C\u005C\u005C}</td>
+     *   </tr>
+     *   <tr>
+     *     <th scope="row">{@code U+0020-U+007E}</th>
+     *     <td>visible ASCII characters, excluding<p>double quote, single quote and backslash</td>
+     *     <td>as-is</td>
+     *   </tr>
+     *   <tr>
+     *     <th scope="row">{@code U+XXXX}</th>
+     *     <td>all other characters</td>
+     *     <td>{@code \u005C\u005CuXXXX}</td>
+     *   </tr>
+     *   </tbody>
+     * </table>
+     * Example:
+     * {@snippet lang=JAVA :
+     * String encoded = "\u2022This is a line.\n\t followed by this line.".encodeEscapes();
+     * System.out.println(encoded.equals("\\u2022This is a line.\\n\\t followed by this line."));
+     * }
+     * will print out {@code true}.
+     * <p>
+     * The result of this method is lossless and as such the original string can always be
+     * reproduced using {@link String#translateEscapes()}. Also because of losslessness,
+     * specific characters that don't require escaping can be reverted by simply using a
+     * replace method. For example: if newlines need to be maintained in the resulting
+     * string then just apply {@code replace("\\n", "\n")} to the result.
+     *
+     * @return string with characters encoded using escape sequences and Unicode escapes
+     *
+     * @since 23
+     *
+     * @see #translateEscapes()
+     *
+     * @jls 3.10.7 Escape Sequences
+     * @jls 3.3 Unicode Escapes
+     *
+     * @implNote If no characters were translated then the original string will be returned.
+     *
+     * @implSpec {@code string.encodeEscapes().translateEscapes().equals(string)} will
+     * always be {@code true}. However, this method is not the reciprocal of
+     * {@link #translateEscapes()}, because many different source strings can be
+     * translated by {@link #translateEscapes()} to the same result.
+     */
+    public String encodeEscapes() {
+        int length = length();
+        StringBuilder sb = new StringBuilder(length + (length >> 2));
+        for (int i = 0; i < length; i++) {
+            char ch = charAt(i);
+            switch (ch) {
+                case '\b': sb.append('\\'); sb.append('b'); break;
+                case '\t': sb.append('\\'); sb.append('t'); break;
+                case '\n': sb.append('\\'); sb.append('n'); break;
+                case '\f': sb.append('\\'); sb.append('f'); break;
+                case '\r': sb.append('\\'); sb.append('r'); break;
+                case '\"': sb.append('\\'); sb.append('\"'); break;
+                case '\'': sb.append('\\'); sb.append('\''); break;
+                case '\\': sb.append('\\'); sb.append('\\'); break;
+                default: if (' ' <= ch && ch <= '~') {
+                    sb.append(ch);
+                } else {
+                    String hex = Integer.toHexString(ch);
+                    sb.append('\\');
+                    sb.append('u');
+                    sb.repeat('0', 4 - hex.length());
+                    sb.append(hex);
+                }
+            }
+        }
+        return sb.length() != length ? sb.toString() : this;
+    }
+
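The translation table can be exercised outside the JDK with a standalone sketch. The class EncodeEscapesSketch and its static methods are hypothetical stand-ins that mirror the proposed logic. Unlike the proposal, the sketch always returns a new string, and it pads the hex digits with String.format rather than StringBuilder.repeat so that it compiles on older releases.

```java
public class EncodeEscapesSketch {

    // Hypothetical static equivalent of the proposed String::encodeEscapes.
    public static String encodeEscapes(String s) {
        StringBuilder sb = new StringBuilder(s.length() + (s.length() >> 2));
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            switch (ch) {
                case '\b': sb.append("\\b"); break;
                case '\t': sb.append("\\t"); break;
                case '\n': sb.append("\\n"); break;
                case '\f': sb.append("\\f"); break;
                case '\r': sb.append("\\r"); break;
                case '"':  sb.append("\\\""); break;
                case '\'': sb.append("\\'"); break;
                case '\\': sb.append("\\\\"); break;
                default:
                    if (' ' <= ch && ch <= '~') {
                        sb.append(ch); // visible ASCII is copied as-is
                    } else {
                        // Everything else becomes a four-digit Unicode escape.
                        sb.append(String.format("\\u%04x", (int) ch));
                    }
            }
        }
        return sb.toString();
    }

    // Re-expands encoded newlines, as suggested in the javadoc above.
    public static String keepNewlines(String encoded) {
        return encoded.replace("\\n", "\n");
    }

    public static void main(String[] args) {
        String encoded = encodeEscapes("\u2022This is a line.\n\t followed by this line.");
        System.out.println(encoded.equals(
                "\\u2022This is a line.\\n\\t followed by this line."));
        // The tab stays escaped; only the newline is restored.
        System.out.println(keepNewlines(encoded));
    }
}
```

One caveat with the replace approach, at least in this sketch: a plain replace cannot tell an encoded newline from an encoded backslash that happens to be followed by the letter n, so a string containing that pair will not survive keepNewlines unchanged.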

csr of

JDK-8253438 Add String::encodeEscapes

Open
