Summary
Add two new escape sequences for string and character literals for managing explicit whitespace and carriage control.
Problem
In text blocks, newlines (U+000A) are not typically declared explicitly using \n. Instead, newlines are inserted implicitly wherever content breaks to the next line. What if an implicit newline is not desired?
For example, it is common practice to split very long string literals into concatenations of smaller substrings and then hard-wrap the resulting string literals over multiple lines of source code:
String literal = "Lorem ipsum dolor sit amet, consectetur adipiscing " +
"elit, sed do eiusmod tempor incididunt ut labore " +
"et dolore magna aliqua.";
This is exactly the form of complex string that text blocks express more readably:
String text = """
Lorem ipsum dolor sit amet, consectetur adipiscing
elit, sed do eiusmod tempor incididunt ut labore
et dolore magna aliqua.
""";
However, using text blocks to represent long strings has a drawback: an implicit newline is inserted on every line. It would be helpful to be able to selectively denote which lines do not pick up the implicit newline.
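To make the drawback concrete, here is a minimal sketch comparing the two forms shown above. The class name ImplicitNewlines and the helper newlineCount are illustrative only and not part of this proposal; the sketch assumes a JDK on which text blocks are available (standard since JDK 15).

```java
final class ImplicitNewlines {
    // The concatenated literal from the text: one logical line, no newlines.
    static final String LITERAL =
        "Lorem ipsum dolor sit amet, consectetur adipiscing " +
        "elit, sed do eiusmod tempor incididunt ut labore " +
        "et dolore magna aliqua.";

    // The text block from the text: every content line picks up an implicit newline.
    static final String TEXT = """
        Lorem ipsum dolor sit amet, consectetur adipiscing
        elit, sed do eiusmod tempor incididunt ut labore
        et dolore magna aliqua.
        """;

    // Counts line feed characters (U+000A) in a string.
    static long newlineCount(String s) {
        return s.chars().filter(c -> c == '\n').count();
    }

    public static void main(String[] args) {
        System.out.println(newlineCount(LITERAL));   // 0
        System.out.println(newlineCount(TEXT));      // 3
        System.out.println(LITERAL.equals(TEXT));    // false
    }
}
```

The two strings carry the same words, but the text block is not a drop-in replacement for the long single-line literal because of the three implicit newlines.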
Turning to another matter, the space (U+0020) character's lack of observability creates a problem for strings.
For example, text blocks are missing per-line delimiters, like those found in string literals, that clearly indicate where the content of a line begins and where the content of a line ends. The lack of direct space-character observability is the primary reason for text blocks always stripping trailing white space. However, this behavior leads to a counter issue: How does a developer retain trailing white space in a text block?
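The stripping behavior can be observed with a small sketch. Because trailing spaces are invisible in source, the example writes them as Unicode escapes, which the compiler translates to real spaces before the text block is processed. The class and field names are hypothetical, and the sketch assumes a JDK on which text blocks are available (standard since JDK 15).

```java
final class TrailingSpaceStripping {
    // The first content line ends in two spaces, written as Unicode escapes so
    // they are visible here. By the time the text block is processed they are
    // ordinary trailing spaces, so white space stripping removes them.
    static final String STRIPPED = """
        red\u0020\u0020
        green
        """;

    public static void main(String[] args) {
        System.out.println(STRIPPED.equals("red\ngreen\n"));  // true
        System.out.println(STRIPPED.contains(" "));           // false
    }
}
```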
For another example, various visual tricks are required to get an accurate count of multiple spaces in any string literal. For instance, how many spaces are in the string literal " "? How can a developer count what they cannot visually discern?
Solution
Change the JLS section on "Escape Sequences for Character and String Literals" and the API String::translateEscapes to recognize two new escape sequences:
\<line-terminator>
The escape sequences \␊ (U+005C, U+000A), \␍ (U+005C, U+000D) and \␍␊ (U+005C, U+000D, U+000A) represent line continuation. Unlike other escape sequences, these line continuation sequences are simply discarded during escape translation.
Example:
String text = """
Lorem ipsum dolor sit amet, consectetur adipiscing \
elit, sed do eiusmod tempor incididunt ut labore \
et dolore magna aliqua.\
""";
After white space stripping, the above text block would have the value "Lorem ipsum dolor sit amet, consectetur adipiscing \␊elit, sed do eiusmod tempor incididunt ut labore \␊et dolore magna aliqua.\␊". Applying escape translation would then yield "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.".
\s (U+005C, U+0073)
The escape sequence \s represents observable space and is translated to the ASCII space character (U+0020).
String str = "A\sline\swith\sspaces";
After translation the String str will have the value "A line with spaces".
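The two escape sequences can be tried together in a short sketch. It assumes a JDK on which the new escapes are accepted (standard since JDK 15, or JDK 14 with preview features enabled); the class name NewEscapes and its fields are illustrative only. The PADDED field shows one way \s could address the trailing white space question raised in the Problem section.

```java
final class NewEscapes {
    // The hard-wrapped concatenation from the Problem section, for comparison.
    static final String LITERAL =
        "Lorem ipsum dolor sit amet, consectetur adipiscing " +
        "elit, sed do eiusmod tempor incididunt ut labore " +
        "et dolore magna aliqua.";

    // Line continuation: each backslash followed by a line terminator is
    // discarded, so the three source lines form one logical line.
    static final String CONTINUED = """
        Lorem ipsum dolor sit amet, consectetur adipiscing \
        elit, sed do eiusmod tempor incididunt ut labore \
        et dolore magna aliqua.\
        """;

    // Observable space: white space stripping runs before escape translation,
    // so a line ending in \s keeps the spaces that precede the escape.
    static final String PADDED = """
        red  \s
        green\s
        blue \s
        """;

    // The string literal example from the text.
    static final String SPACED = "A\sline\swith\sspaces";

    public static void main(String[] args) {
        System.out.println(CONTINUED.equals(LITERAL));                    // true
        System.out.println(PADDED.equals("red   \ngreen \nblue  \n"));    // true
        System.out.println(SPACED);                                       // A line with spaces
    }
}
```

Each line of PADDED is exactly six characters long before its newline, which is the kind of fixed-width content that trailing white space stripping would otherwise make awkward to write.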
Specification
JLS changes for the new escape sequences are found in section 3.10.7 of the attachment text-blocks-jls.html. There are no JVMS changes.
String::translateEscapes diff
--- a/src/java.base/share/classes/java/lang/String.java 2019-11-12 13:32:43.000000000 -0400
+++ b/src/java.base/share/classes/java/lang/String.java 2019-11-12 13:32:02.000000000 -0400
@@ -3060,10 +3060,15 @@
* <th scope="row">{@code \u005Cr}</th>
* <td>carriage return</td>
* <td>{@code U+000D}</td>
* </tr>
* <tr>
+ * <th scope="row">{@code \u005Cs}</th>
+ * <td>space</td>
+ * <td>{@code U+0020}</td>
+ * </tr>
+ * <tr>
* <th scope="row">{@code \u005C"}</th>
* <td>double quote</td>
* <td>{@code U+0022}</td>
* </tr>
* <tr>
@@ -3079,10 +3084,15 @@
* <tr>
* <th scope="row">{@code \u005C0 - \u005C377}</th>
* <td>octal escape</td>
* <td>code point equivalents</td>
* </tr>
+ * <tr>
+ * <th scope="row">{@code \u005C<line-terminator>}</th>
+ * <td>continuation</td>
+ * <td>discard</td>
+ * </tr>
* </tbody>
* </table>
*
* @implNote
* This method does <em>not</em> translate Unicode escapes such as "{@code \u005cu2022}".
String::translateEscapes after diff changes
/**
* {@preview Associated with text blocks, a preview feature of
* the Java language.
*
* This method is associated with <i>text blocks</i>, a preview
* feature of the Java language. Programs can only use this
* method when preview features are enabled. Preview features
* may be removed in a future release, or upgraded to permanent
* features of the Java language.}
*
* Returns a string whose value is this string, with escape sequences
* translated as if in a string literal.
* <p>
* Escape sequences are translated as follows;
* <table class="striped">
* <caption style="display:none">Translation</caption>
* <thead>
* <tr>
* <th scope="col">Escape</th>
* <th scope="col">Name</th>
* <th scope="col">Translation</th>
* </tr>
* </thead>
* <tbody>
* <tr>
* <th scope="row">{@code \u005Cb}</th>
* <td>backspace</td>
* <td>{@code U+0008}</td>
* </tr>
* <tr>
* <th scope="row">{@code \u005Ct}</th>
* <td>horizontal tab</td>
* <td>{@code U+0009}</td>
* </tr>
* <tr>
* <th scope="row">{@code \u005Cn}</th>
* <td>line feed</td>
* <td>{@code U+000A}</td>
* </tr>
* <tr>
* <th scope="row">{@code \u005Cf}</th>
* <td>form feed</td>
* <td>{@code U+000C}</td>
* </tr>
* <tr>
* <th scope="row">{@code \u005Cr}</th>
* <td>carriage return</td>
* <td>{@code U+000D}</td>
* </tr>
* <tr>
* <th scope="row">{@code \u005Cs}</th>
* <td>space</td>
* <td>{@code U+0020}</td>
* </tr>
* <tr>
* <th scope="row">{@code \u005C"}</th>
* <td>double quote</td>
* <td>{@code U+0022}</td>
* </tr>
* <tr>
* <th scope="row">{@code \u005C'}</th>
* <td>single quote</td>
* <td>{@code U+0027}</td>
* </tr>
* <tr>
* <th scope="row">{@code \u005C\u005C}</th>
* <td>backslash</td>
* <td>{@code U+005C}</td>
* </tr>
* <tr>
* <th scope="row">{@code \u005C0 - \u005C377}</th>
* <td>octal escape</td>
* <td>code point equivalents</td>
* </tr>
* <tr>
* <th scope="row">{@code \u005C<line-terminator>}</th>
* <td>continuation</td>
* <td>discard</td>
* </tr>
* </tbody>
* </table>
*
* @implNote
* This method does <em>not</em> translate Unicode escapes such as "{@code \u005cu2022}".
* Unicode escapes are translated by the Java compiler when reading input characters and
* are not part of the string literal specification.
*
* @throws IllegalArgumentException when an escape sequence is malformed.
*
* @return String with escape sequences translated.
*
* @jls 3.10.7 Escape Sequences
*
* @since 13
*/
@jdk.internal.PreviewFeature(feature=jdk.internal.PreviewFeature.Feature.TEXT_BLOCKS,
essentialAPI=true)
public String translateEscapes() {
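The rows added to the translation table can be exercised at run time with a sketch like the one below. It assumes a JDK where String::translateEscapes is callable without preview flags (JDK 15 or later); the class TranslateEscapesDemo and its helpers translate and isMalformed are hypothetical names introduced for illustration. Note that the inputs use a doubled backslash in source so that the string seen at run time contains a single backslash followed by the escape character.

```java
final class TranslateEscapesDemo {
    // Thin wrapper so the calls below read uniformly.
    static String translate(String raw) {
        return raw.translateEscapes();
    }

    // Returns true when translateEscapes rejects the input as malformed.
    static boolean isMalformed(String raw) {
        try {
            raw.translateEscapes();
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Backslash followed by 's' is translated to a space (U+0020).
        System.out.println(translate("A\\sline\\swith\\sspaces"));  // A line with spaces

        // Backslash followed by a line terminator (LF here, then CR LF) is discarded.
        System.out.println(translate("one \\\ntwo"));                // one two
        System.out.println(translate("one \\\r\ntwo"));              // one two

        // An unrecognized escape such as backslash-q is reported as malformed.
        System.out.println(isMalformed("\\q"));                      // true
    }
}
```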
csr of
- JDK-8233116 Escape Sequences For Line Continuation and White Space (Preview) - Resolved
relates to
- JDK-8235616 JLS changes for Text Blocks (Second Preview) - Resolved
- JDK-8231623 JEP 368: Text Blocks (Second Preview) - Closed