Type: JEP
Resolution: Withdrawn
Priority: P3
Fix Version: None
Affects Version: None
Owner: Jim Laskey
JEP Type: Feature
Exposure: Open
Scope: SE
Effort: S
Duration: S
Summary
Add two new escape sequences to string literals and text blocks for managing explicit white space and carriage control.
Goals
Simplify the coding of long unwieldy string literals.
Provide a means for selectively discarding implicit newlines in a text block.
Provide a means to retain trailing white space in a text block.
Motivation
JEP 355 - Text Blocks (Preview) made great strides in improving the readability of complex string literals and string expressions. Nonetheless, a few issues were left unresolved, specifically:
how to better represent very long single line string literals
how to suppress the removal of incidental whitespace in text blocks
Discarding Implicit Newlines
In text blocks, newlines (U+000A) are not typically declared explicitly using \n. Instead, newlines are inserted implicitly wherever content breaks to the next line. What if an implicit newline is not desired?
For example, it is common practice to split very long string literals into concatenations of smaller substrings and then hard wrap the resulting string expression onto multiple lines.
String literal = "Lorem ipsum dolor sit amet, consectetur adipiscing " +
                 "elit, sed do eiusmod tempor incididunt ut labore " +
                 "et dolore magna aliqua.";
This is exactly the form of complex string expression that text blocks express more readably.
String text = """
Lorem ipsum dolor sit amet, consectetur adipiscing
elit, sed do eiusmod tempor incididunt ut labore
et dolore magna aliqua.
""";
However, using text blocks to represent long strings has a drawback: an implicit newline is inserted at the end of every line.
It would be helpful to be able to selectively denote which lines do not pick up the implicit newline.
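The drawback can be observed directly by comparing the two forms. This is a minimal sketch assuming Java 15 or later, where text blocks are a standard feature; the class name ImplicitNewlines is hypothetical.

```java
// Minimal sketch; assumes Java 15+ (text blocks are standard there).
// The class name is hypothetical.
class ImplicitNewlines {
    // Hand-wrapped concatenation: explicit trailing spaces, no newlines.
    static final String LITERAL = "Lorem ipsum dolor sit amet, consectetur adipiscing "
            + "elit, sed do eiusmod tempor incididunt ut labore "
            + "et dolore magna aliqua.";

    // Text block: every line break in the source becomes an implicit '\n'.
    static final String TEXT = """
            Lorem ipsum dolor sit amet, consectetur adipiscing
            elit, sed do eiusmod tempor incididunt ut labore
            et dolore magna aliqua.
            """;

    public static void main(String[] args) {
        System.out.println(LITERAL.equals(TEXT));  // false
        System.out.println(TEXT.lines().count());  // 3
        System.out.println(TEXT.endsWith("\n"));   // true
    }
}
```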
The position of the text block closing delimiter can be used to discard the final newline or to manage content indentation, but not both in the same text block. That is:
If the closing delimiter is positioned to avoid a final newline:
String x = """
    abc
    def""";
then x is equal to "abc\ndef". Every line is stripped of leading white space, so the ability to preserve leading white space by positioning the closing delimiter is lost.
If the closing delimiter is positioned to preserve leading white space:
String y = """
    abc
    def
""";
then y is equal to (spaces denoted with periods) "....abc\n....def\n". The delimiter is necessarily on its own line, so the final newline in the string is unavoidable.
It is sometimes desirable to position the closing delimiter to preserve leading white space without a final newline in the string.
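The two placements can be compared in a short sketch, again assuming Java 15 or later; ClosingDelimiter is a hypothetical class name. In Y the content lines are indented four columns further than the closing delimiter, which is what keeps the four leading spaces.

```java
// Minimal sketch; assumes Java 15+. The class name is hypothetical.
class ClosingDelimiter {
    // Delimiter on the last content line: no final newline, but all
    // leading white space is treated as incidental and removed.
    static final String X = """
                abc
                def""";

    // Delimiter on its own line, four columns left of the content:
    // the four spaces are kept, but so is the final newline.
    static final String Y = """
                abc
                def
            """;

    public static void main(String[] args) {
        System.out.println(X.equals("abc\ndef"));            // true
        System.out.println(Y.equals("    abc\n    def\n"));  // true
    }
}
```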
Retaining Trailing White Space In Text Blocks
The space (U+0020) character's lack of observability creates a problem for text blocks. Text blocks lack the per-line delimiters, found in string literals, that clearly indicate where the content of a line begins and where it ends.
This lack of observability is the primary reason text blocks strip trailing white space by default. However, that decision raises a counter issue: how does a developer retain trailing white space in a text block?
The simplest solution is to use an observable placeholder, such as the octal escape sequence for space, \040 (ASCII character 32, white space):
String colors = """
red\040\040\040\040\040
green\040\040\040
blue\040\040\040\040
""";
This works because escape sequences are converted after incidental white space is removed. The above text block can be reduced to:
String colors = """
red    \040
green  \040
blue   \040
""";
We can do this because only the last space needs to be observable. This observable character sequence acts as a fence: the stripping of trailing white space cannot go beyond it, so any white space to the left of the fence is retained. In general, trailing white space can be retained by ending the line with such a character sequence fence.
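Both colors listings can be checked against each other. This sketch assumes Java 15 or later; the class name OctalFence is hypothetical, and the column alignment of the fenced form is chosen so that red, green and blue keep five, three and four trailing spaces respectively.

```java
// Minimal sketch; assumes Java 15+. The class name is hypothetical.
class OctalFence {
    // Every trailing space written as an octal escape.
    static final String ALL_ESCAPED = """
            red\040\040\040\040\040
            green\040\040\040
            blue\040\040\040\040
            """;

    // Only the last space is escaped. Stripping happens before escape
    // translation, so the escape fences off the plain spaces to its left.
    static final String FENCED = """
            red    \040
            green  \040
            blue   \040
            """;

    public static void main(String[] args) {
        System.out.println(ALL_ESCAPED.equals(FENCED));  // true
        System.out.print(FENCED.replace(' ', '.'));
    }
}
```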
Still, this use of the \040 octal escape sequence is rather arcane. Besides being excessive, these sequences can perplex readers not fully versed in ASCII. Readability is enhanced when a more intuitive escape sequence is available for observable space.
Description
Change JLS 3.10.6 Escape Sequences for Character and String Literals and String::translateEscapes to recognize two new categories of escape sequences:
\<line-terminator>
The escape sequences \␊ (U+005C, U+000A), \␍ (U+005C, U+000D) and \␍␊ (U+005C, U+000D, U+000A) represent line continuation. Unlike other escape sequences, these line continuation sequences are simply discarded during escape translation. For example:
String text = """
    Lorem ipsum dolor sit amet, consectetur adipiscing \
    elit, sed do eiusmod tempor incididunt ut labore \
    et dolore magna aliqua.\
    """;
After white space stripping, the above text block would have the value
"Lorem ipsum dolor sit amet, consectetur adipiscing \␊elit, sed do eiusmod tempor incididunt ut labore \␊et dolore magna aliqua.\␊"
Applying escape translation would then yield
"Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua."
\<line-terminator> can be used in combination with the closing delimiter to discard the final newline and preserve leading white space:
String noLastLF = """
    abc
    def\
""";
The above text block represents (spaces denoted with periods)
"....abc\n....def"
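Both examples can be tried on a JDK that already supports \<line-terminator>; this sketch assumes Java 15 or later, where the escape is part of standard text blocks. The class name LineContinuation is hypothetical. It also calls String::translateEscapes on a runtime string, which in those releases discards a backslash followed by a line terminator in the same way.

```java
// Minimal sketch; assumes Java 15+. The class name is hypothetical.
class LineContinuation {
    // Each backslash at the end of a line discards the newline after it.
    static final String TEXT = """
            Lorem ipsum dolor sit amet, consectetur adipiscing \
            elit, sed do eiusmod tempor incididunt ut labore \
            et dolore magna aliqua.\
            """;

    // Closing delimiter four columns left of the content keeps the
    // indentation; the continuation after "def" drops the final newline.
    static final String NO_LAST_LF = """
                abc
                def\
            """;

    public static void main(String[] args) {
        System.out.println(TEXT);
        System.out.println(NO_LAST_LF.replace(' ', '.'));
        // Runtime equivalent: a backslash followed by a newline is discarded.
        String raw = "abc\\\ndef";
        System.out.println(raw.translateEscapes());  // abcdef
    }
}
```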
A \\ escape sequence preceding a line terminator can be used when a backslash is desired at the end of a line. This works because Java does not recursively translate escape sequences: \\ becomes a single backslash, and the line terminator that follows it is left intact.
String python = """
    if x == True and \\
        y == False
    """;
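A quick check of the escaped backslash, assuming Java 15 or later; EscapedBackslash is a hypothetical class name. The result contains a literal backslash followed by an ordinary newline.

```java
// Minimal sketch; assumes Java 15+. The class name is hypothetical.
class EscapedBackslash {
    // The double backslash translates to one backslash; the line
    // terminator after it is not part of any escape and is kept.
    static final String PYTHON = """
            if x == True and \\
                y == False
            """;

    public static void main(String[] args) {
        System.out.print(PYTHON);
        System.out.println(PYTHON.lines().count());  // 2
    }
}
```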
\<line-terminator> can also be used as a trailing white space fence, preventing white space to the left of the \<line-terminator> from being stripped.
String colors = """
    red   \
    green \
    blue  \
    """;
After processing, the above text block will have the value (spaces denoted with periods)
"red...green.blue.."
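The fence behaviour can be confirmed with a sketch that assumes Java 15 or later; ContinuationFence is a hypothetical class name. The spaces before each backslash survive stripping, and the newlines disappear.

```java
// Minimal sketch; assumes Java 15+. The class name is hypothetical.
class ContinuationFence {
    // Spaces before each backslash are content, not trailing white space;
    // the backslash-newline pairs are then discarded during translation.
    static final String COLORS = """
            red   \
            green \
            blue  \
            """;

    public static void main(String[] args) {
        System.out.println(COLORS.replace(' ', '.'));  // red...green.blue..
    }
}
```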
\s
The escape sequence \s (U+005C, U+0073) represents observable space and is translated to the ASCII space character (U+0020).
String str = "A\sline\swith\sspaces";
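The same translation applies to ordinary string literals and, at run time, to String::translateEscapes. This sketch assumes Java 15 or later; ObservableSpace is a hypothetical class name.

```java
// Minimal sketch; assumes Java 15+. The class name is hypothetical.
class ObservableSpace {
    // \s is accepted in traditional string literals as well as text blocks.
    static final String STR = "A\sline\swith\sspaces";

    public static void main(String[] args) {
        System.out.println(STR);                               // A line with spaces
        System.out.println("A\\sline".translateEscapes());     // A line
    }
}
```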
After translation, the String str will have the value "A line with spaces".
\s can also be used as a trailing white space fence, preventing white space to the left of the \s from being stripped.
String colors = """
    red  \s
    green\s
    blue \s
    """;
After processing, the above text block will have the value (spaces denoted with periods)
"red...\ngreen.\nblue..\n"
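A sketch of the \s fence, assuming Java 15 or later; SpaceFence is a hypothetical class name. Unlike the \<line-terminator> fence, each newline is kept.

```java
// Minimal sketch; assumes Java 15+. The class name is hypothetical.
class SpaceFence {
    // Stripping runs before escape translation, and \s is not white space
    // at that point, so the plain spaces to its left are retained.
    static final String COLORS = """
            red  \s
            green\s
            blue \s
            """;

    public static void main(String[] args) {
        System.out.print(COLORS.replace(' ', '.'));
    }
}
```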
Alternatives
Line Continuation
Alternative escape sequences were evaluated, such as \+, \- and \c. We feel that \<line-terminator> is the least obscure sequence and that it is consistent with other languages (for example, bash).
Defining \<line-terminator> as a generalized continuation sequence in the Java language was also evaluated. We feel that, unlike in other languages (for example, C macros), continuation would only be relevant to string literals and text blocks.
One of the main advantages of the \<line-terminator> escape sequences over other line continuation techniques is their zero runtime cost. Most alternatives require some kind of runtime computation, whose cost increases as the string gets larger. While this cost could be reduced or eliminated through optimization, developers are loath to use method invocations for prosaic idioms; such calls get in their way.
Even so, a straightforward approach to line wrapping long string literals would be to simply replace the newlines with spaces or with the empty string.
String text = """
Lorem ipsum dolor sit amet, consectetur adipiscing
elit, sed do eiusmod tempor incididunt ut labore
et dolore magna aliqua.""".replace('\n', ' ');
or
String text = """
Lorem ipsum dolor sit amet, consectetur adipiscing
elit, sed do eiusmod tempor incididunt ut labore
et dolore magna aliqua.
""".replace("\n", "");
Either approach may produce the desired result, but both are still encumbered with runtime cost and the need for an explicit call. As well, they give us little control over line terminator retention or trailing white space stripping.
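The two runtime approaches can be compared in a sketch assuming Java 15 or later; ReplaceAlternative is a hypothetical class name. It illustrates the lack of control over trailing white space: when newlines are simply removed, any spaces typed at the ends of lines have already been stripped, so adjacent words run together.

```java
// Minimal sketch; assumes Java 15+. The class name is hypothetical.
class ReplaceAlternative {
    static final String EXPECTED = "Lorem ipsum dolor sit amet, consectetur adipiscing "
            + "elit, sed do eiusmod tempor incididunt ut labore "
            + "et dolore magna aliqua.";

    // Newlines turned into spaces at run time (delimiter on the last line).
    static final String WITH_SPACES = """
            Lorem ipsum dolor sit amet, consectetur adipiscing
            elit, sed do eiusmod tempor incididunt ut labore
            et dolore magna aliqua.""".replace('\n', ' ');

    // Newlines deleted at run time; trailing spaces were already stripped.
    static final String NEWLINES_REMOVED = """
            Lorem ipsum dolor sit amet, consectetur adipiscing
            elit, sed do eiusmod tempor incididunt ut labore
            et dolore magna aliqua.
            """.replace("\n", "");

    public static void main(String[] args) {
        System.out.println(WITH_SPACES.equals(EXPECTED));       // true
        System.out.println(NEWLINES_REMOVED.equals(EXPECTED));  // false
        System.out.println(NEWLINES_REMOVED);
    }
}
```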
Another approach is to use a visible fence sequence, such as a dollar sign ($) or an ellipsis (...). The fence sequence, in combination with the line terminator, does provide control over line terminator retention and trailing white space stripping, but it is still encumbered with runtime cost and the need for an explicit call.
String text = """
Lorem ipsum dolor sit amet, consectetur adipiscing $
elit, sed do eiusmod tempor incididunt ut labore $
et dolore magna aliqua.""".replace("$\n", "");
or
String text = """
Lorem ipsum dolor sit amet, consectetur adipiscing ...
elit, sed do eiusmod tempor incididunt ut labore ...
et dolore magna aliqua.""".replace("...\n", "");
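Both fence variants can be checked against the hand-wrapped literal in a sketch assuming Java 15 or later; FenceAlternative is a hypothetical class name. Because String::replace matches its argument literally, the $ variant removes exactly "$" followed by a newline.

```java
// Minimal sketch; assumes Java 15+. The class name is hypothetical.
class FenceAlternative {
    static final String EXPECTED = "Lorem ipsum dolor sit amet, consectetur adipiscing "
            + "elit, sed do eiusmod tempor incididunt ut labore "
            + "et dolore magna aliqua.";

    // The space before each fence survives stripping; the fence and the
    // newline after it are removed at run time.
    static final String DOLLAR = """
            Lorem ipsum dolor sit amet, consectetur adipiscing $
            elit, sed do eiusmod tempor incididunt ut labore $
            et dolore magna aliqua.""".replace("$\n", "");

    static final String ELLIPSIS = """
            Lorem ipsum dolor sit amet, consectetur adipiscing ...
            elit, sed do eiusmod tempor incididunt ut labore ...
            et dolore magna aliqua.""".replace("...\n", "");

    public static void main(String[] args) {
        System.out.println(DOLLAR.equals(EXPECTED));    // true
        System.out.println(ELLIPSIS.equals(EXPECTED));  // true
    }
}
```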
Observable Space
Observable space could be downplayed as an aesthetic change, but we feel that \s provides significant code clarity. The following examples equivalently represent five spaces:
"     "
"\040\040\040\040\040"
"\u0020\u0020\u0020\u0020\u0020"
"\s\s\s\s\s"
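The equivalence of the four spellings can be verified in a sketch assuming Java 15 or later; FiveSpaces is a hypothetical class name. Note that Unicode escapes are translated before the literal is even tokenized, whereas the octal and \s escapes are translated as part of the literal.

```java
// Minimal sketch; assumes Java 15+. The class name is hypothetical.
class FiveSpaces {
    static final String[] FORMS = {
        "     ",                           // five literal spaces
        "\040\040\040\040\040",            // octal escapes
        "\u0020\u0020\u0020\u0020\u0020",  // Unicode escapes
        "\s\s\s\s\s"                       // observable-space escapes
    };

    public static void main(String[] args) {
        for (String form : FORMS) {
            System.out.println(form.equals("     "));  // true
        }
    }
}
```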
Other escape sequences were evaluated. Most of them do not add value, and the association of \s with space is clear (as \t is with tab).
\<space> was also considered:
"\ \ \ \ \ "
While aesthetically acceptable, the interpretation of \<space> at the end of a line becomes perplexing: is it a \<space> or a \␊? Using the observable character s removes any ambiguity.
Alternative character sequence fences can also be used for trailing white space retention, but they incur a runtime cost when stripped away:
String colors = """
red $
green $
blue $
""".replace("$\n", "\n");
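The $ fence can be compared with the proposed \s fence in a sketch assuming Java 15 or later; DollarFence is a hypothetical class name, and the spacing before each $ is chosen for illustration only. Both produce the same value, but only the $ version pays for a runtime replace.

```java
// Minimal sketch; assumes Java 15+. The class name and spacing are
// illustrative assumptions.
class DollarFence {
    // '$' keeps the spaces before it from being stripped; the fence is
    // removed at run time while the newline is preserved.
    static final String COLORS = """
            red   $
            green $
            blue  $
            """.replace("$\n", "\n");

    // Same value with the \s fence, resolved entirely at compile time.
    static final String WITH_S = """
            red  \s
            green\s
            blue \s
            """;

    public static void main(String[] args) {
        System.out.println(COLORS.equals(WITH_S));  // true
    }
}
```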
It is also possible to change the text block rules to not remove incidental white space. However, the strong argument for removing incidental white space remains.
Testing
Tests will be added to exercise various permutations of the new escape sequences, along with their interaction with existing escape sequences.
Risks and Assumptions
The primary risk is that there may be tools in the field that cannot be modified to accept these new escape sequences.
Dependencies
Dependent on JEP 355 - Text Blocks (Preview) moving forward.