Copyright © 2019 Oracle and/or its affiliates · All Rights Reserved · Legal Notice
This document proposes changes to The Java(R) Language Specification, Java SE 14 Edition in support of Text Blocks, a preview feature of Java SE 14.
See JEP 368 for an overview of Text Blocks.
Last updated 2019-11-05
The following are essential API elements associated with Text Blocks:
stripIndent in String
translateEscapes in String
The pre-existing section 3.10.6, “Escape Sequences for Character and String Literals”, becomes 3.10.7, “Escape Sequences”.
The pre-existing section 3.10.7, “The Null Literal”, will be renumbered to 3.10.8.
A character literal is expressed as a character or an escape sequence (3.10.7), enclosed in ASCII single quotes. (The single-quote, or apostrophe, character is \u0027.)
CharacterLiteral:
    ' SingleCharacter '
    ' EscapeSequence '
SingleCharacter:
    InputCharacter but not ' or \
A character literal is always of type char (4.2.1).
Character literals can only represent UTF-16 code units (3.1), i.e., they are limited to values from \u0000 to \uffff. Supplementary characters must be represented either as a surrogate pair within a char sequence, or as an integer, depending on the API they are used with.
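As a hedged illustration of the last point, the sketch below represents one supplementary character both ways: as an int code point and as a surrogate pair in a char sequence. The class name SupplementaryDemo and the choice of U+1F600 are assumptions made only for this example.

```java
final class SupplementaryDemo {
    // U+1F600 does not fit in a single char, so it is held as an int here.
    static final int CODE_POINT = 0x1F600;

    // Returns the two UTF-16 code units (a surrogate pair) for the code point.
    static char[] asSurrogatePair() {
        return Character.toChars(CODE_POINT);
    }

    public static void main(String[] args) {
        char[] pair = asSurrogatePair();
        System.out.println(pair.length);                        // 2
        System.out.println(Character.isHighSurrogate(pair[0])); // true
        System.out.println(Character.isLowSurrogate(pair[1]));  // true
        // APIs that work on code points take and return int values.
        System.out.println(new String(pair).codePointAt(0) == CODE_POINT); // true
    }
}
```

Neither half of the pair could be written as a character literal standing for the whole character; each char holds only one code unit.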
The content of a character literal is the SingleCharacter or the EscapeSequence which follows the opening '.
It is a compile-time error for the character following the SingleCharacter or EscapeSequence content to be other than a '.
It is a compile-time error for a line terminator (3.4) to appear after the opening ' and before the closing '.
As specified in 3.4, the characters CR and LF are never an InputCharacter; each is recognized as constituting a LineTerminator, so it may not appear in a character literal, even in the escape sequence \ LineTerminator.
The character represented by a character literal is the content of the character literal with any escape sequence interpreted, as if by execution of String::translateEscapes on the content.
The following are examples of char literals:
'a'
'%'
'\t'
'\\'
'\''
'\u03a9'
'\uFFFF'
'\177'
'™'
Because Unicode escapes are processed very early, it is not correct to write '\u000a' for a character literal whose value is linefeed (LF); the Unicode escape \u000a is transformed into an actual linefeed in translation step 1 (3.3) and the linefeed becomes a LineTerminator in step 2 (3.4), and so the character literal is not valid in step 3. Instead, one should use the escape sequence '\n' (3.10.7). Similarly, it is not correct to write '\u000d' for a character literal whose value is carriage return (CR). Instead, use '\r'. Finally, it is not possible to write '\u0027' for a character literal containing an apostrophe (').
In C and C++, a character literal may contain representations of more than one character, but the value of such a character literal is implementation-defined. In the Java programming language, a character literal always represents exactly one character.
A string literal consists of zero or more characters enclosed in double quotes. Characters such as newlines may be represented by escape sequences (3.10.7).
StringLiteral:
    " {StringCharacter} "
StringCharacter:
    InputCharacter but not " or \
    EscapeSequence
A string literal is always of type String (4.3.3).
The content of a string literal is the sequence of characters that begins immediately after the opening " and ends immediately before the closing matching ".
It is a compile-time error for a line terminator to appear in the content of a string literal, that is, after the opening " and before the closing matching ".
As specified in 3.4, the characters CR and LF are never an InputCharacter; each is recognized as constituting a LineTerminator, so it may not appear in a string literal, even in the escape sequence \ LineTerminator.
The string represented by a string literal is the content of the string literal with every escape sequence interpreted, as if by execution of String::translateEscapes on the content.
The following are examples of string literals:
""                    // the empty string
"\""                  // a string containing " alone
"This is a string"    // a string containing 16 characters
"This is a " +        // actually a string-valued constant expression,
    "two-line string" // formed from two string literals
Because Unicode escapes are processed very early, it is not correct to write "\u000a" for a string literal containing a single linefeed (LF); the Unicode escape \u000a is transformed into an actual linefeed in translation step 1 (3.3) and the linefeed becomes a LineTerminator in step 2 (3.4), and so the string literal is not valid in step 3. Instead, one should write "\n" (3.10.7). Similarly, it is not correct to write "\u000d" for a string literal containing a single carriage return (CR). Instead, use "\r". Finally, it is not possible to write "\u0022" for a string literal containing a double quotation mark (").
A long string literal can always be broken up into shorter pieces and written as a (possibly parenthesized) expression using the string concatenation operator + (15.18.1).
At run time, a string literal is a reference to an instance of class String (4.3.1, 4.3.3) that denotes the string represented by the string literal.
Moreover, a string literal always refers to the same instance of class String. This is because string literals - or, more generally, strings that are the values of constant expressions (15.28) - are “interned” so as to share unique instances, using the method String.intern (12.5).
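The effect of interning can be observed with reference comparison. The following is a minimal sketch, not part of the specification text; the class and method names are hypothetical, and the == comparisons are deliberate because they test identity rather than content.

```java
final class InternDemo {
    // Two occurrences of the same literal refer to one String instance.
    static boolean literalsShareInstance() {
        String a = "hello";
        String b = "hello";
        return a == b;
    }

    // "hel" + "lo" is a constant expression, so its value is interned too.
    static boolean constantExpressionIsInterned() {
        return "hello" == ("hel" + "lo");
    }

    // lo is not declared final, so "hel" + lo is not a constant expression;
    // the concatenation is computed at run time and yields a new instance.
    static boolean runtimeConcatenationIsSameInstance() {
        String lo = "lo";
        return "hello" == ("hel" + lo);
    }

    // Explicitly interning the computed string recovers the shared instance.
    static boolean internedConcatenationIsSameInstance() {
        String lo = "lo";
        return "hello" == ("hel" + lo).intern();
    }

    public static void main(String[] args) {
        System.out.println(literalsShareInstance());               // true
        System.out.println(constantExpressionIsInterned());        // true
        System.out.println(runtimeConcatenationIsSameInstance());  // false
        System.out.println(internedConcatenationIsSameInstance()); // true
    }
}
```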
A text block consists of zero or more characters enclosed by opening and closing delimiters. Characters may be represented by escape sequences (3.10.7), but the newline and double quote characters that must be represented with escape sequences in a string literal may be represented directly in a text block.
The following productions from 3.3, 3.4, and 3.6 are shown here for convenience:
A text block is always of type String (4.3.3).
The opening delimiter is a sequence that starts with three double quote characters ("""), continues with zero or more space, tab, and form feed characters, and concludes with a line terminator.
The closing delimiter is a sequence of three double quote characters.
The content of a text block is the sequence of characters that begins immediately after the line terminator of the opening delimiter, and ends immediately before the first double quote of the closing delimiter.
Unlike in a string literal (3.10.5), it is not a compile-time error for a line terminator to appear in the content of a text block.
Example 3.10.6-1. Text Blocks
When multi-line strings are desired, a text block is usually more readable than a concatenation of string literals. For example, compare these alternative representations of a snippet of HTML:
String html = "<html>\n" +
              "    <body>\n" +
              "        <p>Hello, world</p>\n" +
              "    </body>\n" +
              "</html>\n";

String html = """
              <html>
                  <body>
                      <p>Hello, world</p>
                  </body>
              </html>
              """;
Here are some examples of text blocks:
String season = """
                winter""";    // the six characters w i n t e r

String period = """
                winter
                """;          // the seven characters w i n t e r LF

String greeting =
    """
    Hi, "Bob"
    """;        // the ten characters H i , SP " B o b " LF

String salutation =
    """
    Hi,
     "Bob"
    """;        // the eleven characters H i , LF SP " B o b " LF

String empty = """
               """;      // the empty string (zero length)

String quote = """
               "
               """;      // the two characters " LF

String backslash = """
                   \\
                   """;  // the two characters \ LF
The use of the escape sequences \" and \n is permitted in a text block, but not necessary or recommended. However, representing the sequence """ in a text block requires the escaping of at least one " character, to avoid mimicking the closing delimiter.
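Both points can be checked with a short sketch, assuming a JDK on which text blocks are available. The class and method names are hypothetical. The first two methods denote the same string with and without the optional escapes; the third escapes one " so that a run of three quotes can appear in the content.

```java
final class TripleQuoteDemo {
    // Quotes and the newline written directly, as recommended.
    static String direct() {
        return """
            Hi, "Bob"
            """;
    }

    // The same string with \" and \n escapes: permitted but not needed.
    static String escaped() {
        return """
            Hi, \"Bob\"\n""";
    }

    // Escaping the first quote keeps the run of three from ending the block.
    static String tripleQuote() {
        return """
            Delimiter: \"""
            """;
    }

    public static void main(String[] args) {
        System.out.println(direct().equals(escaped())); // true
        System.out.print(tripleQuote());                // Delimiter: """
    }
}
```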
Example 3.10.6-2. Escape sequences in text blocks
The following snippet of text would be less readable if the " characters were escaped:
String story = """
    "When I use a word," Humpty Dumpty said,
    in rather a scornful tone, "it means just what I
    choose it to mean - neither more nor less."
    "The question is," said Alice, "whether you
    can make words mean so many different things."
    "The question is," said Humpty Dumpty,
    "which is to be master - that's all."
    """;
If a text block is to denote another text block, then it is recommended to escape the first " of the embedded opening and closing delimiters:
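A minimal sketch of this recommendation follows, assuming a JDK on which text blocks are available. The class name NestedTextBlockDemo and the embedded text are illustrative only.

```java
final class NestedTextBlockDemo {
    // The outer text block denotes source code that itself declares a text block.
    // Only the first quote of each embedded delimiter is escaped.
    static String code() {
        return """
            String text = \"""
                A text block inside a text block
            \""";
            """;
    }

    public static void main(String[] args) {
        System.out.print(code());
    }
}
```

Escaping just the first " is enough, because the two quotes that follow can no longer form a run of three.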
The string represented by a text block is not the literal sequence of characters in the content. Instead, the string represented by a text block is the result of applying the following transformations to the content, in order:
1. Line terminators are normalized to the ASCII LF character, as follows:
   - An ASCII CR character followed by an ASCII LF character is translated to an ASCII LF character.
   - An ASCII CR character is translated to an ASCII LF character.
2. Incidental white space is removed, as if by execution of String::stripIndent on the characters resulting from step 1.
3. Escape sequences are interpreted, as if by execution of String::translateEscapes on the characters resulting from step 2.
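The three transformations can be imitated at run time on an ordinary string. The sketch below only illustrates the order of the steps and is not how a compiler is required to implement them. It assumes a JDK that provides String::stripIndent and String::translateEscapes; the class name, the helper names, and the sample content are hypothetical.

```java
final class TextBlockTransformDemo {
    // Step 1: CR LF pairs first, then any remaining lone CR, become LF.
    static String normalizeLineTerminators(String content) {
        return content.replace("\r\n", "\n").replace('\r', '\n');
    }

    // Applies steps 1, 2, and 3 in order to raw text block content.
    static String represent(String content) {
        String step1 = normalizeLineTerminators(content);
        String step2 = step1.stripIndent();      // step 2: incidental white space
        return step2.translateEscapes();         // step 3: escape sequences
    }

    public static void main(String[] args) {
        // Raw content as it might sit in a source file with CR LF line endings.
        // "\\t" is a backslash followed by t, i.e. an escape not yet interpreted.
        String raw = "    Hello,\r\n      world\\t!\r\n    ";
        System.out.println(represent(raw).equals("Hello,\n  world\t!\n")); // true
    }
}
```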
Example 3.10.6-3. Order of transformations on text block content
Interpreting escape sequences last allows developers to use \n, \f, and \r for vertical formatting of a string without affecting the normalization of line terminators, and to use \b and \t for horizontal formatting of a string without affecting the removal of incidental white space. For example, consider this text block that mentions the escape sequence \r (CR):
The \r escapes are not interpreted until after the line terminators have been normalized to LF. Using Unicode escapes to visualize LF (\u000A) and CR (\u000D), and using | to visualize the left margin, the final result is:
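One text block of this kind, with assumed HTML content, is sketched below. Each line ends with the escape sequence \r followed by the line terminator of the source file. The class name CarriageReturnDemo is hypothetical.

```java
final class CarriageReturnDemo {
    // Every content line ends in \r; the source line terminator follows it.
    static String html() {
        return """
            <html>\r
                <body>\r
                    <p>Hello, world</p>\r
                </body>\r
            </html>\r
            """;
    }

    public static void main(String[] args) {
        // Make CR and LF visible, and mark the left margin with |.
        String visible = html().replace("\r", "\\u000D").replace("\n", "\\u000A\n|");
        System.out.println("|" + visible);
    }
}
```

Each line of the resulting string ends with CR LF, whichever line terminator the source file used, because the \r escapes are interpreted only after normalization.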
When this specification says that a text block contains a particular character or sequence of characters, or that a particular character or sequence of characters is in a text block, it means that the string represented by the text block (as opposed to the content of the text block) contains the character or sequence of characters.
At run time, a text block is a reference to an instance of class String that denotes the string represented by the text block.
A text block always refers to the same instance of class String. This is because the strings represented by text blocks - or, more generally, strings that are the values of constant expressions (15.28) - are “interned” so as to share unique instances (12.5).
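As a small sketch of this rule, assuming a JDK on which text blocks are available, a text block and a string literal that represent the same string refer to the same instance. The class and method names are hypothetical.

```java
final class TextBlockInternDemo {
    // Both expressions are constants representing "winter", so they are
    // interned to a single String instance and == yields true.
    static boolean sameInstanceAsLiteral() {
        String literal = "winter";
        String block = """
            winter""";
        return literal == block;
    }

    public static void main(String[] args) {
        System.out.println(sameInstanceAsLiteral()); // true
    }
}
```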
Example 3.10.6-4. Text blocks evaluate to strings
Text blocks can be used wherever an expression of type String is allowed, such as in string concatenation (15.18.1), in method invocation on class String, and in annotations with String elements:
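The three uses might look as follows. This is a hedged sketch assuming a JDK on which text blocks are available; the annotation type Precondition, the method setRefreshRate, and the helper preconditionText are invented for the example and are not part of any API.

```java
final class TextBlockUsageDemo {
    // Hypothetical annotation with a String element, retained for reflection.
    @java.lang.annotation.Retention(java.lang.annotation.RetentionPolicy.RUNTIME)
    @interface Precondition {
        String value();
    }

    // A text block as an operand of string concatenation.
    static String concatenated() {
        return "ab" + """
            cde
            """;
    }

    // A method of class String invoked directly on a text block.
    static String viaMethod() {
        return """
            abcde
            """.substring(2, 5);
    }

    // A text block as the value of an annotation element.
    @Precondition("""
        rate > 0 &&
        rate <= MAX_REFRESH_RATE
        """)
    static void setRefreshRate(int rate) {
    }

    // Reads the annotation value back, to show it is an ordinary String.
    static String preconditionText() {
        try {
            return TextBlockUsageDemo.class
                    .getDeclaredMethod("setRefreshRate", int.class)
                    .getAnnotation(Precondition.class)
                    .value();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(concatenated());   // abcde
        System.out.println(viaMethod());    // cde
        System.out.print(preconditionText());
    }
}
```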
In character literals (3.10.4), string literals (3.10.5), and text blocks (3.10.6), the character and string escape sequences allow for the representation of some nongraphic characters without using Unicode escapes (3.3), as well as the single quote, double quote, and backslash characters.
EscapeSequence:
    \ b (backspace BS, Unicode \u0008)
    \ s (space SP, Unicode \u0020)
    \ t (horizontal tab HT, Unicode \u0009)
    \ f (form feed FF, Unicode \u000c)
    \ n (linefeed LF, Unicode \u000a)
    \ r (carriage return CR, Unicode \u000d)
    \ LineTerminator (line continuation, no Unicode representation)
    \ " (double quote ", Unicode \u0022)
    \ ' (single quote ', Unicode \u0027)
    \ \ (backslash \, Unicode \u005c)
    OctalEscape (octal value, Unicode \u0000 to \u00ff)
…
Octal escapes are provided for compatibility with C, but can express only Unicode values \u0000 through \u00FF, so Unicode escapes are usually preferred.
It is a compile-time error if the character following a backslash in an escape sequence is not a LineTerminator or an ASCII b, s, t, f, n, r, ", ', \, 0, 1, 2, 3, 4, 5, 6, or 7.
An escape sequence in the content of a character literal, string literal, or text block is interpreted by replacing its \ and trailing characters with the single character denoted by the Unicode escape in the EscapeSequence grammar. The line continuation escape sequence has no corresponding Unicode escape, so is interpreted by replacing it with nothing.
The line continuation escape sequence may appear in a text block, but cannot appear in a character literal (3.10.4) or a string literal (3.10.5) because each disallows a LineTerminator.
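The interpretation rules can be tried out with String::translateEscapes, which performs the same replacement at run time. The sketch below assumes a JDK that provides that method and text blocks; the class and helper names are hypothetical. A backslash in these Java literals is doubled, so that the string passed to the method holds an uninterpreted escape sequence.

```java
final class EscapeDemo {
    // Interprets the escape sequences in content, as described above.
    static String interpret(String content) {
        return content.translateEscapes();
    }

    // Reports whether every escape sequence in content is well formed.
    static boolean isValid(String content) {
        try {
            content.translateEscapes();
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    // Line continuation in a text block: the \ and the line terminator vanish.
    static String continued() {
        return """
            one \
            two
            """;
    }

    public static void main(String[] args) {
        System.out.println(interpret("a\\sb").equals("a b"));   // \s is a space
        System.out.println(interpret("\\101").equals("A"));     // octal 101 is U+0041
        System.out.println(interpret("a\\\nb").equals("ab"));   // \ LineTerminator
        System.out.println(isValid("\\q"));                     // false
        System.out.print(continued());                          // one two
    }
}
```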
3.1, final paragraph: add mention of text blocks.
3.7, final paragraph: add mention of text blocks.
4.3.3, third paragraph: add mention of text blocks.
12.5, second paragraph, list should start as follows:
“Loading of a class or interface that contains a string literal (§3.10.5) or a text block (§3.10.6) may create a new String object to represent the string literal or text block. (This will not occur if an instance of String denoting the same sequence of Unicode code points as the string literal or text block has previously been interned.)”
15.8.1, fifth bullet: add mention of text blocks.
15.28, first bullet: add mention of text blocks:
“Literals of primitive type (§3.10.1, §3.10.2, §3.10.3, §3.10.4), string literals (§3.10.5), and text blocks (§3.10.6).”
JVMS 4.7.16.1, const_value_index: rephrase from “denotes either a primitive constant value or a String literal as the value of …” to “denotes a constant of either a primitive type or the type String as the value of …”.
Some clarification of terminology around “escapes” is desirable:
3.3: A compiler for the Java programming language (“Java compiler”) first recognizes Unicode escapes in its input, … and passing all other characters unchanged. This translation step results in a sequence of Unicode input characters. … One Unicode escape can represent characters in the range U+0000 to U+FFFF. Representing supplementary characters in the range U+010000 to U+10FFFF requires two consecutive Unicode escapes.
3.5: The input characters and line terminators that result from Unicode escape processing …
ORACLE AMERICA, INC. IS WILLING TO LICENSE THIS SPECIFICATION TO YOU ONLY UPON THE CONDITION THAT YOU ACCEPT ALL OF THE TERMS CONTAINED IN THIS LICENSE AGREEMENT (“AGREEMENT”). PLEASE READ THE TERMS AND CONDITIONS OF THIS AGREEMENT CAREFULLY.
Specification: JSR-389 Java SE 14 (“Specification”)
Version: 14
Status: Early Draft Review
Release: December 2019
Copyright © 1997, 2019, Oracle America, Inc.
500 Oracle Parkway, Redwood City, California 94065, U.S.A.
All rights reserved.
The Specification is protected by copyright and the information described therein may be protected by one or more U.S. patents, foreign patents, or pending applications. Except as provided under the following license, no part of the Specification may be reproduced in any form by any means without the prior written authorization of Oracle America, Inc. (“Oracle”) and its licensors, if any. Any use of the Specification and the information described therein will be governed by the terms and conditions of this Agreement.
Subject to the terms and conditions of this license, including your compliance with Paragraphs 1 and 2 below, Oracle hereby grants you a fully-paid, non-exclusive, non-transferable, limited license (without the right to sublicense) under Oracle’s intellectual property rights to:
Review the Specification for the purposes of evaluation. This includes: (i) developing implementations of the Specification for your internal, non-commercial use; (ii) discussing the Specification with any third party; and (iii) excerpting brief portions of the Specification in oral or written communications which discuss the Specification provided that such excerpts do not in the aggregate constitute a significant portion of the Technology.
Distribute implementations of the Specification to third parties for their testing and evaluation use, provided that any such implementation:
does not modify, subset, superset or otherwise extend the Licensor Name Space, or include any public or protected packages, classes, Java interfaces, fields or methods within the Licensor Name Space other than those required/authorized by the Specification or Specifications being implemented;
is clearly and prominently marked with the word “UNTESTED” or “EARLY ACCESS” or “INCOMPATIBLE” or “UNSTABLE” or “BETA” in any list of available builds and in proximity to every link initiating its download, where the list or link is under Licensee’s control; and
includes the following notice: “This is an implementation of an early-draft specification developed under the Java Community Process (JCP) and is made available for testing and evaluation purposes only. The code is not compatible with any specification of the JCP.”
The grant set forth above concerning your distribution of implementations of the specification is contingent upon your agreement to terminate development and distribution of your “early draft” implementation as soon as feasible following final completion of the specification. If you fail to do so, the foregoing grant shall be considered null and void.
No provision of this Agreement shall be understood to restrict your ability to make and distribute to third parties applications written to the Specification.
Other than this limited license, you acquire no right, title or interest in or to the Specification or any other Oracle intellectual property, and the Specification may only be used in accordance with the license terms set forth herein. This license will expire on the earlier of: (a) two (2) years from the date of Release listed above; (b) the date on which the final version of the Specification is publicly released; or (c) the date on which the Java Specification Request (JSR) to which the Specification corresponds is withdrawn. In addition, this license will terminate immediately without notice from Oracle if you fail to comply with any provision of this license. Upon termination, you must cease use of or destroy the Specification.
“Licensor Name Space” means the public class or interface declarations whose names begin with “java”, “javax”, “com.oracle” or their equivalents in any subsequent naming convention adopted by Oracle through the Java Community Process, or any recognized successors or replacements thereof.
No right, title, or interest in or to any trademarks, service marks, or trade names of Oracle or Oracle’s licensors is granted hereunder. Oracle, the Oracle logo, and Java are trademarks or registered trademarks of Oracle America, Inc. in the U.S. and other countries.
THE SPECIFICATION IS PROVIDED “AS IS” AND IS EXPERIMENTAL AND MAY CONTAIN DEFECTS OR DEFICIENCIES WHICH CANNOT OR WILL NOT BE CORRECTED BY ORACLE. ORACLE MAKES NO REPRESENTATIONS OR WARRANTIES, EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO, WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT THAT THE CONTENTS OF THE SPECIFICATION ARE SUITABLE FOR ANY PURPOSE OR THAT ANY PRACTICE OR IMPLEMENTATION OF SUCH CONTENTS WILL NOT INFRINGE ANY THIRD PARTY PATENTS, COPYRIGHTS, TRADE SECRETS OR OTHER RIGHTS. This document does not represent any commitment to release or implement any portion of the Specification in any product.
THE SPECIFICATION COULD INCLUDE TECHNICAL INACCURACIES OR TYPOGRAPHICAL ERRORS. CHANGES ARE PERIODICALLY ADDED TO THE INFORMATION THEREIN; THESE CHANGES WILL BE INCORPORATED INTO NEW VERSIONS OF THE SPECIFICATION, IF ANY. ORACLE MAY MAKE IMPROVEMENTS AND/OR CHANGES TO THE PRODUCT(S) AND/OR THE PROGRAM(S) DESCRIBED IN THE SPECIFICATION AT ANY TIME. Any use of such changes in the Specification will be governed by the then-current license for the applicable version of the Specification.
TO THE EXTENT NOT PROHIBITED BY LAW, IN NO EVENT WILL ORACLE OR ITS LICENSORS BE LIABLE FOR ANY DAMAGES, INCLUDING WITHOUT LIMITATION, LOST REVENUE, PROFITS OR DATA, OR FOR SPECIAL, INDIRECT, CONSEQUENTIAL, INCIDENTAL OR PUNITIVE DAMAGES, HOWEVER CAUSED AND REGARDLESS OF THE THEORY OF LIABILITY, ARISING OUT OF OR RELATED TO ANY FURNISHING, PRACTICING, MODIFYING OR ANY USE OF THE SPECIFICATION, EVEN IF ORACLE AND/OR ITS LICENSORS HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
You will hold Oracle (and its licensors) harmless from any claims based on your use of the Specification for any purposes other than the limited right of evaluation as described above, and from any claims that later versions or releases of any Specification furnished to you are incompatible with the Specification provided to you under this license.
If this Software is being acquired by or on behalf of the U.S. Government or by a U.S. Government prime contractor or subcontractor (at any tier), then the Government’s rights in the Software and accompanying documentation shall be only as set forth in this license; this is in accordance with 48 C.F.R. 227.7201 through 227.7202-4 (for Department of Defense (DoD) acquisitions) and with 48 C.F.R. 2.101 and 12.212 (for non-DoD acquisitions).
You may wish to report any ambiguities, inconsistencies or inaccuracies you may find in connection with your evaluation of the Specification (“Feedback”). To the extent that you provide Oracle with any Feedback, you hereby: (i) agree that such Feedback is provided on a non-proprietary and nonconfidential basis, and (ii) grant Oracle a perpetual, non-exclusive, worldwide, fully paid-up, irrevocable license, with the right to sublicense through multiple levels of sublicensees, to incorporate, disclose, and use without limitation the Feedback for any purpose related to the Specification and future versions, implementations, and test suites thereof.
Any action related to this Agreement will be governed by California law and controlling U.S. federal law. The U.N. Convention for the International Sale of Goods and the choice of law rules of any jurisdiction will not apply. The Specification is subject to U.S. export control laws and may be subject to export or import regulations in other countries. Licensee agrees to comply strictly with all such laws and regulations and acknowledges that it has the responsibility to obtain such licenses to export, re-export or import as may be required after delivery to Licensee.
This Agreement is the parties’ entire agreement relating to its subject matter. It supersedes all prior or contemporaneous oral or written communications, proposals, conditions, representations and warranties and prevails over any conflicting or additional terms of any quote, order, acknowledgment, or other communication between the parties relating to its subject matter during the term of this Agreement. No modification to this Agreement will be binding, unless in writing and signed by an authorized representative of each party.