Summary
Enhance the Java language by introducing raw string literals, a more flexible way to represent strings than traditional string literals.
Problem
Java's traditional string literals (JLS 3.10.5) allow various special characters to be represented with escape sequences (JLS 3.10.6), such as \" for a double-quote character and \n for a linefeed character. The use of escape sequences makes string literals hard to read and more likely to accidentally rely on OS-specific conventions (for example, \n is the newline character on Unix, but not on Windows). In addition, the use of backslash \ to introduce an escape sequence means that a string literal which truly wishes to include a backslash must escape it, via \\. This doubling-up of backslashes makes it painful to denote file paths and regular expressions. Finally, string literals are subject to Unicode escape processing (JLS 3.3), where each \uXXXX character sequence is interpreted as a Unicode code point; this processing is convenient for representing, say, non-ASCII variable names, but inconvenient when embedding fragments of other Java programs. Broadly speaking, Java code that embeds fragments of other programs (whether Java, SQL, JSON, etc.) needs a mechanism for capturing literal strings as-is, without special handling of newlines, backslashes, or Unicode escapes.
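The escaping burden described above can be seen with traditional literals alone. The following is a minimal sketch; the class name, the file path, and the regular expression are invented for illustration.

```java
import java.util.regex.Pattern;

class EscapeBurden {
    // The payload is a Windows-style path with three backslashes;
    // the source text has to spell out six.
    static String path() {
        return "C:\\Users\\demo\\notes.txt";
    }

    // Regex meaning "one or more digits, then a literal dot".
    // The regex engine wants single backslashes; the literal doubles them.
    static Pattern digitsThenDot() {
        return Pattern.compile("\\d+\\.");
    }

    // A fragment of Java source that itself contains a Unicode escape.
    // The backslash is doubled so the escape survives as six characters
    // instead of being translated to a single character.
    static String javaFragment() {
        return "char bullet = '\\u2022';";
    }

    public static void main(String[] args) {
        System.out.println(path());
        System.out.println(digitsThenDot().pattern());
        System.out.println(javaFragment());
    }
}
```

In each case the string that reaches the program differs from the text in the source file, which is the readability problem this section describes.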
Solution
A raw string literal is a backtick-delimited literal that (i) opts out of Unicode escape processing, (ii) ignores Java escape sequences, and (iii) normalizes each embedded newline (as determined by the compiler's source encoding) to a JLS-defined, OS-independent representation. Multiple balanced backticks can be used to delimit a raw string literal that contains embedded backticks, without changing the payload string at all.
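Point (iii) can be pictured as a small text transformation. The sketch below is only an illustration, not the compiler's implementation: the helper name normalizeNewlines is hypothetical, and it assumes that the OS-independent representation is a single linefeed, which the text above does not spell out.

```java
class NewlineNormalization {
    // Hypothetical helper: maps CR LF pairs and lone CRs to a single LF.
    // Assumption: LF is the normalized form; the attached JLS changes
    // are the authority on the actual rule.
    static String normalizeNewlines(String payload) {
        return payload.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) {
        String unixStyle = "This is a\ntwo-line string";
        String windowsStyle = "This is a\r\ntwo-line string";
        // Under the stated assumption, both source files yield the same payload.
        System.out.println(
            normalizeNewlines(unixStyle).equals(normalizeNewlines(windowsStyle))); // prints true
    }
}
```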
The following are examples of raw string literals:
`"` // a string containing a single double-quote character
``can`t`` // a string containing the five characters 'c', 'a', 'n', '`' and 't'
`This is a string` // a string containing 16 characters
`\n` // a string containing '\' and 'n'
`\u2022` // a string containing '\', 'u', '2', '0', '2' and '2'
`This is a
two-line string` // a string with an embedded newline
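For comparison, each payload above can also be written as a traditional string literal. The sketch below is an illustration under stated assumptions: the class and method names are invented, and the last method assumes that the embedded newline is normalized to a single linefeed. It uses only traditional literals, so it does not depend on compiler support for raw string literals.

```java
class RawEquivalents {
    // One character: a double quote, which needs an escape here.
    static String doubleQuote() { return "\""; }

    // Five characters; a backtick needs no escape in a traditional literal.
    static String cant() { return "can`t"; }

    // Sixteen characters, identical in both notations.
    static String plain() { return "This is a string"; }

    // Two characters: a backslash and the letter n.
    static String backslashN() { return "\\n"; }

    // Six characters: a backslash, the letter u, and four digits.
    static String backslashU() { return "\\u2022"; }

    // Assumption: the embedded newline is represented as a single linefeed.
    static String twoLines() { return "This is a\ntwo-line string"; }

    public static void main(String[] args) {
        System.out.println(doubleQuote().length());
        System.out.println(cant().length());
        System.out.println(plain().length());
        System.out.println(backslashN().length());
        System.out.println(backslashU().length());
        System.out.println(twoLines().length());
    }
}
```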
Specification
Proposed changes to the Java Language Specification are attached. Because the type of a raw string literal is String, it is acceptable to use a raw string literal anywhere that a traditional string literal could be used, and vice versa.
There are no changes to the JVM Specification. A string in the constant pool of a class file (JVMS 4.4.3) has always been independent of Java language rules for traditional string literals, so it is a suitable compilation target for raw string literals. A class file does not record whether a string in the constant pool was compiled from a traditional string literal or a raw string literal.
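That the compiled program sees only the resulting string, not how it was spelled in source, can already be observed with traditional literals. As a small illustration with invented names, two different spellings of the same two-character string denote the same constant:

```java
class SameConstant {
    // Backslash written with the two-character escape sequence.
    static final String ESCAPED = "\\n";
    // Backslash written with the octal escape 134 (decimal 92).
    static final String OCTAL = "\134n";

    public static void main(String[] args) {
        // Both are compile-time constants with identical contents, so they
        // refer to the same interned String at run time.
        System.out.println(ESCAPED == OCTAL);   // prints true
        System.out.println(ESCAPED.length());   // prints 2
    }
}
```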
- csr of: JDK-8206981 Compiler support for Raw String Literals (Resolved)
- relates to: JDK-8215682 Remove compiler support for Raw String Literals from JDK 12 (Closed)
- relates to: JDK-8200435 String::align, String::indent (Closed)