JDK-8206982

Compiler support for Raw String Literals (Preview)


    • Type: CSR
    • Resolution: Approved
    • Priority: P3
    • 12
    • tools
    • None
    • source
    • minimal
      Since raw string literals employ a syntax that is new in Java SE, no pre-existing Java source code will fail to compile. For programs which process strings, no character can appear in a string derived from a raw string literal that could not already have appeared in a string derived from a traditional string literal. Parsers which work from the grammar of the Java language will have to be updated to recognize backtick-delimited raw string literals in expressions.

      Raw string literals are a preview feature in Java SE 12. It is possible that incompatible changes will be made to raw string literals in a later Java SE release, before they become final and permanent. It is also possible that raw string literals will be removed in a later Java SE release, without ever having become final and permanent.
    • Language construct
    • SE

      Summary

      Enhance the Java language by introducing raw string literals, a more flexible way to represent strings than traditional string literals.

      Problem

      Java's traditional string literals (JLS 3.10.5) allow various special characters to be represented with escape sequences (JLS 3.10.6), such as \" for a double-quote character and \n for a linefeed character. The use of escape sequences makes string literals hard to read and more likely to accidentally rely on OS-specific conventions (for example, \n is the line terminator on Unix, whereas Windows uses \r\n). In addition, the use of backslash \ to introduce an escape sequence means that a string literal which truly wishes to include a backslash must escape it, via \\. This doubling-up of backslashes makes it painful to denote file paths and regular expressions. Finally, string literals are subject to Unicode escape processing (JLS 3.3), where each \uXXXX character sequence is interpreted as a UTF-16 code unit; this processing is convenient for representing, say, non-ASCII variable names, but inconvenient when embedding fragments of other Java programs. Broadly speaking, Java code that embeds fragments of other programs (whether Java, SQL, JSON, etc.) needs a mechanism for capturing literal strings as-is, without special handling of newlines, backslashes, or Unicode escapes.
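
      The escaping burden described above can be illustrated with a small sketch that uses only traditional string literals. The class name EscapeBurden and the sample path, pattern, and JSON fragment are illustrative assumptions and do not come from the CSR.

```java
import java.util.regex.Pattern;

public class EscapeBurden {
    // A Windows-style path: every backslash in the payload is doubled in source.
    static final String PATH = "C:\\Users\\duke\\notes.txt";

    // A regex that matches one literal backslash: four backslashes in source,
    // two in the resulting string, one in the text being matched.
    static final Pattern ONE_BACKSLASH = Pattern.compile("\\\\");

    // An embedded JSON fragment: every double quote needs its own escape.
    static final String JSON = "{\"name\": \"duke\"}";

    public static void main(String[] args) {
        System.out.println(PATH);           // prints C:\Users\duke\notes.txt
        System.out.println(PATH.length());  // prints 23, not the 26 payload characters typed in source
        System.out.println(ONE_BACKSLASH.matcher("\\").matches()); // prints true
        System.out.println(JSON);           // prints {"name": "duke"}
    }
}
```

      In each case the source text and the resulting string differ, which is the readability cost that the rest of this proposal aims to remove.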

      Solution

      A raw string literal is a backtick-delimited literal that (i) opts out of Unicode escape processing, (ii) ignores Java escape sequences, and (iii) normalizes each embedded newline (as determined by the compiler's source encoding) to a JLS-defined, OS-independent representation. Multiple balanced backticks can be used to delimit a raw string literal that contains embedded backticks, without changing the payload string at all.
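
      As a rough illustration of how balanced backticks might be chosen for a given payload, the sketch below picks a delimiter one backtick longer than the longest backtick run inside the payload. The class and method names are hypothetical, and the rules it assumes (a literal closes at a backtick run equal in length to the opening run, and the payload neither is empty nor starts or ends with a backtick) are assumptions for this example rather than statements taken from the attached specification.

```java
public class BacktickDelimiter {
    /** Returns the length of the longest run of consecutive backticks in the payload. */
    static int longestBacktickRun(String payload) {
        int longest = 0;
        int current = 0;
        for (int i = 0; i < payload.length(); i++) {
            if (payload.charAt(i) == '`') {
                current++;
                longest = Math.max(longest, current);
            } else {
                current = 0;
            }
        }
        return longest;
    }

    /** Wraps the payload in a backtick delimiter that cannot occur inside it (hypothetical helper). */
    static String delimit(String payload) {
        // Assumption: such payloads would merge with the delimiter and so are rejected here.
        if (payload.isEmpty() || payload.startsWith("`") || payload.endsWith("`")) {
            throw new IllegalArgumentException(
                "payload must be non-empty and must not start or end with a backtick");
        }
        String fence = "`".repeat(longestBacktickRun(payload) + 1); // String.repeat needs Java 11+
        return fence + payload + fence;
    }

    public static void main(String[] args) {
        System.out.println(delimit("can`t"));            // prints ``can`t``
        System.out.println(delimit("This is a string")); // prints `This is a string`
    }
}
```

      The payload is copied between the delimiters unchanged, which mirrors the statement that extra backticks do not alter the payload string.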

      The following are examples of raw string literals:

      `"`                // a string containing a single double-quote character
      ``can`t``          // a string containing the five characters 'c', 'a', 'n', '`' and 't'
      `This is a string` // a string containing 16 characters
      `\n`               // a string containing '\' and 'n'
      `\u2022`           // a string containing '\', 'u', '2', '0', '2' and '2'
      `This is a
      two-line string`   // a string with an embedded newline
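
      For comparison, the same six strings can be written today with traditional string literals, as sketched below. The class and field names are illustrative. The sketch assumes that the normalized newline in the last example is a single linefeed and that the second line carries no leading indentation; neither detail is stated in the text above.

```java
public class RawStringEquivalents {
    static final String QUOTE = "\"";                  // one double-quote character
    static final String CANT = "can`t";                // a backtick needs no escape here
    static final String SENTENCE = "This is a string"; // 16 characters
    static final String BACKSLASH_N = "\\n";           // a backslash followed by 'n'
    static final String BACKSLASH_U = "\\u2022";       // a backslash, 'u', then four digits
    // Assumption: the embedded newline is represented as a single linefeed.
    static final String TWO_LINES = "This is a\ntwo-line string";

    public static void main(String[] args) {
        System.out.println(QUOTE.length());        // prints 1
        System.out.println(CANT.length());         // prints 5
        System.out.println(SENTENCE.length());     // prints 16
        System.out.println(BACKSLASH_N.length());  // prints 2
        System.out.println(BACKSLASH_U.length());  // prints 6
        System.out.println(TWO_LINES.length());    // prints 25
    }
}
```

      Four of the six traditional forms need at least one escape sequence, whereas the raw forms show the payload exactly as it will appear in the string.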

      Specification

      Proposed changes to the Java Language Specification are attached. Because the type of a raw string literal is String, it is acceptable to use a raw string literal anywhere that a traditional string literal could be used, and vice versa.

      There are no changes to the JVM Specification. A string in the constant pool of a class file (JVMS 4.4.3) has always been independent of Java language rules for traditional string literals, so it is a suitable compilation target for raw string literals. A class file does not record whether a string in the constant pool was compiled from a traditional string literal or a raw string literal.
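
      The point that a compiled string constant does not remember how it was spelled in source can already be observed with traditional literals. The sketch below, with an illustrative class name, writes the same one-character string in three ways; because constant strings are interned (JLS 3.10.5), all three refer to the same String object. It does not use raw string literals, so it shows the existing behavior only, not the preview feature itself.

```java
public class SameConstant {
    static final String PLAIN = "A";
    static final String UNICODE_ESCAPED = "\u0041"; // Unicode escape for 'A'
    static final String OCTAL_ESCAPED = "\101";     // octal escape for 'A' (decimal 65)

    /** True when all three spellings denote the very same interned String object. */
    static boolean allSameObject() {
        // Reference comparison is intentional: it checks identity, not just equal content.
        return PLAIN == UNICODE_ESCAPED && PLAIN == OCTAL_ESCAPED;
    }

    public static void main(String[] args) {
        System.out.println(allSameObject()); // prints true
    }
}
```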

            abuckley Alex Buckley
            jlaskey Jim Laskey
            Jim Laskey