JDK-8233117

Escape Sequences For Line Continuation and White Space (Preview)


    • CSR
    • Resolution: Approved
    • P3
    • 14
    • core-libs
    • None
    • source
    • minimal
    • New escape sequences exist as syntax errors in prior releases.
    • Java API, Language construct
    • SE

      Summary

      Add two new escape sequences for string and character literals for managing explicit whitespace and carriage control.

      Problem

      In text blocks, newlines (U+000A) are not typically declared explicitly using \n. Instead, newlines are inserted implicitly wherever content breaks to the next line. What if an implicit newline is not desired?

      For example, it is common practice to split very long string literals into concatenations of smaller substrings and then hard-wrap the resulting string literals over multiple lines of source code:

        String literal = "Lorem ipsum dolor sit amet, consectetur adipiscing " +
                         "elit, sed do eiusmod tempor incididunt ut labore " +
                         "et dolore magna aliqua.";

      This is exactly the form of complex string that text blocks express more readably:

        String text = """
                      Lorem ipsum dolor sit amet, consectetur adipiscing
                      elit, sed do eiusmod tempor incididunt ut labore
                      et dolore magna aliqua.
                      """;

      However, using text blocks to represent long strings has a drawback: an implicit newline is inserted on every line. It would be helpful to be able to selectively denote which lines do not pick up the implicit newline.
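      The difference between the two forms above can be checked directly. The following is a minimal sketch, assuming a JDK where text blocks are available (final in JDK 15; under JDK 13 and 14 they require --enable-preview). The class name ImplicitNewlineDemo and the helper newlineCount are hypothetical names introduced only for illustration.

```java
public class ImplicitNewlineDemo {
    // The concatenated form from the text: one logical line, no newlines.
    static String literal() {
        return "Lorem ipsum dolor sit amet, consectetur adipiscing " +
               "elit, sed do eiusmod tempor incididunt ut labore " +
               "et dolore magna aliqua.";
    }

    // The text block form: every line break in the source becomes U+000A.
    static String textBlock() {
        return """
               Lorem ipsum dolor sit amet, consectetur adipiscing
               elit, sed do eiusmod tempor incididunt ut labore
               et dolore magna aliqua.
               """;
    }

    // Hypothetical helper: counts line feed characters in a string.
    static long newlineCount(String s) {
        return s.chars().filter(c -> c == '\n').count();
    }

    public static void main(String[] args) {
        System.out.println(newlineCount(literal()));   // 0
        System.out.println(newlineCount(textBlock())); // 3
        // The two values differ only by the implicit newlines.
        System.out.println(literal().equals(textBlock())); // false
    }
}
```

      The text block carries three implicit newlines that the concatenated literal does not, which is the drawback described above.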

      Turning to another matter, the space (U+0020) character's lack of observability creates a problem for strings.

      For example, text blocks are missing per-line delimiters, like those found in string literals, that clearly indicate where the content of a line begins and where the content of a line ends. The lack of direct space-character observability is the primary reason for text blocks always stripping trailing white space. However, this behavior leads to a counter issue: How does a developer retain trailing white space in a text block?

      For another example, various visual tricks are required to get an accurate count of multiple spaces in any string literal. For instance, how many spaces are in the string literal " "? How can a developer count what they cannot visually discern?

      Solution

      Change the JLS section on "Escape Sequences for Character and String Literals" and the API String::translateEscapes to recognize two new escape sequences:

      • \<line-terminator>

        The escape sequences \␊ (U+005C, U+000A), \␍ (U+005C, U+000D) and \␍␊ (U+005C, U+000D, U+000A) represent line continuation. Unlike other escape sequences, these line continuation sequences are simply discarded during escape translation.

        Example:

        String text = """
                    Lorem ipsum dolor sit amet, consectetur adipiscing \
                    elit, sed do eiusmod tempor incididunt ut labore \
                    et dolore magna aliqua.\
                    """;

        After white space stripping, the above text block would have the value "Lorem ipsum dolor sit amet, consectetur adipiscing \␊elit, sed do eiusmod tempor incididunt ut labore \␊et dolore magna aliqua.\␊". Applying escape translation would then yield "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.".
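        The two-step description above can be exercised with a short program. This is a sketch only, assuming a JDK where text blocks and the new escapes are available without flags (JDK 15 or later; JDK 14 needs --enable-preview). The class name LineContinuationDemo and its methods are hypothetical.

```java
public class LineContinuationDemo {
    // Text block using \<line-terminator>: the compiler discards each
    // backslash together with the line terminator that follows it.
    static String continued() {
        return """
               Lorem ipsum dolor sit amet, consectetur adipiscing \
               elit, sed do eiusmod tempor incididunt ut labore \
               et dolore magna aliqua.\
               """;
    }

    // The same translation at run time: the argument holds a real
    // backslash (U+005C) followed by a real line feed (U+000A).
    static String translated() {
        return "one \\\ntwo".translateEscapes();
    }

    public static void main(String[] args) {
        System.out.println(continued().contains("\n")); // false
        System.out.println(translated());               // one two
    }
}
```

        Note that the space before each trailing backslash is kept, because it is no longer trailing white space once the backslash follows it.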

      • \s (U+005C, U+0073)

        The escape sequence \s represents observable space and is translated to the ASCII space character (U+0020).

        String str = "A\sline\swith\sspaces";

        After translation the String str will have the value "A line with spaces".
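        The \s escape also answers the earlier question of how to retain trailing white space in a text block. The sketch below assumes JDK 15 or later (or JDK 14 with --enable-preview); the class name SpaceEscapeDemo and its methods are hypothetical. Every space in the padded lines is written as \s, so the padding is visible in the source and cannot be stripped as trailing white space.

```java
public class SpaceEscapeDemo {
    // The example literal from the text above.
    static String inline() {
        return "A\sline\swith\sspaces";
    }

    // Each line is padded to six characters with \s. Escapes are
    // interpreted after white space stripping, so the padding survives.
    static String padded() {
        return """
               red\s\s\s
               green\s
               blue\s\s
               """;
    }

    public static void main(String[] args) {
        System.out.println(inline()); // A line with spaces
        // Brackets make the retained trailing spaces visible.
        padded().lines().forEach(line -> System.out.println("[" + line + "]"));
    }
}
```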

      Specification

      JLS changes for the new escape sequences are found in section 3.10.7 of the attachment text-blocks-jls.html. There are no JVMS changes.

      String::translateEscapes diff

      --- a/src/java.base/share/classes/java/lang/String.java 2019-11-12 13:32:43.000000000 -0400
      +++ b/src/java.base/share/classes/java/lang/String.java 2019-11-12 13:32:02.000000000 -0400
      @@ -3060,10 +3060,15 @@
            *     <th scope="row">{@code \u005Cr}</th>
            *     <td>carriage return</td>
            *     <td>{@code U+000D}</td>
            *   </tr>
            *   <tr>
      +     *     <th scope="row">{@code \u005Cs}</th>
      +     *     <td>space</td>
      +     *     <td>{@code U+0020}</td>
      +     *   </tr>
      +     *   <tr>
            *     <th scope="row">{@code \u005C"}</th>
            *     <td>double quote</td>
            *     <td>{@code U+0022}</td>
            *   </tr>
            *   <tr>
      @@ -3079,10 +3084,15 @@
            *   <tr>
            *     <th scope="row">{@code \u005C0 - \u005C377}</th>
            *     <td>octal escape</td>
            *     <td>code point equivalents</td>
            *   </tr>
      +     *   <tr>
      +     *     <th scope="row">{@code \u005C<line-terminator>}</th>
      +     *     <td>continuation</td>
      +     *     <td>discard</td>
      +     *   </tr>
            *   </tbody>
            * </table>
            *
            * @implNote
            * This method does <em>not</em> translate Unicode escapes such as "{@code \u005cu2022}".

      String::translateEscapes after diff changes

      /**
       * {@preview Associated with text blocks, a preview feature of
       *           the Java language.
       *
       *           This method is associated with <i>text blocks</i>, a preview
       *           feature of the Java language. Programs can only use this
       *           method when preview features are enabled. Preview features
       *           may be removed in a future release, or upgraded to permanent
       *           features of the Java language.}
       *
       * Returns a string whose value is this string, with escape sequences
       * translated as if in a string literal.
       * <p>
       * Escape sequences are translated as follows;
       * <table class="striped">
       *   <caption style="display:none">Translation</caption>
       *   <thead>
       *   <tr>
       *     <th scope="col">Escape</th>
       *     <th scope="col">Name</th>
       *     <th scope="col">Translation</th>
       *   </tr>
       *   </thead>
       *   <tbody>
       *   <tr>
       *     <th scope="row">{@code \u005Cb}</th>
       *     <td>backspace</td>
       *     <td>{@code U+0008}</td>
       *   </tr>
       *   <tr>
       *     <th scope="row">{@code \u005Ct}</th>
       *     <td>horizontal tab</td>
       *     <td>{@code U+0009}</td>
       *   </tr>
       *   <tr>
       *     <th scope="row">{@code \u005Cn}</th>
       *     <td>line feed</td>
       *     <td>{@code U+000A}</td>
       *   </tr>
       *   <tr>
       *     <th scope="row">{@code \u005Cf}</th>
       *     <td>form feed</td>
       *     <td>{@code U+000C}</td>
       *   </tr>
       *   <tr>
       *     <th scope="row">{@code \u005Cr}</th>
       *     <td>carriage return</td>
       *     <td>{@code U+000D}</td>
       *   </tr>
       *   <tr>
       *     <th scope="row">{@code \u005Cs}</th>
       *     <td>space</td>
       *     <td>{@code U+0020}</td>
       *   </tr>
       *   <tr>
       *     <th scope="row">{@code \u005C"}</th>
       *     <td>double quote</td>
       *     <td>{@code U+0022}</td>
       *   </tr>
       *   <tr>
       *     <th scope="row">{@code \u005C'}</th>
       *     <td>single quote</td>
       *     <td>{@code U+0027}</td>
       *   </tr>
       *   <tr>
       *     <th scope="row">{@code \u005C\u005C}</th>
       *     <td>backslash</td>
       *     <td>{@code U+005C}</td>
       *   </tr>
       *   <tr>
       *     <th scope="row">{@code \u005C0 - \u005C377}</th>
       *     <td>octal escape</td>
       *     <td>code point equivalents</td>
       *   </tr>
       *   <tr>
       *     <th scope="row">{@code \u005C<line-terminator>}</th>
       *     <td>continuation</td>
       *     <td>discard</td>
       *   </tr>
       *   </tbody>
       * </table>
       *
       * @implNote
       * This method does <em>not</em> translate Unicode escapes such as "{@code \u005cu2022}".
       * Unicode escapes are translated by the Java compiler when reading input characters and
       * are not part of the string literal specification.
       *
       * @throws IllegalArgumentException when an escape sequence is malformed.
       *
       * @return String with escape sequences translated.
       *
       * @jls 3.10.7 Escape Sequences
       *
       * @since 13
       */
      @jdk.internal.PreviewFeature(feature=jdk.internal.PreviewFeature.Feature.TEXT_BLOCKS,
                                   essentialAPI=true)
      public String translateEscapes() {
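      The table in the Javadoc above can be spot-checked at run time. This is an illustrative sketch, not part of the CSR, and it assumes a JDK where String::translateEscapes is callable without preview flags (JDK 15 or later). The class name TranslateEscapesDemo and the helper isMalformed are hypothetical. In each argument, a doubled backslash in the Java source stands for a single U+005C in the string being translated.

```java
public class TranslateEscapesDemo {
    // Thin wrapper so the calls below read like table lookups.
    static String translate(String raw) {
        return raw.translateEscapes();
    }

    // Hypothetical helper: reports whether translation rejects the input
    // with the IllegalArgumentException documented in the Javadoc.
    static boolean isMalformed(String raw) {
        try {
            raw.translateEscapes();
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(translate("A\\sB"));      // A B  (\s becomes U+0020)
        System.out.println(translate("a\\\nb"));     // ab   (\<LF> is discarded)
        System.out.println(translate("a\\\r\nb"));   // ab   (\<CR><LF> is discarded)
        System.out.println(translate("\\101"));      // A    (octal escape)
        System.out.println(isMalformed("\\q"));      // true (not a known escape)
    }
}
```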

            jlaskey Jim Laskey
            jlaskey Jim Laskey
            Alex Buckley