JDK-8227870

Escape Sequences For Line Continuation and White Space (Preview)

Details

    • JEP
    • Resolution: Withdrawn
    • P3
    • None
    • specification
    • None
    • Jim Laskey
    • Feature
    • Open
    • SE
    • amber dash dev at openjdk dot java dot net
    • S
    • S

    Description

      Summary

      Add two new escape sequences for string literals and text blocks to manage explicit white space and carriage control.

      Goals

      • Simplify the coding of long unwieldy string literals.

      • Provide a means for selectively discarding implicit newlines in a text block.

      • Provide a means to retain trailing white space in a text block.

      Motivation

      JEP 355 - Text Blocks (Preview) made great strides to improve the readability of complex string literals and string expressions. Nonetheless, there were a few issues left to be resolved. Specifically,

      • how to better represent very long single line string literals

      • how to suppress the removal of incidental whitespace in text blocks

      Discarding Implicit Newlines

      In text blocks, newlines (U+000A) are not typically declared explicitly using \n. Instead, newlines are inserted implicitly wherever content breaks to the next line. What if an implicit newline is not desired?

      For example, it is common practice to split a very long string literal into a concatenation of smaller substrings and then hard wrap the resulting string expression onto multiple lines.

        String literal = "Lorem ipsum dolor sit amet, consectetur adipiscing " +
                         "elit, sed do eiusmod tempor incididunt ut labore " +
                         "et dolore magna aliqua.";

      This is exactly the form of complex string expression that text blocks express more readably.

        String text = """
                      Lorem ipsum dolor sit amet, consectetur adipiscing
                      elit, sed do eiusmod tempor incididunt ut labore
                      et dolore magna aliqua.
                      """;

      However, using text blocks to represent long strings has a drawback. An implicit newline is inserted on every line.

      It would be helpful to be able to selectively denote which lines do not pick up the implicit newline.
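
      The drawback can be made concrete with a minimal sketch. It assumes a JDK where text blocks are available (JDK 15 or later); the class and method names are hypothetical and exist only for illustration.

```java
// Illustrative sketch only; class and method names are hypothetical.
// Assumes a JDK where text blocks are available (JDK 15 or later).
final class ImplicitNewlineDemo {

    // The concatenated literal from the text: one logical line, no newlines.
    static String concatenated() {
        return "Lorem ipsum dolor sit amet, consectetur adipiscing " +
               "elit, sed do eiusmod tempor incididunt ut labore " +
               "et dolore magna aliqua.";
    }

    // The text block form: every source line break becomes a '\n'.
    static String textBlock() {
        return """
               Lorem ipsum dolor sit amet, consectetur adipiscing
               elit, sed do eiusmod tempor incididunt ut labore
               et dolore magna aliqua.
               """;
    }

    // Counts the newline characters in a string.
    static long newlineCount(String s) {
        return s.chars().filter(c -> c == '\n').count();
    }

    public static void main(String[] args) {
        System.out.println(newlineCount(concatenated())); // prints 0
        System.out.println(newlineCount(textBlock()));    // prints 3
    }
}
```

      The two strings are therefore not equal, even though the source text looks the same.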

      The position of a text block's closing delimiter can be used either to discard the final newline or to manage content indentation, but not both in the same text block. That is:

      • If the closing delimiter is positioned to avoid a final newline:

        String x = """
            abc
            def""";

        The result is x equal to "abc\ndef": every line is stripped of leading white space. The ability to preserve leading white space by positioning the closing delimiter is lost.

      • If the closing delimiter is positioned to preserve leading white space:

        String y = """
                abc
                def
            """;

        The result is y equal to (spaces denoted with periods) "....abc\n....def\n": the delimiter is necessarily on its own line, so the final newline in the string is unavoidable.

      It is sometimes desirable to position the closing delimiter to preserve leading white space without a final newline in the string.
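
      As a rough check of the two cases above, the following sketch reproduces both text blocks and compares them with ordinary string literals. The class and method names are hypothetical, and a JDK with text blocks (JDK 15 or later) is assumed.

```java
// Illustrative sketch only; class and method names are hypothetical.
// Assumes a JDK where text blocks are available (JDK 15 or later).
final class ClosingDelimiterDemo {

    // Closing delimiter on the last content line: no final newline,
    // and all leading white space is treated as incidental.
    static String x() {
        return """
            abc
            def""";
    }

    // Closing delimiter on its own line, indented four columns less than
    // the content: four leading spaces survive, and so does the final newline.
    static String y() {
        return """
                abc
                def
            """;
    }

    public static void main(String[] args) {
        System.out.println(x().equals("abc\ndef"));           // prints true
        System.out.println(y().equals("    abc\n    def\n")); // prints true
    }
}
```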

      Retaining Trailing White Space In Text Blocks

      The space (U+0020) character's lack of observability creates a problem for text blocks. Text blocks are missing the per line delimiters, like those found in string literals, that clearly indicate where the content of a line begins and where the content of a line ends.

      This lack of direct observability is the primary reason text blocks strip trailing white space by default. However, this decision raises a counter issue: how does a developer retain trailing white space in a text block?

      The simplest solution is to use an observable placeholder, such as the octal escape sequence for space, \040 (ASCII character 32):

      String colors = """
          red\040\040\040\040\040
          green\040\040\040
          blue\040\040\040\040
          """;

      This works because escape sequences are converted after incidental white space is removed. The above text block can be reduced to:

      String colors = """
          red    \040
          green  \040
          blue   \040
          """;

      We can do this because only the last space needs to be observable. This observable character sequence acts as a fence, preventing the stripping of trailing white space from going beyond the sequence. Any white space to the left of the fence is not stripped away. Retention of trailing white space can be provided by using a character sequence fence.
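
      The fence behavior can be checked with a small sketch. It uses only the existing \040 octal escape, so it assumes nothing beyond a JDK with text blocks (JDK 15 or later); the class and method names are hypothetical.

```java
// Illustrative sketch only; class and method names are hypothetical.
// Assumes a JDK where text blocks are available (JDK 15 or later).
final class OctalFenceDemo {

    // Every trailing space written as an octal escape.
    static String allOctal() {
        return """
            red\040\040\040\040\040
            green\040\040\040
            blue\040\040\040\040
            """;
    }

    // Only the last space is an escape; it fences the literal spaces to its left.
    static String fenced() {
        return """
            red    \040
            green  \040
            blue   \040
            """;
    }

    // The intended value, spelled out with visible space counts.
    static String expected() {
        return "red" + " ".repeat(5) + "\n"
             + "green" + " ".repeat(3) + "\n"
             + "blue" + " ".repeat(4) + "\n";
    }

    public static void main(String[] args) {
        System.out.println(allOctal().equals(expected())); // prints true
        System.out.println(fenced().equals(expected()));   // prints true
    }
}
```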

      Still, this use of the \040 octal escape sequence is rather arcane. Besides being verbose, these sequences can perplex readers not fully versed in ASCII. Readability would be enhanced by a more intuitive escape sequence for observable space.

      Description

      Change JLS 3.10.6 Escape Sequences for Character and String Literals and String::translateEscapes to recognize two new categories of escape sequences:

      • \<line-terminator>

        The escape sequences \␊ (U+005C, U+000A), \␍ (U+005C, U+000D) and \␍␊ (U+005C, U+000D, U+000A) represent line continuation. Unlike other escape sequences, these line continuation sequences are simply discarded during escape translation.

        Example:

        String text = """
                    Lorem ipsum dolor sit amet, consectetur adipiscing \
                    elit, sed do eiusmod tempor incididunt ut labore \
                    et dolore magna aliqua.\
                    """;

        After white space stripping, the above text block would have the value "Lorem ipsum dolor sit amet, consectetur adipiscing \␊elit, sed do eiusmod tempor incididunt ut labore \␊et dolore magna aliqua.\␊". Applying escape translation would then yield "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.".

        \<line-terminator> can be used in combination with the closing delimiter to discard the final newline and preserve leading white space.

         String noLastLF = """
                abc
                def\
            """;

        The above text block represents (spaces denoted with periods) "....abc\n....def".

        A \\ escape sequence preceding a line terminator can be used when a backslash is desired at the end of the line. This works because Java does not recursively translate escape sequences.

        String python = """
            if x == True and \\
                y == False
            """;

        \<line-terminator> can also be used as a trailing white space fence, preventing white space to the left of the \<line-terminator> from being stripped.

        String colors = """
            red   \
            green \
            blue  \
            """;

        After processing, the above text block will have the value (spaces denoted with periods) "red...green.blue..".
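
        The four line continuation examples above can be collected into one sketch and compared with plain string literals. This assumes a compiler that accepts \<line-terminator> inside text blocks (JDK 15 or later does); the class and method names are hypothetical.

```java
// Illustrative sketch only; class and method names are hypothetical.
// Assumes a compiler that accepts \<line-terminator> in text blocks
// (JDK 15 or later).
final class LineContinuationDemo {

    // Each backslash-newline pair is discarded, joining the lines.
    static String wrapped() {
        return """
               Lorem ipsum dolor sit amet, consectetur adipiscing \
               elit, sed do eiusmod tempor incididunt ut labore \
               et dolore magna aliqua.\
               """;
    }

    // Closing delimiter keeps four leading spaces; the continuation
    // drops the final newline.
    static String noLastLF() {
        return """
                abc
                def\
            """;
    }

    // An escaped backslash before the line terminator keeps both the
    // backslash and the newline.
    static String python() {
        return """
            if x == True and \\
                y == False
            """;
    }

    // The continuation also fences the spaces to its left.
    static String colors() {
        return """
            red   \
            green \
            blue  \
            """;
    }

    public static void main(String[] args) {
        System.out.println(noLastLF().equals("    abc\n    def")); // prints true
        System.out.println(colors().equals("red   green blue  ")); // prints true
    }
}
```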


      • \s (U+005C, U+0073)

        The escape sequence \s represents observable space and is translated to the ASCII space character (U+0020).

        String str = "A\sline\swith\sspaces";

        After translation the String str will have the value "A line with spaces".

        \s can also be used as a trailing white space fence, preventing white space to the left of the \s from being stripped.

        String colors = """
            red  \s
            green\s
            blue \s
            """;

        After processing, the above text block will have the value (spaces denoted with periods) "red...\ngreen.\nblue..\n".
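
        Both uses of \s can be checked with a short sketch, assuming a compiler that accepts the \s escape (JDK 15 or later does). The class and method names are hypothetical.

```java
// Illustrative sketch only; class and method names are hypothetical.
// Assumes a compiler that accepts the \s escape (JDK 15 or later).
final class SpaceEscapeDemo {

    // \s in an ordinary string literal translates to a single space.
    static String literal() {
        return "A\sline\swith\sspaces";
    }

    // \s at the end of a text block line is itself a space and also
    // fences the literal spaces to its left.
    static String colors() {
        return """
            red  \s
            green\s
            blue \s
            """;
    }

    public static void main(String[] args) {
        System.out.println(literal());                                  // prints A line with spaces
        System.out.println(colors().equals("red   \ngreen \nblue  \n")); // prints true
    }
}
```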

      Alternatives

      Line Continuation

      Alternate escape sequences were evaluated, such as \+, \-, \c. It is felt that \<line-terminator> is the least obscuring sequence and is consistent with other languages (e.g., bash).

      Having \<line-terminator> defined as a generalized continuation sequence in the Java language was also evaluated. It is felt that, unlike in other languages (e.g., C macros), continuation would only be relevant to string literals and text blocks.

      One of the main advantages of using the \<line-terminator> escape sequences over other line continuation techniques is the zero runtime cost. Most alternatives require some kind of runtime computation, with the cost increasing as a string gets larger. While we could reduce or eliminate this cost with optimization, developers are loath to use method invocation for prosaic idioms; it gets in their way.

      Even so, a straightforward approach for line wrapping long string literals would be to simply replace the newlines with spaces or the empty string.

        String text = """
                      Lorem ipsum dolor sit amet, consectetur adipiscing
                      elit, sed do eiusmod tempor incididunt ut labore
                      et dolore magna aliqua.""".replace('\n', ' ');

      or

        String text = """
                      Lorem ipsum dolor sit amet, consectetur adipiscing
                       elit, sed do eiusmod tempor incididunt ut labore
                       et dolore magna aliqua.
                      """.replace("\n", "");

      Either approach may get the desired result, but is still encumbered with runtime cost and the need for an explicit call. As well, we have little control over line terminator retention or trailing white space stripping.

      Another approach is to use a visible fence sequence, such as $ or .... The fence sequence, in combination with the line terminator, does provide control over line terminator retention or trailing white space stripping, but is still encumbered with runtime cost and the need for an explicit call.

        String text = """
                      Lorem ipsum dolor sit amet, consectetur adipiscing $
                      elit, sed do eiusmod tempor incididunt ut labore $
                      et dolore magna aliqua.""".replace("$\n", "");

      or

        String text = """
                      Lorem ipsum dolor sit amet, consectetur adipiscing ...
                      elit, sed do eiusmod tempor incididunt ut labore ...
                      et dolore magna aliqua.""".replace("...\n", "");
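
      For comparison, the sketch below places two of the runtime alternatives next to the proposed line continuation form and checks that all three produce the same string. The first two call String::replace when executed, while the third needs no method call. The class and method names are hypothetical, and a JDK that accepts \<line-terminator> in text blocks (JDK 15 or later) is assumed.

```java
// Illustrative sketch only; class and method names are hypothetical.
// Assumes a JDK that accepts \<line-terminator> in text blocks (JDK 15 or later).
final class RuntimeJoinDemo {

    // Alternative 1: replace every newline with a space at run time.
    static String joinedWithReplace() {
        return """
               Lorem ipsum dolor sit amet, consectetur adipiscing
               elit, sed do eiusmod tempor incididunt ut labore
               et dolore magna aliqua.""".replace('\n', ' ');
    }

    // Alternative 2: a visible '$' fence keeps the trailing space; the
    // fence and the newline are then removed at run time.
    static String fencedWithDollar() {
        return """
               Lorem ipsum dolor sit amet, consectetur adipiscing $
               elit, sed do eiusmod tempor incididunt ut labore $
               et dolore magna aliqua.""".replace("$\n", "");
    }

    // Proposed form: the continuation is resolved by the compiler,
    // so no method call is needed.
    static String continued() {
        return """
               Lorem ipsum dolor sit amet, consectetur adipiscing \
               elit, sed do eiusmod tempor incididunt ut labore \
               et dolore magna aliqua.""";
    }

    public static void main(String[] args) {
        System.out.println(joinedWithReplace().equals(continued())); // prints true
        System.out.println(fencedWithDollar().equals(continued()));  // prints true
    }
}
```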

      Observable Space

      Observable space could be downplayed as an aesthetic change but we feel that \s provides significant code clarity. The following examples equivalently represent five spaces:

      "     "
      "\040\040\040\040\040"
      "\u0020\u0020\u0020\u0020\u0020"
      "\s\s\s\s\s"

      Other escape sequences were evaluated. Most other sequences don't add value and the association of \s to space is clear (as \t is for tab).
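
      The equivalence of the four spellings can be confirmed with a small sketch, assuming a compiler that accepts \s (JDK 15 or later). The class and field names are hypothetical.

```java
// Illustrative sketch only; class and field names are hypothetical.
// Assumes a compiler that accepts the \s escape (JDK 15 or later).
final class FiveSpacesDemo {

    static final String PLAIN      = "     ";
    static final String OCTAL      = "\040\040\040\040\040";
    static final String UNICODE    = "\u0020\u0020\u0020\u0020\u0020";
    static final String OBSERVABLE = "\s\s\s\s\s";

    // True when all four spellings denote the same five-character string.
    static boolean allEqual() {
        return PLAIN.equals(OCTAL)
            && PLAIN.equals(UNICODE)
            && PLAIN.equals(OBSERVABLE);
    }

    public static void main(String[] args) {
        System.out.println(allEqual());          // prints true
        System.out.println(OBSERVABLE.length()); // prints 5
    }
}
```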

      \<space> was also considered.

      "\ \ \ \ \ "

      While aesthetically acceptable, the interpretation of \<space> at the end of a line becomes perplexing. Is this a \<space> or a \␊? Using the observable character s removes any ambiguity.

      Alternate character sequence fences can be used for trailing white space retention, but they incur a runtime cost when stripped away.

      String colors = """
              red   $
              green $
              blue  $
              """.replace("$\n", "\n");

      It is also possible to change the text block rules to not remove incidental white space. However, the strong argument for removing incidental white space remains.

      Testing

      Tests will be added to cover various permutations of the new escape sequences, along with their interaction with existing escape sequences.

      Risks and Assumptions

      The primary risk is that there may be tools in the field that cannot be modified to accept these new escape sequences.

      Dependencies

      Dependent on JEP 355 - Text Blocks (Preview) moving forward.
