JDK-8223776

String::stripIndent (Preview)


    • Type: CSR
    • Resolution: Approved
    • Priority: P3
    • 13
    • core-libs
    • None
    • minimal
    • New method in a final class.
    • Java API
    • SE

      Summary

      This feature introduces a new String instance method, String::stripIndent, which removes the incidental white space that indentation introduces into the content of a text block.

      This method is part of a preview language feature: Text Blocks.

      Problem

      Text blocks are easier to read than their concatenated string literal counterparts, but the "obvious" interpretation of a text block would include the spaces added to indent the embedded string so that it lines up neatly with the opening delimiter and/or enclosing code. Consequently, a text block would not represent the same string as the equivalent concatenated string literals, which hurts migration. Moreover, if the developer were to re-indent the code using the IDE, the contents of the text block would change. The following HTML example uses dots to visualize the spaces that the developer added for indentation but did not intend to be part of the content:

          String html = """
          ..............<html>
          ..............    <body>
          ..............        <p>Hello World.</p>
          ..............    </body>
          ..............</html>
          ..............""";

      Accordingly, a better interpretation of a text block is to differentiate incidental white space from essential white space. The proposed string method would "re-indent" the content by removing the incidental white space (the dots above), to yield what the developer intended (using | to visualize the left margin):

          |<html>
          |    <body>
          |        <p>Hello World.</p>
          |    </body>
          |</html>
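
The effect can be tried out directly. The following is a minimal sketch, assuming JDK 15 or later, where String::stripIndent is available without preview restrictions. The class name StripIndentDemo and the helper indented() are hypothetical; the helper rebuilds the dotted example as an ordinary string so that the 14 columns of incidental indentation are explicit.

```java
final class StripIndentDemo {

    // Hypothetical helper: the content of the example above, with the
    // dots replaced by 14 real spaces. The final line holds only the
    // indentation that precedes the closing delimiter.
    static String indented() {
        String pad = " ".repeat(14);
        return pad + "<html>\n"
             + pad + "    <body>\n"
             + pad + "        <p>Hello World.</p>\n"
             + pad + "    </body>\n"
             + pad + "</html>\n"
             + pad;
    }

    public static void main(String[] args) {
        // Prints the five lines flush against the left margin while
        // keeping the relative indentation of <body> and <p>.
        System.out.print(indented().stripIndent());
    }
}
```

Every line, including the blank last line, starts with the same 14 spaces, so exactly 14 characters are removed from each content line.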

      Solution

      The re-indentation algorithm takes a text block and removes the same amount of white space from each line of content until at least one of the lines has a non-white space character in the leftmost position. The algorithm is as follows:

      1. Split the content of the multi-line string at every line terminator (LF, CR and CRLF), producing a list of individual lines. Note that any line in the content which was just a line terminator will become an empty line in the list of individual lines.

      2. Add all non-blank lines from the list of individual lines into a set of determining lines. (Blank lines -- lines that are empty or are composed wholly of white space -- have no visible influence on the indentation. Excluding blank lines from the set of determining lines avoids throwing off step 4 of the algorithm.)

      3. If the last line in the list of individual lines (i.e., the line with the text block closing delimiter) is blank, then add it to the set of determining lines. (The indentation of the closing delimiter should influence the indentation of the content as a whole -- a "significant trailing line" policy.)

      4. Compute the common white space prefix of the set of determining lines, by counting the number of leading white space characters on each line and taking the minimum count.

      5. Remove the common white space prefix from each non-blank line in the list of individual lines.

      6. Remove all trailing white space from all lines in the modified list of individual lines from step 5. ("Hidden" white space at the end of lines is unintentional, so it is overwhelmingly likely that the developer does not want it in the string.) Note that this step collapses wholly-white space lines in the modified list so that they are empty, but does not discard them.

      7. Construct the result string by joining all the lines in the modified list of individual lines from step 6, using LF as the separator between lines. If the final line in the list from step 6 is empty, then the joining LF from the previous line will be the last character in the result string.
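
The seven steps above can be sketched in plain Java. This is an illustrative reading of the algorithm as written here, not the JDK implementation; the class ReindentSketch and its methods are hypothetical names, and white space is assumed to mean any character accepted by Character.isWhitespace.

```java
import java.util.ArrayList;
import java.util.List;

final class ReindentSketch {

    // Number of leading characters that Character.isWhitespace accepts.
    static int leadingWhiteSpace(String line) {
        int i = 0;
        while (i < line.length() && Character.isWhitespace(line.charAt(i))) {
            i++;
        }
        return i;
    }

    // Step 6 helper: drop all trailing white space from one line.
    static String stripTrailing(String line) {
        int end = line.length();
        while (end > 0 && Character.isWhitespace(line.charAt(end - 1))) {
            end--;
        }
        return line.substring(0, end);
    }

    static String reindent(String content) {
        // Step 1: split at CRLF, CR or LF. The limit of -1 keeps empty
        // lines, including an empty last line after a final terminator.
        String[] lines = content.split("\r\n|\r|\n", -1);

        // Steps 2 to 4: the minimum leading white space count over all
        // non-blank lines, plus the last line even when it is blank.
        int min = Integer.MAX_VALUE;
        for (int i = 0; i < lines.length; i++) {
            int lead = leadingWhiteSpace(lines[i]);
            boolean blank = lead == lines[i].length();
            boolean last = i == lines.length - 1;
            if (!blank || last) {
                min = Math.min(min, lead);
            }
        }

        // Step 5: remove the common prefix from non-blank lines.
        // Step 6: remove trailing white space from every line, which
        // turns blank lines into empty lines without discarding them.
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            boolean blank = leadingWhiteSpace(line) == line.length();
            String shifted = blank ? line : line.substring(min);
            result.add(stripTrailing(shifted));
        }

        // Step 7: join with LF as the separator.
        return String.join("\n", result);
    }
}
```

Because the last line always takes part in the minimum, content that ends with a line terminator has an empty last line with a count of zero, and no indentation is removed in that case.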

      This re-indentation algorithm will be referenced in normative text by the new JLS section for text blocks (see http://cr.openjdk.java.net/~abuckley/jep355/text-blocks-jls.html). In other words, the JLS will logically incorporate the API spec of String::stripIndent, but will not physically incorporate it.

      Specification

          /**
           * Returns a string whose value is this string, with incidental
           * {@linkplain Character#isWhitespace(int) white space} removed from
           * the beginning and end of every line.
           * <p>
           * Incidental {@linkplain Character#isWhitespace(int) white space}
           * is often present in a text block to align the content with the opening
           * delimiter. For example, in the following code, dots represent incidental
           * {@linkplain Character#isWhitespace(int) white space}:
           * <blockquote><pre>
           * String html = """
           * ..............&lt;html&gt;
           * ..............    &lt;body&gt;
           * ..............        &lt;p&gt;Hello, world&lt;/p&gt;
           * ..............    &lt;/body&gt;
           * ..............&lt;/html&gt;
           * ..............""";
           * </pre></blockquote>
           * This method treats the incidental
           * {@linkplain Character#isWhitespace(int) white space} as indentation to be
           * stripped, producing a string that preserves the relative indentation of
           * the content. Using | to visualize the start of each line of the string:
           * <blockquote><pre>
           * |&lt;html&gt;
           * |    &lt;body&gt;
           * |        &lt;p&gt;Hello, world&lt;/p&gt;
           * |    &lt;/body&gt;
           * |&lt;/html&gt;
           * </pre></blockquote>
           * First, the individual lines of this string are extracted as if by using
           * {@link String#lines()}.
           * <p>
           * Then, the <i>minimum indentation</i> (min) is determined as follows.
           * For each non-blank line (as defined by {@link String#isBlank()}), the
           * leading {@linkplain Character#isWhitespace(int) white space} characters are
           * counted. The leading {@linkplain Character#isWhitespace(int) white space}
           * characters on the last line are also counted even if
           * {@linkplain String#isBlank() blank}. The <i>min</i> value is the smallest
           * of these counts.
           * <p>
           * For each {@linkplain String#isBlank() non-blank} line, <i>min</i> leading
           * {@linkplain Character#isWhitespace(int) white space} characters are removed,
           * and any trailing {@linkplain Character#isWhitespace(int) white space}
           * characters are removed. {@linkplain String#isBlank() Blank} lines are
           * replaced with the empty string.
           *
           * <p>
           * Finally, the lines are joined into a new string, using the LF character
           * {@code "\n"} (U+000A) to separate lines.
           *
           * @apiNote
           * This method's primary purpose is to shift a block of lines as far as
           * possible to the left, while preserving relative indentation. Lines
           * that were indented the least will thus have no leading
           * {@linkplain Character#isWhitespace(int) white space}.
           * The line count of the result will be the same as the line count of
           * this string.
           * If this string ends with a line terminator then the result will end
           * with a line terminator.
           *
           * @implNote
           * This method treats all {@linkplain Character#isWhitespace(int) white space}
           * characters as having equal width. As long as the indentation on every
           * line is consistently composed of the same character sequences, then the
           * result will be as described above.
           *
           * @return string with incidental indentation removed and line
           *         terminators normalized
           *
           * @see String#lines()
           * @see String#isBlank()
           * @see String#indent(int)
           * @see Character#isWhitespace(int)
           *
           * @since 13
           *
           * @deprecated  This method is associated with text blocks, a preview language feature.
           *              Text blocks and/or this method may be changed or removed in a future release.
           */
          @Deprecated(forRemoval=true, since="13")
          public String stripIndent() {
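
A few of the cases described in the specification can be exercised as follows. This is a small sketch, assuming JDK 15 or later, where the method is available without preview restrictions; the class StripIndentEdgeCases and its method names are hypothetical, and the inputs are chosen only for illustration.

```java
final class StripIndentEdgeCases {

    // The blank last line stands in for a closing delimiter that sits to
    // the left of the content. Its count (4) is the minimum, so the
    // content keeps 4 of its 8 columns of indentation.
    static String closingDelimiterToTheLeft() {
        return "        a\n        b\n    ".stripIndent();
    }

    // A string that ends with a line terminator: the line after the
    // terminator is empty, so nothing is removed from the left. Trailing
    // white space is still stripped and the terminator is kept.
    static String endsWithTerminator() {
        return "    a  \n".stripIndent();
    }

    // Every white space character counts as one column, so two tabs
    // count as less indentation than four spaces.
    static String mixedTabsAndSpaces() {
        return "\t\ta\n    b".stripIndent();
    }

    // An interior blank line does not influence the minimum and becomes
    // an empty line, so the line count is unchanged.
    static String interiorBlankLine() {
        return "  a\n   \n  b".stripIndent();
    }
}
```

The mixed tab and space case shows why the implementation note asks for indentation that is composed consistently: the result depends on character counts, not on how wide the characters render.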

            Jim Laskey
            Alex Buckley, Stuart Marks