JDK-8305488

Add split() variants that keep the delimiters to String and j.u.r.Pattern


    • Type: CSR
    • Resolution: Approved
    • Priority: P4
    • Fix Version/s: 21
    • Component/s: core-libs
    • None
    • Compatibility Risk: minimal
    • Compatibility Risk Description: The API points are additions in final classes, so there are neither source nor binary compatibility risks.
    • Interface Kind: Java API
    • Scope: SE

      Summary

      Currently, String.split() and java.util.regex.Pattern.split() only return the substrings between the delimiters matching the given regex.

      It is proposed to add API points that return both the substrings and the matched delimiters, alternating in the order in which they occur in the input.
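      The limitation is easy to observe with the existing String.split(). The following minimal example runs on any JDK 8 or later and uses the same input as the example tables in the Specification section; the class name SplitDropsDelimiters is illustrative only.

```java
import java.util.Arrays;

public class SplitDropsDelimiters {
    public static void main(String[] args) {
        String input = "boo:::and::foo";
        // split() returns only the substrings between matches of ":+";
        // the delimiters ":::" and "::" are not part of the result.
        String[] parts = input.split(":+");
        System.out.println(Arrays.toString(parts)); // prints [boo, and, foo]
        // Without the delimiters, the original string cannot be rebuilt.
        System.out.println(String.join("", parts)); // prints booandfoo
    }
}
```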

      Problem

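      Providing this functionality outside the JDK is somewhat more complicated than in the JDK itself: users have to reimplement the splitting loop on top of java.util.regex.Matcher and replicate the rules that split() already applies, such as the handling of the limit parameter and of a zero-width match at the beginning of the input.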
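      To illustrate what doing this outside the JDK involves, here is a minimal sketch of a user-side helper built only on Matcher.find(), start(), end() and group(). The class and method names (SplitKeepingDelimiters, splitKeepingDelimiters) are hypothetical; the sketch assumes the limit and leading-match rules described in the Specification section and is not the JDK's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitKeepingDelimiters {

    // Hypothetical user-side helper, NOT a JDK API: alternates substrings and
    // matched delimiters using only Matcher, following the limit rules of the
    // proposed splitWithDelimiters.
    static String[] splitKeepingDelimiters(Pattern pattern, CharSequence input, int limit) {
        List<String> parts = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        int index = 0;    // start of the substring currently being collected
        int applied = 0;  // number of times the pattern has been applied
        while (m.find()) {
            if (limit > 0 && applied == limit - 1) {
                break;    // limit reached: the rest of the input is the last entry
            }
            if (m.end() == 0) {
                continue; // zero-width match at the beginning: no empty entries
            }
            parts.add(input.subSequence(index, m.start()).toString()); // substring
            parts.add(m.group());                                      // delimiter
            index = m.end();
            applied++;
        }
        if (applied == 0) {
            return new String[] { input.toString() }; // no usable match
        }
        parts.add(input.subSequence(index, input.length()).toString()); // last substring
        int size = parts.size();
        if (limit == 0) {
            // Discard trailing empty strings, whether substrings or delimiters.
            while (size > 0 && parts.get(size - 1).isEmpty()) {
                size--;
            }
        }
        return parts.subList(0, size).toArray(new String[0]);
    }

    public static void main(String[] args) {
        Pattern colons = Pattern.compile(":+");
        System.out.println(Arrays.toString(
                splitKeepingDelimiters(colons, "boo:::and::foo", 2))); // [boo, :::, and::foo]
        Pattern o = Pattern.compile("o");
        System.out.println(Arrays.toString(
                splitKeepingDelimiters(o, "boo:::and::foo", 0))); // [b, o, , o, :::and::f, o, , o]
    }
}
```

      The check m.end() == 0 is enough to detect a zero-width match at the very beginning, because Matcher.find() advances past an empty match and therefore never reports a second match ending at index 0.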

      Solution

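      Add a method splitWithDelimiters to both String and java.util.regex.Pattern, mirroring the existing split methods that take a limit parameter. Since both classes already implement split(), adding these API points is relatively straightforward.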

      Specification

      Class java.lang.String

      +    /**
      +     * Splits this string around matches of the given regular expression and
      +     * returns both the strings and the matching delimiters.
      +     *
      +     * <p> The array returned by this method contains each substring of this
      +     * string that is terminated by another substring that matches the given
      +     * expression or is terminated by the end of the string.
      +     * Each substring is immediately followed by the subsequence (the delimiter)
      +     * that matches the given expression, <em>except</em> for the last
      +     * substring, which is not followed by anything.
      +     * The substrings in the array and the delimiters are in the order in which
      +     * they occur in the input.
      +     * If the expression does not match any part of the input then the resulting
      +     * array has just one element, namely this string.
      +     *
      +     * <p> When there is a positive-width match at the beginning of this
      +     * string then an empty leading substring is included at the beginning
      +     * of the resulting array. A zero-width match at the beginning, however,
      +     * never produces such an empty leading substring, nor an empty delimiter.
      +     *
      +     * <p> The {@code limit} parameter controls the number of times the
      +     * pattern is applied and therefore affects the length of the resulting
      +     * array.
      +     * <ul>
      +     *    <li> If the <i>limit</i> is positive then the pattern will be applied
      +     *    at most <i>limit</i>&nbsp;-&nbsp;1 times, the array's length will be
      +     *    no greater than 2 &centerdot; <i>limit</i> - 1, and the array's last
      +     *    entry will contain all input beyond the last matched delimiter.</li>
      +     *
      +     *    <li> If the <i>limit</i> is zero then the pattern will be applied as
      +     *    many times as possible, the array can have any length, and trailing
      +     *    empty strings, whether substrings or delimiters, will be discarded.</li>
      +     *
      +     *    <li> If the <i>limit</i> is negative then the pattern will be applied
      +     *    as many times as possible and the array can have any length.</li>
      +     * </ul>
      +     *
      +     * <p> The input {@code "boo:::and::foo"}, for example, yields the following
      +     * results with these parameters:
      +     *
      +     * <table class="plain" style="margin-left:2em;">
      +     * <caption style="display:none">Split example showing regex, limit, and result</caption>
      +     * <thead>
      +     * <tr>
      +     *     <th scope="col">Regex</th>
      +     *     <th scope="col">Limit</th>
      +     *     <th scope="col">Result</th>
      +     * </tr>
      +     * </thead>
      +     * <tbody>
      +     * <tr><th scope="row" rowspan="3" style="font-weight:normal">:+</th>
      +     *     <th scope="row" style="font-weight:normal; text-align:right; padding-right:1em">2</th>
      +     *     <td>{@code { "boo", ":::", "and::foo" }}</td></tr>
      +     * <tr><!-- : -->
      +     *     <th scope="row" style="font-weight:normal; text-align:right; padding-right:1em">5</th>
      +     *     <td>{@code { "boo", ":::", "and", "::", "foo" }}</td></tr>
      +     * <tr><!-- : -->
      +     *     <th scope="row" style="font-weight:normal; text-align:right; padding-right:1em">-1</th>
      +     *     <td>{@code { "boo", ":::", "and", "::", "foo" }}</td></tr>
      +     * <tr><th scope="row" rowspan="3" style="font-weight:normal">o</th>
      +     *     <th scope="row" style="font-weight:normal; text-align:right; padding-right:1em">5</th>
      +     *     <td>{@code { "b", "o", "", "o", ":::and::f", "o", "", "o", "" }}</td></tr>
      +     * <tr><!-- o -->
      +     *     <th scope="row" style="font-weight:normal; text-align:right; padding-right:1em">-1</th>
      +     *     <td>{@code { "b", "o", "", "o", ":::and::f", "o", "", "o", "" }}</td></tr>
      +     * <tr><!-- o -->
      +     *     <th scope="row" style="font-weight:normal; text-align:right; padding-right:1em">0</th>
      +     *     <td>{@code { "b", "o", "", "o", ":::and::f", "o", "", "o" }}</td></tr>
      +     * </tbody>
      +     * </table>
      +     *
      +     * @apiNote An invocation of this method of the form
      +     * <i>str.</i>{@code splitWithDelimiters(}<i>regex</i>{@code ,}&nbsp;<i>n</i>{@code )}
      +     * yields the same result as the expression
      +     *
      +     * <blockquote>
      +     * {@link java.util.regex.Pattern}.{@link
      +     * java.util.regex.Pattern#compile(String) compile}(<i>regex</i>).{@link
      +     * java.util.regex.Pattern#splitWithDelimiters(CharSequence,int) splitWithDelimiters}(<i>str</i>,&nbsp;<i>n</i>)
      +     * </blockquote>
      +     *
      +     * @param  regex
      +     *         the delimiting regular expression
      +     *
      +     * @param  limit
      +     *         the result threshold, as described above
      +     *
      +     * @return  the array of strings computed by splitting this string
      +     *          around matches of the given regular expression, alternating
      +     *          substrings and matching delimiters
      +     *
      +     * @since   21
      +     */
      +    public String[] splitWithDelimiters(String regex, int limit) {
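      A short usage sketch of the String method, reproducing some rows of the example table above and checking the equivalence stated in the @apiNote. It assumes JDK 21 or later, where splitWithDelimiters first appeared; the class name StringSplitWithDelimitersDemo is illustrative only.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class StringSplitWithDelimitersDemo {
    public static void main(String[] args) {
        String str = "boo:::and::foo";

        // Rows of the example table for the regex ":+".
        System.out.println(Arrays.toString(str.splitWithDelimiters(":+", 2)));
        // [boo, :::, and::foo]
        System.out.println(Arrays.toString(str.splitWithDelimiters(":+", -1)));
        // [boo, :::, and, ::, foo]

        // With limit 0 the trailing empty substring is discarded.
        System.out.println(Arrays.toString(str.splitWithDelimiters("o", 0)));
        // [b, o, , o, :::and::f, o, , o]

        // @apiNote: same result as compiling the regex and using Pattern.
        String[] viaString = str.splitWithDelimiters("o", -1);
        String[] viaPattern = Pattern.compile("o").splitWithDelimiters(str, -1);
        System.out.println(Arrays.equals(viaString, viaPattern)); // true
    }
}
```

      When the same regex is used repeatedly, compiling the Pattern once and calling its splitWithDelimiters directly may avoid recompiling the expression on each call, as is already the usual advice for split().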

      Class java.util.regex.Pattern

      +    /**
      +     * Splits the given input sequence around matches of this pattern and
      +     * returns both the strings and the matching delimiters.
      +     *
      +     * <p> The array returned by this method contains each substring of the
      +     * input sequence that is terminated by another subsequence that matches
      +     * this pattern or is terminated by the end of the input sequence.
      +     * Each substring is immediately followed by the subsequence (the delimiter)
      +     * that matches this pattern, <em>except</em> for the last substring, which
      +     * is not followed by anything.
      +     * The substrings in the array and the delimiters are in the order in which
      +     * they occur in the input.
      +     * If this pattern does not match any subsequence of the input then the
      +     * resulting array has just one element, namely the input sequence in string
      +     * form.
      +     *
      +     * <p> When there is a positive-width match at the beginning of the input
      +     * sequence then an empty leading substring is included at the beginning
      +     * of the resulting array.
      +     * A zero-width match at the beginning, however, never produces such an
      +     * empty leading substring, nor an empty delimiter.
      +     *
      +     * <p> The {@code limit} parameter controls the number of times the
      +     * pattern is applied and therefore affects the length of the resulting
      +     * array.
      +     * <ul>
      +     *    <li> If the <i>limit</i> is positive then the pattern will be applied
      +     *    at most <i>limit</i>&nbsp;-&nbsp;1 times, the array's length will be
      +     *    no greater than 2 &centerdot; <i>limit</i> - 1, and the array's last
      +     *    entry will contain all input beyond the last matched delimiter.</li>
      +     *
      +     *    <li> If the <i>limit</i> is zero then the pattern will be applied as
      +     *    many times as possible, the array can have any length, and trailing
      +     *    empty strings, whether substrings or delimiters, will be discarded.</li>
      +     *
      +     *    <li> If the <i>limit</i> is negative then the pattern will be applied
      +     *    as many times as possible and the array can have any length.</li>
      +     * </ul>
      +     *
      +     * <p> The input {@code "boo:::and::foo"}, for example, yields the following
      +     * results with these parameters:
      +     *
      +     * <table class="plain" style="margin-left:2em;">
      +     * <caption style="display:none">Split example showing regex, limit, and result</caption>
      +     * <thead>
      +     * <tr>
      +     *     <th scope="col">Regex</th>
      +     *     <th scope="col">Limit</th>
      +     *     <th scope="col">Result</th>
      +     * </tr>
      +     * </thead>
      +     * <tbody>
      +     * <tr><th scope="row" rowspan="3" style="font-weight:normal">:+</th>
      +     *     <th scope="row" style="font-weight:normal; text-align:right; padding-right:1em">2</th>
      +     *     <td>{@code { "boo", ":::", "and::foo" }}</td></tr>
      +     * <tr><!-- : -->
      +     *     <th scope="row" style="font-weight:normal; text-align:right; padding-right:1em">5</th>
      +     *     <td>{@code { "boo", ":::", "and", "::", "foo" }}</td></tr>
      +     * <tr><!-- : -->
      +     *     <th scope="row" style="font-weight:normal; text-align:right; padding-right:1em">-1</th>
      +     *     <td>{@code { "boo", ":::", "and", "::", "foo" }}</td></tr>
      +     * <tr><th scope="row" rowspan="3" style="font-weight:normal">o</th>
      +     *     <th scope="row" style="font-weight:normal; text-align:right; padding-right:1em">5</th>
      +     *     <td>{@code { "b", "o", "", "o", ":::and::f", "o", "", "o", "" }}</td></tr>
      +     * <tr><!-- o -->
      +     *     <th scope="row" style="font-weight:normal; text-align:right; padding-right:1em">-1</th>
      +     *     <td>{@code { "b", "o", "", "o", ":::and::f", "o", "", "o", "" }}</td></tr>
      +     * <tr><!-- o -->
      +     *     <th scope="row" style="font-weight:normal; text-align:right; padding-right:1em">0</th>
      +     *     <td>{@code { "b", "o", "", "o", ":::and::f", "o", "", "o" }}</td></tr>
      +     * </tbody>
      +     * </table>
      +     *
      +     * @param  input
      +     *         The character sequence to be split
      +     *
      +     * @param  limit
      +     *         The result threshold, as described above
      +     *
      +     * @return  The array of strings computed by splitting the input
      +     *          around matches of this pattern, alternating
      +     *          substrings and matching delimiters
      +     *
      +     * @since   21
      +     */
      +    public String[] splitWithDelimiters(CharSequence input, int limit) {
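      The rules for a match at the beginning of the input can be checked with a small sketch of the Pattern method. It assumes JDK 21 or later, passes a StringBuilder to show that any CharSequence is accepted, and uses the illustrative class name PatternSplitWithDelimitersDemo.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class PatternSplitWithDelimitersDemo {
    public static void main(String[] args) {
        // Any CharSequence is accepted, e.g. a StringBuilder.
        CharSequence input = new StringBuilder(":a:b");

        // Positive-width match at the beginning: an empty leading substring
        // precedes the first delimiter.
        Pattern colon = Pattern.compile(":");
        System.out.println(Arrays.toString(colon.splitWithDelimiters(input, 0)));
        // [, :, a, :, b]

        // Zero-width match at the beginning: no empty leading substring and
        // no empty delimiter for it. The zero-width match before the second
        // ':' still yields an empty delimiter.
        Pattern beforeColon = Pattern.compile("(?=:)");
        System.out.println(Arrays.toString(beforeColon.splitWithDelimiters(input, 0)));
        // [:a, , :b]
    }
}
```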

            Raffaello Giulietti
            Roger Riggs