JDK-8213167

provide Alpha-Decimal Comparator


    • CSR
    • Resolution: Unresolved
    • P4
    • tbd
    • core-libs
    • None
    • source
    • minimal
    • Java API
    • JDK

      Summary

      Add a new API for obtaining a Comparator that compares CharSequences while taking into account the numeric value of the decimal digits embedded in them.

      Problem

      Users often need to compare strings that contain runs of decimal digits. These runs should be compared as numeric values rather than character by character. For example, the string "abc2" should be considered less than the string "abc10" because 2 is less than 10, whereas ordinary lexicographic comparison places "abc10" first.
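
      As a small illustration of the problem, the following sketch sorts a few strings with their natural (lexicographic) ordering. The class and method names are made up for this example and are not part of the proposal.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class; the name is illustrative only.
final class LexicographicOrderDemo {

    // Sorts the given strings using String's natural (lexicographic) order.
    static List<String> sortedNaturally(String... values) {
        return Arrays.stream(values).sorted().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // '1' < '2' as characters, so "abc10" is placed before "abc2".
        System.out.println(sortedNaturally("abc10", "abc2", "abc1"));
        // prints [abc1, abc10, abc2]
    }
}
```

      The numeric order a user would typically expect here is abc1, abc2, abc10.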

      Solution

      New methods for obtaining an instance of a Comparator will be provided. These methods allow composition with an existing Comparator object, so that the non-decimal parts of the strings can be compared in any desired way.

      Two methods will be provided. The Comparators they produce differ only in how they treat leading zeros in the decimal parts of the compared strings.
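
      To make the idea concrete, here is a minimal sketch of how such a composable comparator could behave. It is not the implementation proposed in this CSR: the class AlphaDecimalSketch, the method alphaDecimal and its boolean flag are hypothetical names, the code works on char values only (so supplementary code points are not recognised as digits), and it assumes Java 11 or later for CharSequence.compare.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; names are illustrative and not part of the proposed API.
final class AlphaDecimalSketch {

    static Comparator<CharSequence> alphaDecimal(
            Comparator<? super CharSequence> alpha, boolean leadingZeroesFirst) {
        return (a, b) -> {
            int i = 0, j = 0;
            while (i < a.length() || j < b.length()) {
                // Compare the next (possibly empty) runs of non-digit characters.
                int ai = endOfRun(a, i, false), bj = endOfRun(b, j, false);
                int c = alpha.compare(a.subSequence(i, ai), b.subSequence(j, bj));
                if (c != 0) return c;
                // Compare the next (possibly empty) runs of decimal digits.
                i = endOfRun(a, ai, true);
                j = endOfRun(b, bj, true);
                c = compareDigits(a.subSequence(ai, i), b.subSequence(bj, j),
                                  leadingZeroesFirst);
                if (c != 0) return c;
            }
            return 0;
        };
    }

    // Index just past the run of digit (or non-digit) chars starting at 'from'.
    private static int endOfRun(CharSequence s, int from, boolean digits) {
        int k = from;
        while (k < s.length() && Character.isDigit(s.charAt(k)) == digits) k++;
        return k;
    }

    private static int compareDigits(CharSequence x, CharSequence y,
                                     boolean leadingZeroesFirst) {
        // An empty numeric part sorts before any non-empty one.
        if (x.length() == 0 || y.length() == 0) {
            return Integer.compare(x.length(), y.length());
        }
        int zx = leadingZeros(x), zy = leadingZeros(y);
        int lx = x.length() - zx, ly = y.length() - zy;
        // More significant digits means a larger value.
        if (lx != ly) return Integer.compare(lx, ly);
        for (int k = 0; k < lx; k++) {
            int c = Integer.compare(Character.digit(x.charAt(zx + k), 10),
                                    Character.digit(y.charAt(zy + k), 10));
            if (c != 0) return c;
        }
        // Equal values: break the tie on the number of leading zeros.
        return leadingZeroesFirst ? Integer.compare(zy, zx) : Integer.compare(zx, zy);
    }

    private static int leadingZeros(CharSequence digits) {
        int k = 0;
        while (k < digits.length() && Character.digit(digits.charAt(k), 10) == 0) k++;
        return k;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(List.of("abc10", "abc2", "abc1"));
        names.sort(alphaDecimal(CharSequence::compare, false));
        System.out.println(names); // prints [abc1, abc2, abc10]
    }
}
```

      Passing a different comparator as the first argument, for example a case-insensitive one, changes only how the non-digit runs are ordered; the boolean flag selects between the two leading-zero policies.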

      Specification

      The following two public static methods will be added to the interface java.util.Comparator:

      +
      +    /**
      +     * The returned comparator compares two character sequences as though each
      +     * of them were first transformed into a tuple of the form:
      +     * <pre>{@code (A0, N0, A1, N1, ..., An-1, Nn-1, An, Nn)}</pre>
      +     * where:
      +     * <p>{@code A0} and {@code An} are (possibly empty) sub-sequences
      +     * consisting of non-decimal-digit characters,
      +     * <p>{@code A1 ... An-1} are non-empty sub-sequences consisting of
      +     * non-decimal-digit characters,
      +     * <p>{@code N0 ... Nn-1} are non-empty sub-sequences consisting of
      +     * decimal-digit characters, and
      +     * <p>{@code Nn} is a (possibly empty) sub-sequence consisting of
      +     * decimal-digit characters.
      +     *
      +     * <p>All sub-sequences concatenated together in order as they appear in the
      +     * tuple yield the original character sequence.
      +     *
      +     * <p>After transformation, the tuples are compared element by element
      +     * (from left to right): corresponding {@code Ax} elements are compared
      +     * using the provided comparator {@code alphaComparator}, and {@code Nx}
      +     * elements are compared as non-negative decimal integers.
      +     *
      +     * <p>The first pair of corresponding elements that differs with respect to
      +     * the comparator used (either {@code alphaComparator} or the special
      +     * decimal comparator), if any, determines the result produced by this
      +     * comparator. The arguments are treated as equal if and only if all the
      +     * sub-sequences, both decimal and non-decimal, compare equal.
      +     *
      +     * <p>For example, the following array was sorted using such a comparator:
      +     * <pre>{@code
      +     * { "1ab", "5ab", "10ab",
      +     *   "a1b", "a5b", "a10b",
      +     *   "ab1", "ab5", "ab10" };}</pre>
      +     *
      +     * <p>When comparing numerical parts, an empty character sequence is
      +     * considered less than any non-empty sequence of decimal digits.
      +     *
      +     * <p>If the numeric values of two compared character sub-sequences are
      +     * equal, but their string representations have different numbers of
      +     * leading zeros, the comparator treats the number with fewer leading
      +     * zeros as smaller.
      +     * For example, {@code "abc 1" < "abc 01" < "abc 001"}.
      +     *
      +     * @apiNote  For example, to sort a collection of {@code String} based on
      +     * case-insensitive ordering, and treating numbers with more leading
      +     * zeros as greater, one could use
      +     *
      +     * <pre>{@code
      +     *     Comparator<String> cmp = Comparator.comparingAlphaDecimal(
      +     *             Comparator.comparing(CharSequence::toString,
      +     *                                  String::compareToIgnoreCase));
      +     * }</pre>
      +     *
      +     * @implSpec  To test if the given code point represents a decimal digit,
      +     * the comparator checks if {@link java.lang.Character#getType(int)}
      +     * returns value {@link java.lang.Character#DECIMAL_DIGIT_NUMBER}.
      +     * The comparator uses {@link java.lang.Character#digit(int, int)} with
      +     * the second argument set to {@code 10} to determine the numeric
      +     * value of a digit represented by the given code point.
      +     *
      +     * @param  alphaComparator the comparator that compares sub-sequences
      +     *                         consisting of non-decimal-digits
      +     * @param  <T> the type of elements to be compared; normally
      +     *                         {@link java.lang.CharSequence}
      +     * @return a comparator that compares character sequences, following the
      +     *                         rules described above
      +     * @throws NullPointerException if the argument is null
      +     *
      +     * @since 12
      +     */
      +    public static <T extends CharSequence> Comparator<T>
      +    comparingAlphaDecimal(Comparator<? super CharSequence> alphaComparator) {
      +        return new Comparators.AlphaDecimalComparator<>(
      +                Objects.requireNonNull(alphaComparator), false);
      +    }
      +
      +    /**
      +     * The returned comparator compares two character sequences as though each
      +     * of them were first transformed into a tuple of the form:
      +     * <pre>{@code (A0, N0, A1, N1, ..., An-1, Nn-1, An, Nn)}</pre>
      +     * where:
      +     * <p>{@code A0} and {@code An} are (possibly empty) sub-sequences
      +     * consisting of non-decimal-digit characters,
      +     * <p>{@code A1 ... An-1} are non-empty sub-sequences consisting of
      +     * non-decimal-digit characters,
      +     * <p>{@code N0 ... Nn-1} are non-empty sub-sequences consisting of
      +     * decimal-digit characters, and
      +     * <p>{@code Nn} is a (possibly empty) sub-sequence consisting of
      +     * decimal-digit characters.
      +     *
      +     * <p>All sub-sequences concatenated together in order as they appear in the
      +     * tuple yield the original character sequence.
      +     *
      +     * <p>After transformation, the tuples are compared element by element
      +     * (from left to right): corresponding {@code Ax} elements are compared
      +     * using the provided comparator {@code alphaComparator}, and {@code Nx}
      +     * elements are compared as non-negative decimal integers.
      +     *
      +     * <p>The first pair of corresponding elements that differs with respect to
      +     * the comparator used (either {@code alphaComparator} or the special
      +     * decimal comparator), if any, determines the result produced by this
      +     * comparator. The arguments are treated as equal if and only if all the
      +     * sub-sequences, both decimal and non-decimal, compare equal.
      +     *
      +     * <p>For example, the following array was sorted using such a comparator:
      +     * <pre>{@code
      +     * { "1ab", "5ab", "10ab",
      +     *   "a1b", "a5b", "a10b",
      +     *   "ab1", "ab5", "ab10" };}</pre>
      +     *
      +     * <p>When comparing numerical parts, an empty character sequence is
      +     * considered less than any non-empty sequence of decimal digits.
      +     *
      +     * <p>If the numeric values of two compared character sub-sequences are
      +     * equal, but their string representations have different numbers of
      +     * leading zeros, the comparator treats the number with more leading
      +     * zeros as smaller.
      +     * For example, {@code "abc 001" < "abc 01" < "abc 1"}.
      +     *
      +     * @apiNote  For example, to sort a collection of {@code String} based on
      +     * case-insensitive ordering, and treating numbers with fewer leading
      +     * zeros as greater, one could use
      +     *
      +     * <pre>{@code
      +     *     Comparator<String> cmp = Comparator.comparingAlphaDecimalLeadingZeroesFirst(
      +     *             Comparator.comparing(CharSequence::toString,
      +     *                                  String::compareToIgnoreCase));
      +     * }</pre>
      +     *
      +     * @implSpec  To test if the given code point represents a decimal digit,
      +     * the comparator checks if {@link java.lang.Character#getType(int)}
      +     * returns value {@link java.lang.Character#DECIMAL_DIGIT_NUMBER}.
      +     * The comparator uses {@link java.lang.Character#digit(int, int)} with
      +     * the second argument set to {@code 10} to determine the numeric
      +     * value of a digit represented by the given code point.
      +     *
      +     * @param  alphaComparator the comparator that compares sub-sequences
      +     *                         consisting of non-decimal-digits
      +     * @param  <T> the type of elements to be compared; normally
      +     *                         {@link java.lang.CharSequence}
      +     * @return a comparator that compares character sequences, following the
      +     *                         rules described above
      +     * @throws NullPointerException if the argument is null
      +     *
      +     * @since 12
      +     */
      +    public static <T extends CharSequence> Comparator<T>
      +    comparingAlphaDecimalLeadingZeroesFirst(
      +            Comparator<? super CharSequence> alphaComparator) {
      +        return new Comparators.AlphaDecimalComparator<>(
      +                Objects.requireNonNull(alphaComparator), true);
      +    }
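
      The digit classification described in the @implSpec sections above can be tried out with existing JDK methods. The sketch below uses only Character.getType(int), Character.DECIMAL_DIGIT_NUMBER and Character.digit(int, int); the class DecimalDigitCheck and its helper methods are hypothetical names chosen for this illustration.

```java
// Hypothetical helper class; only the java.lang.Character calls are real JDK API.
final class DecimalDigitCheck {

    // A code point counts as a decimal digit when its general category
    // is DECIMAL_DIGIT_NUMBER (Unicode category Nd).
    static boolean isDecimalDigit(int codePoint) {
        return Character.getType(codePoint) == Character.DECIMAL_DIGIT_NUMBER;
    }

    // Numeric value of the digit in radix 10, or -1 if it is not a valid digit.
    static int decimalValue(int codePoint) {
        return Character.digit(codePoint, 10);
    }

    public static void main(String[] args) {
        int[] samples = {
            '7',     // ASCII digit
            'x',     // Latin letter, not a digit
            0x0663,  // ARABIC-INDIC DIGIT THREE, category Nd
            0x00B2   // SUPERSCRIPT TWO, category No (not a decimal digit)
        };
        for (int cp : samples) {
            System.out.printf("U+%04X decimalDigit=%b value=%d%n",
                    cp, isDecimalDigit(cp), decimalValue(cp));
        }
    }
}
```

      One consequence of this rule is that digits from scripts other than ASCII take part in the numeric comparison, while characters such as superscript digits are treated as part of the non-decimal sub-sequences.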

            Unassigned
            igerasim Ivan Gerasimov