Details

CSR

Status: Draft

P4

Resolution: Unresolved

None

source

minimal

Java API

JDK
Description
Summary
Add a new API for obtaining a Comparator that compares CharSequences while taking into account the numeric value of the decimal digits embedded in them.
Problem
Users often need to compare strings that contain runs of decimal digits. These runs should be compared as numeric values rather than character by character. For example, the string "abc2" should be considered less than the string "abc10" because 2 is less than 10.
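To make the problem concrete, the following minimal sketch shows what the existing natural ordering of String does with such names. The class and method names (LexicographicProblem, sortedNaturally) are made up for illustration and are not part of the proposal.

```java
import java.util.Arrays;

public class LexicographicProblem {

    // Hypothetical helper: sorts a copy of the input with String's
    // natural ordering, which compares character by character.
    static String[] sortedNaturally(String... names) {
        String[] copy = names.clone();
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // '2' is greater than '1', so "abc2" sorts after "abc10".
        System.out.println("abc2".compareTo("abc10") > 0); // prints true

        System.out.println(Arrays.toString(
                sortedNaturally("abc2", "abc10", "abc1")));
        // prints [abc1, abc10, abc2]
    }
}
```

The numeric reading of the digits (1, 2, 10) is lost, which is the behavior the proposed comparators are meant to address.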
Solution
New methods for obtaining a Comparator will be provided. These methods compose with an existing Comparator object, so the non-decimal parts of the string can be compared in whatever way is desired.
Two methods will be provided. The Comparators they produce differ in how they treat leading zeros in the decimal parts of the compared strings.
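The intended behavior can be approximated with existing JDK APIs. The sketch below is not the proposed implementation: the class AlphaDecimalSketch, its helper methods, and the boolean flag leadingZeroesFirst are hypothetical names chosen for illustration. It assumes that when one sequence runs out of parts, the missing parts are treated as empty, and it does not guard against a null alphaComparator.

```java
import java.util.Arrays;
import java.util.Comparator;

public class AlphaDecimalSketch {

    // Digit test based on the general category of the code point.
    static boolean isDigit(int cp) {
        return Character.getType(cp) == Character.DECIMAL_DIGIT_NUMBER;
    }

    // End index of the run starting at 'from' whose code points are all
    // digits (digits == true) or all non-digits (digits == false).
    static int runEnd(int[] s, int from, boolean digits) {
        int i = from;
        while (i < s.length && isDigit(s[i]) == digits) {
            i++;
        }
        return i;
    }

    // Compares the digit runs a[aFrom, aTo) and b[bFrom, bTo) numerically.
    static int compareDigits(int[] a, int aFrom, int aTo,
                             int[] b, int bFrom, int bTo,
                             boolean leadingZeroesFirst) {
        boolean aEmpty = aFrom == aTo;
        boolean bEmpty = bFrom == bTo;
        if (aEmpty || bEmpty) {
            // An empty run is less than any non-empty run.
            return Boolean.compare(!aEmpty, !bEmpty);
        }
        // Skip leading zeros to find the significant digits.
        int aSig = aFrom;
        while (aSig < aTo && Character.digit(a[aSig], 10) == 0) aSig++;
        int bSig = bFrom;
        while (bSig < bTo && Character.digit(b[bSig], 10) == 0) bSig++;
        // More significant digits means a larger value.
        int aLen = aTo - aSig, bLen = bTo - bSig;
        if (aLen != bLen) return Integer.compare(aLen, bLen);
        for (int k = 0; k < aLen; k++) {
            int c = Integer.compare(Character.digit(a[aSig + k], 10),
                                    Character.digit(b[bSig + k], 10));
            if (c != 0) return c;
        }
        // Equal values: break the tie on the number of leading zeros.
        int aZeros = aSig - aFrom, bZeros = bSig - bFrom;
        return leadingZeroesFirst
                ? Integer.compare(bZeros, aZeros)
                : Integer.compare(aZeros, bZeros);
    }

    // Hypothetical stand-in for the two proposed factory methods.
    static Comparator<CharSequence> alphaDecimal(
            Comparator<? super CharSequence> alphaComparator,
            boolean leadingZeroesFirst) {
        return (x, y) -> {
            int[] a = x.codePoints().toArray();
            int[] b = y.codePoints().toArray();
            int i = 0, j = 0;
            while (i < a.length || j < b.length) {
                // Non-digit runs go to the supplied comparator.
                int ie = runEnd(a, i, false), je = runEnd(b, j, false);
                int c = alphaComparator.compare(
                        new String(a, i, ie - i), new String(b, j, je - j));
                if (c != 0) return c;
                // Digit runs are compared by numeric value.
                int id = runEnd(a, ie, true), jd = runEnd(b, je, true);
                c = compareDigits(a, ie, id, b, je, jd, leadingZeroesFirst);
                if (c != 0) return c;
                i = id;
                j = jd;
            }
            return 0;
        };
    }

    public static void main(String[] args) {
        Comparator<CharSequence> natural =
                Comparator.comparing(CharSequence::toString);

        String[] names = {"ab10", "a5b", "1ab", "ab1", "10ab",
                          "a1b", "5ab", "ab5", "a10b"};
        Arrays.sort(names, alphaDecimal(natural, false));
        System.out.println(Arrays.toString(names));
        // prints [1ab, 5ab, 10ab, a1b, a5b, a10b, ab1, ab5, ab10]

        String[] zeros = {"abc 01", "abc 1", "abc 001"};
        Arrays.sort(zeros, alphaDecimal(natural, false));
        System.out.println(Arrays.toString(zeros));
        // prints [abc 1, abc 01, abc 001]
        Arrays.sort(zeros, alphaDecimal(natural, true));
        System.out.println(Arrays.toString(zeros));
        // prints [abc 001, abc 01, abc 1]
    }
}
```

Passing the comparator for the non-digit runs as a parameter is what allows, for example, a case-insensitive ordering to be combined with numeric comparison of the digits.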
Specification
The following two public static methods will be added to the interface java.util.Comparator:
+
+ /**
+ * The returned comparator compares two character sequences as though each
+ * of them were first transformed into a tuple of the form:
+ * <pre>{@code (A0, N0, A1, N1, ..., An-1, Nn-1, An, Nn)}</pre>
+ * where:
+ * <p>{@code A0} and {@code An} are (possibly empty) subsequences
+ * consisting of non-decimal-digit characters,
+ * <p>{@code A1 ... An-1} are non-empty subsequences consisting of
+ * non-decimal-digit characters,
+ * <p>{@code N0 ... Nn-1} are non-empty subsequences consisting of
+ * decimal-digit characters, and
+ * <p>{@code Nn} is a (possibly empty) subsequence consisting of
+ * decimal-digit characters.
+ *
+ * <p>All subsequences concatenated together in order as they appear in the
+ * tuple yield the original character sequence.
+ *
+ * <p>After transformation, the tuples are compared element by element
+ * (from left to right): corresponding {@code Ax} elements are compared
+ * using the provided comparator {@code alphaComparator}, and {@code Nx}
+ * elements are compared as non-negative decimal integers.
+ *
+ * <p>The first pair of corresponding elements that compares as unequal
+ * (under either {@code alphaComparator} or the decimal comparison), if
+ * any, determines the result produced by this comparator.
+ * The arguments are treated as equal if and only if all the subsequences,
+ * both decimal and non-decimal, compare equal.
+ *
+ * <p>For example, the following array is sorted according to such a comparator:
+ * <pre>{@code
+ * { "1ab", "5ab", "10ab",
+ * "a1b", "a5b", "a10b",
+ * "ab1", "ab5", "ab10" };}</pre>
+ *
+ * <p>When comparing numerical parts, an empty character sequence is
+ * considered less than any non-empty sequence of decimal digits.
+ *
+ * <p>If the numeric values of two compared character subsequences are
+ * equal, but their string representations have a different number of
+ * leading zeros, the comparator treats the number with fewer leading
+ * zeros as smaller.
+ * For example, {@code "abc 1" < "abc 01" < "abc 001"}.
+ *
+ * @apiNote For example, to sort a collection of {@code String} based on
+ * case-insensitive ordering, and treating numbers with more leading
+ * zeros as greater, one could use
+ *
+ * <pre>{@code
+ * Comparator<String> cmp = Comparator.comparingAlphaDecimal(
+ * Comparator.comparing(CharSequence::toString,
+ * String::compareToIgnoreCase));
+ * }</pre>
+ *
+ * @implSpec To test whether a given code point represents a decimal digit,
+ * the comparator checks whether {@link java.lang.Character#getType(int)}
+ * returns {@link java.lang.Character#DECIMAL_DIGIT_NUMBER}.
+ * The comparator uses {@link java.lang.Character#digit(int, int)} with
+ * the second argument set to {@code 10} to determine the numeric
+ * value of a digit represented by the given code point.
+ *
+ * @param alphaComparator the comparator that compares subsequences
+ * consisting of non-decimal-digit characters
+ * @param <T> the type of elements to be compared; normally
+ * {@link java.lang.CharSequence}
+ * @return a comparator that compares character sequences, following the
+ * rules described above
+ * @throws NullPointerException if the argument is null
+ *
+ * @since 12
+ */
+ public static <T extends CharSequence> Comparator<T>
+ comparingAlphaDecimal(Comparator<? super CharSequence> alphaComparator) {
+ return new Comparators.AlphaDecimalComparator<>(
+ Objects.requireNonNull(alphaComparator), false);
+ }
+
+ /**
+ * The returned comparator compares two character sequences as though each
+ * of them were first transformed into a tuple of the form:
+ * <pre>{@code (A0, N0, A1, N1, ..., An-1, Nn-1, An, Nn)}</pre>
+ * where:
+ * <p>{@code A0} and {@code An} are (possibly empty) subsequences
+ * consisting of non-decimal-digit characters,
+ * <p>{@code A1 ... An-1} are non-empty subsequences consisting of
+ * non-decimal-digit characters,
+ * <p>{@code N0 ... Nn-1} are non-empty subsequences consisting of
+ * decimal-digit characters, and
+ * <p>{@code Nn} is a (possibly empty) subsequence consisting of
+ * decimal-digit characters.
+ *
+ * <p>All subsequences concatenated together in order as they appear in the
+ * tuple yield the original character sequence.
+ *
+ * <p>After transformation, the tuples are compared element by element
+ * (from left to right): corresponding {@code Ax} elements are compared
+ * using the provided comparator {@code alphaComparator}, and {@code Nx}
+ * elements are compared as non-negative decimal integers.
+ *
+ * <p>The first pair of corresponding elements that compares as unequal
+ * (under either {@code alphaComparator} or the decimal comparison), if
+ * any, determines the result produced by this comparator.
+ * The arguments are treated as equal if and only if all the subsequences,
+ * both decimal and non-decimal, compare equal.
+ *
+ * <p>For example, the following array is sorted according to such a comparator:
+ * <pre>{@code
+ * { "1ab", "5ab", "10ab",
+ * "a1b", "a5b", "a10b",
+ * "ab1", "ab5", "ab10" };}</pre>
+ *
+ * <p>When comparing numerical parts, an empty character sequence is
+ * considered less than any non-empty sequence of decimal digits.
+ *
+ * <p>If the numeric values of two compared character subsequences are
+ * equal, but their string representations have a different number of
+ * leading zeros, the comparator treats the number with more leading
+ * zeros as smaller.
+ * For example, {@code "abc 001" < "abc 01" < "abc 1"}.
+ *
+ * @apiNote For example, to sort a collection of {@code String} based on
+ * case-insensitive ordering, and treating numbers with fewer leading
+ * zeros as greater, one could use
+ *
+ * <pre>{@code
+ * Comparator<String> cmp = Comparator.comparingAlphaDecimalLeadingZeroesFirst(
+ * Comparator.comparing(CharSequence::toString,
+ * String::compareToIgnoreCase));
+ * }</pre>
+ *
+ * @implSpec To test whether a given code point represents a decimal digit,
+ * the comparator checks whether {@link java.lang.Character#getType(int)}
+ * returns {@link java.lang.Character#DECIMAL_DIGIT_NUMBER}.
+ * The comparator uses {@link java.lang.Character#digit(int, int)} with
+ * the second argument set to {@code 10} to determine the numeric
+ * value of a digit represented by the given code point.
+ *
+ * @param alphaComparator the comparator that compares subsequences
+ * consisting of non-decimal-digit characters
+ * @param <T> the type of elements to be compared; normally
+ * {@link java.lang.CharSequence}
+ * @return a comparator that compares character sequences, following the
+ * rules described above
+ * @throws NullPointerException if the argument is null
+ *
+ * @since 12
+ */
+ public static <T extends CharSequence> Comparator<T>
+ comparingAlphaDecimalLeadingZeroesFirst(
+ Comparator<? super CharSequence> alphaComparator) {
+ return new Comparators.AlphaDecimalComparator<>(
+ Objects.requireNonNull(alphaComparator), true);
+ }
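The digit test named in the @implSpec sections above can be tried in isolation with existing java.lang.Character methods. In the sketch below, the class DecimalDigitCheck and its two helpers are hypothetical names used only for illustration. It shows that the test is based on the Unicode general category, so it accepts decimal digits from scripts other than ASCII while rejecting other kinds of numeric characters.

```java
public class DecimalDigitCheck {

    // True if the code point is in the Unicode "Nd" (decimal digit) category.
    static boolean isDecimalDigit(int codePoint) {
        return Character.getType(codePoint) == Character.DECIMAL_DIGIT_NUMBER;
    }

    // Numeric value of a decimal digit, or -1 if it is not a valid digit
    // in radix 10.
    static int decimalValue(int codePoint) {
        return Character.digit(codePoint, 10);
    }

    public static void main(String[] args) {
        // ASCII digit seven.
        System.out.println(isDecimalDigit('7') + " " + decimalValue('7'));
        // prints true 7

        // U+0663 ARABIC-INDIC DIGIT THREE is also a decimal digit.
        System.out.println(isDecimalDigit(0x0663) + " " + decimalValue(0x0663));
        // prints true 3

        // U+00B2 SUPERSCRIPT TWO (category No) and U+2167 ROMAN NUMERAL
        // EIGHT (category Nl) are numeric but not decimal digits.
        System.out.println(isDecimalDigit(0x00B2)); // prints false
        System.out.println(isDecimalDigit(0x2167)); // prints false
    }
}
```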
Issue Links
 csr of

JDK-8134512 provide AlphaDecimal Comparator
 Open