JDK-8134512

provide Alpha-Decimal Comparator



    • Enhancement
    • Resolution: Unresolved
    • P4
    • tbd
    • None
    • core-libs


      Provide a `Comparator` that takes into account numbers embedded in the compared strings.

      * Add an `AlphaDecimalComparator` class that compares objects of type `CharSequence`, taking into account their alpha and numeric parts.

      * Where parts of the strings are compared literally, the default character comparison will be used.

      * An instance of `AlphaDecimalComparator` should be configurable in how it treats numerically equal strings with different numbers of leading zeroes.

      * No plans to implement locale-specific comparison.

      * The comparator will only be able to recognize alpha and numeric parts of the strings.
      More complex structures (e.g. time, date, currency) will be treated as sequences of digits and non-digits, and compared accordingly.

      * No plans to recognize numbers from non-decimal number systems.

      * No plans to recognize + and - signs of numbers. Neither thousands separators nor underscores will be treated as part of a numeric part of the string.

      * No plans to make a case-insensitive comparator.
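To illustrate the last few points, here is a minimal sketch of how a string might be seen as alternating runs of digits and non-digits. The class `AlphaDecimalRuns` and its `split` method are hypothetical names used only for this illustration and are not part of the proposal. The sketch also assumes that only the ASCII digits 0-9 count as numeric, which the text above does not state explicitly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only: splits a CharSequence into
// alternating runs of decimal digits and non-digits.
final class AlphaDecimalRuns {

    // Assumption: only ASCII '0'..'9' are treated as digits.
    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    static List<String> split(CharSequence s) {
        List<String> runs = new ArrayList<>();
        int start = 0;
        for (int i = 1; i <= s.length(); i++) {
            // Close the current run at the end of the input, or where the
            // kind of character (digit / non-digit) changes.
            if (i == s.length() || isDigit(s.charAt(i)) != isDigit(s.charAt(i - 1))) {
                runs.add(s.subSequence(start, i).toString());
                start = i;
            }
        }
        return runs;
    }

    public static void main(String[] args) {
        // A date is just digits and separators; no date semantics are applied.
        System.out.println(split("2015-08-26")); // [2015, -, 08, -, 26]
        // The thousands separator ends the numeric part.
        System.out.println(split("$1,000"));     // [$, 1, ,, 000]
        // Hexadecimal notation is not recognized as a single number.
        System.out.println(split("0x1F"));       // [0, x, 1, F]
    }
}
```

Under these assumptions, a sign, a separator, or a letter always ends a numeric part, so only the plain decimal runs would ever be compared numerically.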


      In certain situations the preferred way of comparing strings is a combination of char-comparison with numeric comparison.
      When a list of strings is sorted according to such a comparison, it often looks more natural to the human eye.

      * A similar comparison routine is used in some widely used applications (notably 'Explorer' on Windows and the 'Files' file manager on Linux).

      * The Windows API provides the function StrCmpLogicalW() (documented on MSDN), which is used for a similar sort order.

      * We have already hit build numbers above one hundred.
      It would be more natural to place the string b100 right after b99.

      * When Java 10 is released, it would be better to place it after Java 9, and not between Java 1 and Java 2.
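The ordering problem described in the last two points can be reproduced with the default `String` ordering. The class name `DefaultOrderDemo` and the helper `sortedByDefault` below are hypothetical and exist only for this demonstration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class: shows how the natural (char-by-char) ordering
// of String places names that contain numbers.
final class DefaultOrderDemo {

    static List<String> sortedByDefault(String... names) {
        // sorted() with no argument uses String.compareTo.
        return Arrays.stream(names).sorted().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sortedByDefault("Java 2", "Java 10", "Java 9", "Java 1"));
        // [Java 1, Java 10, Java 2, Java 9]
        System.out.println(sortedByDefault("b100", "b99", "b98"));
        // [b100, b98, b99]
    }
}
```

Because the comparison stops at the first differing character, `Java 10` sorts before `Java 2` and `b100` sorts before `b98`.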

      A draft proposal was discussed on the core-libs-dev mailing list and seemed to draw at least some interest from the community.



      The algorithm for comparing two strings is as follows.
      First, both strings are scanned to find the longest common prefix.
      Then, the strings are analyzed around the point where they differ.
      If the difference between the strings happens to be in the numeric substrings, then they are compared numerically.
      If the numerical values are equal, then leading zeroes are taken into account.
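The steps above could be sketched roughly as follows. This is not the proposed implementation: the class name `AlphaDecimalSketch`, the constructor flag `fewerZeroesFirst`, and the restriction to ASCII digits are all assumptions made for the sake of a short, runnable illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only; names and the zero-handling flag are hypothetical.
final class AlphaDecimalSketch implements Comparator<CharSequence> {
    private final boolean fewerZeroesFirst; // assumed configuration option

    AlphaDecimalSketch(boolean fewerZeroesFirst) {
        this.fewerZeroesFirst = fewerZeroesFirst;
    }

    // Assumption: only ASCII '0'..'9' are treated as digits.
    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    @Override
    public int compare(CharSequence a, CharSequence b) {
        int la = a.length(), lb = b.length();
        // Step 1: find the longest common prefix.
        int p = 0;
        while (p < la && p < lb && a.charAt(p) == b.charAt(p)) p++;
        if (p == la && p == lb) return 0;

        // Step 2: find the digit runs (if any) surrounding the point of difference.
        int s = p;                       // common start of both runs
        while (s > 0 && isDigit(a.charAt(s - 1))) s--;
        int ea = p;                      // end of the run in a
        while (ea < la && isDigit(a.charAt(ea))) ea++;
        int eb = p;                      // end of the run in b
        while (eb < lb && isDigit(b.charAt(eb))) eb++;

        // Step 3: if both strings have a digit run here, compare numerically.
        if (ea > s && eb > s) {
            int c = compareRuns(a, s, ea, b, eb);
            if (c != 0) return c;        // zero means the runs are identical
        }

        // Otherwise fall back to a plain character comparison at the difference.
        if (p == la) return -1;
        if (p == lb) return 1;
        return Character.compare(a.charAt(p), b.charAt(p));
    }

    // Compares two digit runs by value without parsing, so arbitrarily long
    // runs cannot overflow; equal values are ordered by leading zeroes.
    private int compareRuns(CharSequence a, int s, int ea, CharSequence b, int eb) {
        int za = s;
        while (za < ea && a.charAt(za) == '0') za++;
        int zb = s;
        while (zb < eb && b.charAt(zb) == '0') zb++;
        int lenA = ea - za, lenB = eb - zb;
        if (lenA != lenB) return Integer.compare(lenA, lenB);
        for (int i = 0; i < lenA; i++) {
            int c = Character.compare(a.charAt(za + i), b.charAt(zb + i));
            if (c != 0) return c;
        }
        int zeroes = Integer.compare(za - s, zb - s);
        return fewerZeroesFirst ? zeroes : -zeroes;
    }

    public static void main(String[] args) {
        List<String> builds = new ArrayList<>(Arrays.asList("b100", "b99", "b9", "b10"));
        builds.sort(new AlphaDecimalSketch(true));
        System.out.println(builds); // [b9, b10, b99, b100]
    }
}
```

The digit runs are compared by the length of their significant digits and then digit by digit, rather than by parsing them into an `int` or `long`, so very long numeric parts would not overflow. Which of `a1` and `a01` should come first is exactly the kind of choice the configurable leading-zero handling would decide; the flag here is just one possible way to express it.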





              Assignee: Unassigned
              Reporter: Ivan Gerasimov (igerasim)