Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-8033148

Lexicographic comparators and mismatchers for arrays

XMLWordPrintable

    • b92
    • Verified

        Add lexicographic comparators for arrays. For primitives where appropriate signed and unsigned comparators should included.

        Such comparators can be declared as methods on Arrays. For example, the signatures for byte array comparator methods could be:

            public static int compare(byte[] left, byte[] right);
            public static int compare(byte[] left, int lFromIndex, int lToIndex,
                                      byte[] right, int rFromIndex, int rToIndex);
            public static int compareUnsigned(byte[] left, byte[] right);
            public static int compareUnsigned(byte[] left, int lFromIndex, int lToIndex,
                                              byte[] right, int rFromIndex, int rToIndex);

        A Comparator<byte[]> may be obtained as follows:

            Comparator<byte[]> c = Arrays::compare;
            Comparator<byte[]> cu = Arrays::compareUnsigned;

        For such a proposal a maximum of 28 comparator methods will be required: two methods each for eight primitives, two more each for unsigned variants for byte, short, int and long, and four for object references (with a natural Comparator or an explicit Comparator).

        Frameworks such as Cassandra and libraries such as Guava provide a Comparator for unsigned byte arrays that internally uses sun.misc.Unsafe to view byte[] as long[] to more efficiently compare 8 bytes at a time (assuming alignment). The unsigned byte array comparator implementation should have similar performance characteristics as Cassandra's and Guava's comparator (most likely using the same technique), thereby reducing the dependency on the sun.misc.Unsafe class.

        Customized array equality and comparison can be built upon more general array mismatch functionality, such as:

            public static int mismatch(byte[] a, byte[] b);
            public static int mismatch(byte[] a, int aFromIndex, int aToIndex,
                                       byte[] b, int bFromIndex, int bToIndex);

        A maximum of two methods each for eight primitives, and four for object references () would be required.

              psandoz Paul Sandoz
              psandoz Paul Sandoz
              Votes:
              1 Vote for this issue
              Watchers:
              14 Start watching this issue

                Created:
                Updated:
                Resolved: