Add the following methods to java.util.Arrays:
sort(int[] a, IntBinaryOperator cmp)
sort(int[] a, int fromIndex, int toIndex, IntBinaryOperator cmp)
parallelSort(int[] a, IntBinaryOperator cmp)
parallelSort(int[] a, int fromIndex, int toIndex, IntBinaryOperator cmp)
sort(long[] a, LongBinaryOperator cmp)
sort(long[] a, int fromIndex, int toIndex, LongBinaryOperator cmp)
parallelSort(long[] a, LongBinaryOperator cmp)
parallelSort(long[] a, int fromIndex, int toIndex, LongBinaryOperator cmp)
...where the new optional comparator argument is used instead of the natural comparison operation on array elements. The contract for the operator is the same as for Integer::compare or Long::compare, and so the natural comparison would also be obtained by passing one of those method references.
Use cases:
1. Support unsigned ordering of ints and longs, using Long::unsignedCompare. (This use case extends to other primitive types, not covered by this RFE.)
2. Permit index-wise sorting algorithms, not otherwise efficiently obtainable. A sorting task on an arbitrary int-indexed (or long-indexed) collection (or array or memory segment) can be decomposed into a sort on an iota-array (built as a temporary) which is interpreted as indexes into the other collection. The comparator would take two indexes (int or long) and interpret them, not as primitive numbers, but rather as references to the original collection (or array or memory segment). The indexes would be sorted according to whatever order was required on the elements of the collection (or array or memory segment). The result of the sort operation would be read from the temporary iota array, now a permutation array which reflects the sorted order of the original data.
This is especially useful if the original data is organized as flat data larger than a machine order, since swapping two larger flat data items will be more expensive than swapping their indexes (in the temporary iota array). Of course, the extra indirection will add a cost, so there are trade-offs.
If the input array is not an iota array but a different permutation, then the effect of sorting is to further reorder an existing permutation. The output array can be applied, as a reordering, to the original array (perhaps there should be an in-place algorithm for this, as a follow-on RFE). More importantly, the output array can be applied to multiple arrays (of the same length); the effect of this is identical to sorting the rows of a columnar database by first deriving a permutation vector from the key column and then applying the permutation uniformly to all columns. This is an important operation for column-major data.
A long can encode a pair of ints, which in turn can represent a pair of indexes into a list or array. If a long array (used as a temporary) is packed with index-pairs, where one set of indexes is an iota sequence, and the other is arbitrary data, then the effect of the sort is to carry the arbitrary data into a new order which reflects the correct order of the data being sorted. Thus, a previously permuted set of indexes can be re-permuted in one sort operation by sorting a long temporary array with a comparator.
The other use for longs (as alluded above) is to allow sorting of large bulk data by reference, such as the lines of a memory mapped file or a block of C structures mapped into memory by a Panama API. Each long would be an offset (of some sort) into that subject data.
Valhalla may eventually provide functionality to support the above algorithms by means of flat arrays of objects; the objects could be slim wrappers for long values, and the normal comparator API would apply to them (instead of the proposed Integer::compare above). It will be a while before such generic algorithms can perform comparably with hand-specialized primitive array algorithms.
sort(int[] a, IntBinaryOperator cmp)
sort(int[] a, int fromIndex, int toIndex, IntBinaryOperator cmp)
parallelSort(int[] a, IntBinaryOperator cmp)
parallelSort(int[] a, int fromIndex, int toIndex, IntBinaryOperator cmp)
sort(long[] a, LongBinaryOperator cmp)
sort(long[] a, int fromIndex, int toIndex, LongBinaryOperator cmp)
parallelSort(long[] a, LongBinaryOperator cmp)
parallelSort(long[] a, int fromIndex, int toIndex, LongBinaryOperator cmp)
...where the new optional comparator argument is used instead of the natural comparison operation on array elements. The contract for the operator is the same as for Integer::compare or Long::compare, and so the natural comparison would also be obtained by passing one of those method references.
Use cases:
1. Support unsigned ordering of ints and longs, using Long::unsignedCompare. (This use case extends to other primitive types, not covered by this RFE.)
2. Permit index-wise sorting algorithms, not otherwise efficiently obtainable. A sorting task on an arbitrary int-indexed (or long-indexed) collection (or array or memory segment) can be decomposed into a sort on an iota-array (built as a temporary) which is interpreted as indexes into the other collection. The comparator would take two indexes (int or long) and interpret them, not as primitive numbers, but rather as references to the original collection (or array or memory segment). The indexes would be sorted according to whatever order was required on the elements of the collection (or array or memory segment). The result of the sort operation would be read from the temporary iota array, now a permutation array which reflects the sorted order of the original data.
This is especially useful if the original data is organized as flat data larger than a machine order, since swapping two larger flat data items will be more expensive than swapping their indexes (in the temporary iota array). Of course, the extra indirection will add a cost, so there are trade-offs.
If the input array is not an iota array but a different permutation, then the effect of sorting is to further reorder an existing permutation. The output array can be applied, as a reordering, to the original array (perhaps there should be an in-place algorithm for this, as a follow-on RFE). More importantly, the output array can be applied to multiple arrays (of the same length); the effect of this is identical to sorting the rows of a columnar database by first deriving a permutation vector from the key column and then applying the permutation uniformly to all columns. This is an important operation for column-major data.
A long can encode a pair of ints, which in turn can represent a pair of indexes into a list or array. If a long array (used as a temporary) is packed with index-pairs, where one set of indexes is an iota sequence, and the other is arbitrary data, then the effect of the sort is to carry the arbitrary data into a new order which reflects the correct order of the data being sorted. Thus, a previously permuted set of indexes can be re-permuted in one sort operation by sorting a long temporary array with a comparator.
The other use for longs (as alluded above) is to allow sorting of large bulk data by reference, such as the lines of a memory mapped file or a block of C structures mapped into memory by a Panama API. Each long would be an offset (of some sort) into that subject data.
Valhalla may eventually provide functionality to support the above algorithms by means of flat arrays of objects; the objects could be slim wrappers for long values, and the normal comparator API would apply to them (instead of the proposed Integer::compare above). It will be a while before such generic algorithms can perform comparably with hand-specialized primitive array algorithms.
- relates to
-
JDK-8294879 Arrays.sort should be accompanied by permutation vector operations
-
- Open
-