There is (IMHO) a nasty performance bug in the methods
Arrays.sort(Object[], int, int)
Arrays.sort(Object[], int, int, Comparator)
The javadoc says
The sorting algorithm is a modified mergesort (in which the merge is omitted if the highest
element in the low sublist is less than the lowest element in the high sublist). This
algorithm offers guaranteed n*log(n) performance, and can approach linear performance
on nearly sorted lists.
Unfortunately, the "n" in this specification is undefined. The most
obvious interpretation is that it would be the size of the subregion
of the array being sorted, that is, the difference between "toIndex"
and "fromIndex". Unfortunately, the algorithmic complexity of the
implementation has a term that is linear in the size of the input
array, because it does a "clone" operation! (I can tell from looking
at the source, but a non-Sun person could tell by observing that the
GC time differs wildly for sorting equal-sized subarrays of small and
large arrays.)
This is a quite pernicious problem -- a user can use this according to
a reasonable interpretation of the spec (which *does* claim to specify
its performance!) and get totally anomolous performance results,
unrelated to code they can modify. For example, below I attach a
program that exercises the bug (derived from code I was writing when I
discovered this) where I just measured the following executions:
trash 72 =>time java -ms24m -XX:-PrintGCDetails SortPerfBug 50 10000 10000 0
5.53u 0.47s 0:06.11 98.1%
trash 73 =>time java -ms24m -XX:-PrintGCDetails SortPerfBug 50 10000 10000 1
1.04u 0.14s 0:01.21 97.5%
The last argument value "1" means use a workaround. I could contrive
an arbitrarily large ratio between the execution times with different
other command lines. Here is the program:
--------------------------------------------------
import java.lang.*;
import java.util.*;
import java.lang.reflect.*;
class SortPerfBug {
private static void rangeCheck(int arrayLen, int fromIndex, int toIndex) {
if (fromIndex > toIndex)
throw new IllegalArgumentException("fromIndex(" + fromIndex +
") > toIndex(" + toIndex+")");
if (fromIndex < 0)
throw new ArrayIndexOutOfBoundsException(fromIndex);
if (toIndex > arrayLen)
throw new ArrayIndexOutOfBoundsException(toIndex);
}
private static Object cloneSubarray(Object a, int from, int to) {
Class c = a.getClass();
if (!c.isArray())
throw new IllegalArgumentException("first argument is not an array.");
int len = Array.getLength(a);
rangeCheck(len, from, to);
Class compClass = c.getComponentType();
int n = to - from;
Object result = Array.newInstance(compClass, n);
System.arraycopy(a, from, result, 0, n);
return result;
}
public static void main(String[] arg) {
int arr_size = Integer.parseInt(arg[0]);
int buf_size = Integer.parseInt(arg[1]);
int n_sorts = Integer.parseInt(arg[2]);
int do_clone = Integer.parseInt(arg[3]);
Integer[] ia = new Integer[arr_size + 2 * buf_size];
for (int i = buf_size; i < buf_size + arr_size; i++) {
ia[i] = new Integer(i);
}
for (int sort_num = 0; sort_num < n_sorts; sort_num++) {
if (do_clone != 0) {
Integer[] ia_tmp = (Integer[])cloneSubarray(ia, buf_size, buf_size+arr_size);
Arrays.sort(ia_tmp, 0, arr_size);
for (int i = 0; i < arr_size; i++) {
ia[buf_size + i] = ia_tmp[i];
}
} else {
Arrays.sort(ia, buf_size, buf_size+arr_size);
}
}
}
}
--------------------------------------------
Arrays.sort(Object[], int, int)
Arrays.sort(Object[], int, int, Comparator)
The javadoc says
The sorting algorithm is a modified mergesort (in which the merge is omitted if the highest
element in the low sublist is less than the lowest element in the high sublist). This
algorithm offers guaranteed n*log(n) performance, and can approach linear performance
on nearly sorted lists.
Unfortunately, the "n" in this specification is undefined. The most
obvious interpretation is that it would be the size of the subregion
of the array being sorted, that is, the difference between "toIndex"
and "fromIndex". Unfortunately, the algorithmic complexity of the
implementation has a term that is linear in the size of the input
array, because it does a "clone" operation! (I can tell from looking
at the source, but a non-Sun person could tell by observing that the
GC time differs wildly for sorting equal-sized subarrays of small and
large arrays.)
This is a quite pernicious problem -- a user can use this according to
a reasonable interpretation of the spec (which *does* claim to specify
its performance!) and get totally anomolous performance results,
unrelated to code they can modify. For example, below I attach a
program that exercises the bug (derived from code I was writing when I
discovered this) where I just measured the following executions:
trash 72 =>time java -ms24m -XX:-PrintGCDetails SortPerfBug 50 10000 10000 0
5.53u 0.47s 0:06.11 98.1%
trash 73 =>time java -ms24m -XX:-PrintGCDetails SortPerfBug 50 10000 10000 1
1.04u 0.14s 0:01.21 97.5%
The last argument value "1" means use a workaround. I could contrive
an arbitrarily large ratio between the execution times with different
other command lines. Here is the program:
--------------------------------------------------
import java.lang.*;
import java.util.*;
import java.lang.reflect.*;
class SortPerfBug {
private static void rangeCheck(int arrayLen, int fromIndex, int toIndex) {
if (fromIndex > toIndex)
throw new IllegalArgumentException("fromIndex(" + fromIndex +
") > toIndex(" + toIndex+")");
if (fromIndex < 0)
throw new ArrayIndexOutOfBoundsException(fromIndex);
if (toIndex > arrayLen)
throw new ArrayIndexOutOfBoundsException(toIndex);
}
private static Object cloneSubarray(Object a, int from, int to) {
Class c = a.getClass();
if (!c.isArray())
throw new IllegalArgumentException("first argument is not an array.");
int len = Array.getLength(a);
rangeCheck(len, from, to);
Class compClass = c.getComponentType();
int n = to - from;
Object result = Array.newInstance(compClass, n);
System.arraycopy(a, from, result, 0, n);
return result;
}
public static void main(String[] arg) {
int arr_size = Integer.parseInt(arg[0]);
int buf_size = Integer.parseInt(arg[1]);
int n_sorts = Integer.parseInt(arg[2]);
int do_clone = Integer.parseInt(arg[3]);
Integer[] ia = new Integer[arr_size + 2 * buf_size];
for (int i = buf_size; i < buf_size + arr_size; i++) {
ia[i] = new Integer(i);
}
for (int sort_num = 0; sort_num < n_sorts; sort_num++) {
if (do_clone != 0) {
Integer[] ia_tmp = (Integer[])cloneSubarray(ia, buf_size, buf_size+arr_size);
Arrays.sort(ia_tmp, 0, arr_size);
for (int i = 0; i < arr_size; i++) {
ia[buf_size + i] = ia_tmp[i];
}
} else {
Arrays.sort(ia, buf_size, buf_size+arr_size);
}
}
}
}
--------------------------------------------