-
Enhancement
-
Resolution: Unresolved
-
P4
-
1.4.2
-
Fix Understood
Name: jl125535 Date: 04/07/2004
FULL PRODUCT VERSION :
java version "1.4.2"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2-b28)
Java HotSpot(TM) Server VM (build 1.4.2-b28, mixed mode)
java version "1.5.0-beta2"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0-beta2-b45)
Java HotSpot(TM) Client VM (build 1.5.0-beta2-b45, mixed mode)
ADDITIONAL OS VERSION INFORMATION :
Microsoft Windows 2000 [Version 5.00.2195]
A DESCRIPTION OF THE PROBLEM :
Collection.retainAll(Collection) implementations aren't optimized.
When a sufficiently large collection is used as an argument, the
performance is pathologically slow. (See attached program results)
There are several factors that could be taken into account to optimize the
performance of the retainAll(Collection) method. The default implementation
provided by AbstractCollection could, modify its behavior depending on
the size of the specified collection and the cost of its containment test.
The attached code demonstrates a very simple way to avoid the current
pathological worst case (by creating a HashSet). If the cost of constructing an unnecessary HashSet is to be avoided, an interface could be defined indicating that a Collection has an O(1) contains() method.
The retainAll(Collection) method can be viewed as an operation on two collections.
The concrete Collection implementations (HashSet, TreeSet, ArrayList, etc...)
have at their disposal.
- the sizes of both collections
- the element insertion speed of the implementing collection
- the element deletion speed of the implementing collection
- the containment test speed of the implementing collection
Additionally, the following information is likely to be known
- the element insertion speed of the argument collection
- the element deletion speed of the argument collection
- the containment test speed of the argument collection
So, not only could the implementation provided by AbstractCollection be
greatly improved, but the implementations provided by the concrete
implementations could be improved beyond that.
Similar optimizations should be made for the containsAll(Collection) and
removeAll(Collection) methods, too.
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Invoke retainAll(collection) on a collection, using an argument with a relatively slow (>O(1)) containment check.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
The execution speed of x.retainAll(y) should be no greater than O(x.size() + y.size()).
ACTUAL -
The execution spped of x.retainAll(y) can be as slow as O(x.size() * y.size()).
Partial execution results follow:
java.util.ArrayList took 32 ms for 512
java.util.LinkedList took 0 ms for 512
java.util.HashSet took 0 ms for 512
java.util.TreeSet took 0 ms for 512
Test$FastSet took 16 ms for 512
java.util.ArrayList took 0 ms for 1024
java.util.LinkedList took 0 ms for 1024
java.util.HashSet took 0 ms for 1024
java.util.TreeSet took 0 ms for 1024
Test$FastSet took 0 ms for 1024
java.util.ArrayList took 47 ms for 2048
java.util.LinkedList took 15 ms for 2048
java.util.HashSet took 16 ms for 2048
java.util.TreeSet took 31 ms for 2048
Test$FastSet took 16 ms for 2048
java.util.ArrayList took 63 ms for 4096
java.util.LinkedList took 31 ms for 4096
java.util.HashSet took 47 ms for 4096
java.util.TreeSet took 47 ms for 4096
Test$FastSet took 31 ms for 4096
java.util.ArrayList took 203 ms for 8192
java.util.LinkedList took 156 ms for 8192
java.util.HashSet took 156 ms for 8192
java.util.TreeSet took 141 ms for 8192
Test$FastSet took 0 ms for 8192
java.util.ArrayList took 891 ms for 16384
java.util.LinkedList took 703 ms for 16384
java.util.HashSet took 719 ms for 16384
java.util.TreeSet took 719 ms for 16384
Test$FastSet took 16 ms for 16384
java.util.ArrayList took 11515 ms for 32768
java.util.LinkedList took 11047 ms for 32768
java.util.HashSet took 10031 ms for 32768
java.util.TreeSet took 9281 ms for 32768
Test$FastSet took 62 ms for 32768
java.util.ArrayList took 98015 ms for 65536
java.util.LinkedList took 92234 ms for 65536
java.util.HashSet took 91921 ms for 65536
java.util.TreeSet took 94109 ms for 65536
Test$FastSet took 78 ms for 65536
java.util.ArrayList took 492825 ms for 131072
java.util.LinkedList took 475794 ms for 131072
java.util.HashSet took 476044 ms for 131072
java.util.TreeSet took 473700 ms for 131072
Test$FastSet took 281 ms for 131072
...
REPRODUCIBILITY :
This bug can be reproduced always.
---------- BEGIN SOURCE ----------
import java.util.*;
public class Test {
private static class FastSet extends HashSet {
public FastSet(Collection collection) {
super(collection);
}
/**
* A very easy way to avoid the worst case performance.
* There is still <b>lots</b> of room for improvement here.
*/
public boolean retainAll(Collection collection) {
if (collection.size() > 1000)
collection = new HashSet(collection);
return super.retainAll(collection);
}
} // Fast Set
private static void testRetainAll(Collection full, Collection half) {
int size = full.size();
long start = System.currentTimeMillis();
full.retainAll(half);
long end = System.currentTimeMillis();
long time = end - start;
String name = full.getClass().getName();
System.out.println(name + " took " + time + " ms " + " for " + size);
}
private static void testRetainAll(int size) {
// create a list of size integers
List full = new ArrayList(size);
for (int i=0; i<size; i++)
full.add(new Integer(i));
// create a random list half the size of the full set
List half = new ArrayList(full);
java.util.Collections.shuffle(half);
while (half.size() > full.size() / 2 )
half.remove(0);
testRetainAll(new ArrayList(full),half);
testRetainAll(new LinkedList(full),half);
testRetainAll(new HashSet(full),half);
testRetainAll(new TreeSet(full),half);
testRetainAll(new FastSet(full),half);
}
public static void main(String[] args) {
int x=512;
for (int i=0; i<28; i++) {
testRetainAll(x);
x *= 2;
}
}
}
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
Workarounds:
a) Extend HashSet, TreeSet, etc.. to avoid the problem, and use the replacements, instead.
b) Make sure the collection arguments are transformed, when needed. In other words, use something like this:
x.retainAll(new HashSet(y));
c) Write an efficient library method like:
public static Collection retainAll(Collection a, Collection b);
(Incident Review ID: 227973)
======================================================================
- relates to
-
JDK-6394757 (coll) AbstractSet.removeAll semantics are surprisingly dependent on relative sizes
- Open