-
Bug
-
Resolution: Unresolved
-
P4
-
7
As written about in
http://msmvps.com/blogs/jon_skeet/archive/2010/07/29/there-s-a-hole-in-my-abstraction-dear-liza-dear-liza.aspx
there can be performance anomolies in AbstractSet.removeAll under some conditions; quoting:
" This implementation [of AbstractSet.removeAll] determines which is the smaller of this set and the specified collection, by invoking the size method on each. If this set has fewer elements, then the implementation iterates over this set, checking each element returned by the iterator in turn to see if it is contained in the specified collection. If it is so contained, it is removed from this set with the iterator's remove method. If the specified collection has fewer elements, then the implementation iterates over the specified collection, removing from this set each element returned by the iterator, using this set's remove method.
Now that sounds reasonable on the surface of it - iterate through the smaller collection, check for the presence in the bigger collection. However, this is where the abstraction is leaky. Just because we can ask for the presence of an item in a large collection doesn't mean it's going to be fast. In our case, the collections are the same size - but checking for the presence of an item in the HashSet is O(1) whereas checking in the ArrayList is O(N)... whereas the cost of iterating is going to be the same for each collection. Basically by choosing to iterate over the HashSet and check for presence in the ArrayList, we've got an O(M * N) solution overall instead of an O(N) solution. Ouch. The removeAll method is making an "optimization" based on assumptions which just aren't valid in this case."
http://msmvps.com/blogs/jon_skeet/archive/2010/07/29/there-s-a-hole-in-my-abstraction-dear-liza-dear-liza.aspx
there can be performance anomolies in AbstractSet.removeAll under some conditions; quoting:
" This implementation [of AbstractSet.removeAll] determines which is the smaller of this set and the specified collection, by invoking the size method on each. If this set has fewer elements, then the implementation iterates over this set, checking each element returned by the iterator in turn to see if it is contained in the specified collection. If it is so contained, it is removed from this set with the iterator's remove method. If the specified collection has fewer elements, then the implementation iterates over the specified collection, removing from this set each element returned by the iterator, using this set's remove method.
Now that sounds reasonable on the surface of it - iterate through the smaller collection, check for the presence in the bigger collection. However, this is where the abstraction is leaky. Just because we can ask for the presence of an item in a large collection doesn't mean it's going to be fast. In our case, the collections are the same size - but checking for the presence of an item in the HashSet is O(1) whereas checking in the ArrayList is O(N)... whereas the cost of iterating is going to be the same for each collection. Basically by choosing to iterate over the HashSet and check for presence in the ArrayList, we've got an O(M * N) solution overall instead of an O(N) solution. Ouch. The removeAll method is making an "optimization" based on assumptions which just aren't valid in this case."
- relates to
-
JDK-6394757 (coll) AbstractSet.removeAll semantics are surprisingly dependent on relative sizes
-
- Open
-