Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-5028425

(coll) Collection.retainAll(Collection) implementations are not optimized

XMLWordPrintable

    • Icon: Enhancement Enhancement
    • Resolution: Unresolved
    • Icon: P4 P4
    • tbd
    • 1.4.2
    • core-libs



      Name: jl125535 Date: 04/07/2004


      FULL PRODUCT VERSION :
      java version "1.4.2"
      Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2-b28)
      Java HotSpot(TM) Server VM (build 1.4.2-b28, mixed mode)

      java version "1.5.0-beta2"
      Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0-beta2-b45)
      Java HotSpot(TM) Client VM (build 1.5.0-beta2-b45, mixed mode)

      ADDITIONAL OS VERSION INFORMATION :
      Microsoft Windows 2000 [Version 5.00.2195]

      A DESCRIPTION OF THE PROBLEM :
      Collection.retainAll(Collection) implementations aren't optimized.
      When a sufficiently large collection is used as an argument, the
      performance is pathologically slow. (See attached program results)

      There are several factors that could be taken into account to optimize the
      performance of the retainAll(Collection) method. The default implementation
      provided by AbstractCollection could, modify its behavior depending on
      the size of the specified collection and the cost of its containment test.

      The attached code demonstrates a very simple way to avoid the current
      pathological worst case (by creating a HashSet). If the cost of constructing an unnecessary HashSet is to be avoided, an interface could be defined indicating that a Collection has an O(1) contains() method.

      The retainAll(Collection) method can be viewed as an operation on two collections.

      The concrete Collection implementations (HashSet, TreeSet, ArrayList, etc...)
      have at their disposal.
      - the sizes of both collections
      - the element insertion speed of the implementing collection
      - the element deletion speed of the implementing collection
      - the containment test speed of the implementing collection
      Additionally, the following information is likely to be known
      - the element insertion speed of the argument collection
      - the element deletion speed of the argument collection
      - the containment test speed of the argument collection

      So, not only could the implementation provided by AbstractCollection be
      greatly improved, but the implementations provided by the concrete
      implementations could be improved beyond that.

      Similar optimizations should be made for the containsAll(Collection) and
      removeAll(Collection) methods, too.

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      Invoke retainAll(collection) on a collection, using an argument with a relatively slow (>O(1)) containment check.

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      The execution speed of x.retainAll(y) should be no greater than O(x.size() + y.size()).
      ACTUAL -
      The execution spped of x.retainAll(y) can be as slow as O(x.size() * y.size()).
      Partial execution results follow:

      java.util.ArrayList took 32 ms for 512
      java.util.LinkedList took 0 ms for 512
      java.util.HashSet took 0 ms for 512
      java.util.TreeSet took 0 ms for 512
      Test$FastSet took 16 ms for 512
      java.util.ArrayList took 0 ms for 1024
      java.util.LinkedList took 0 ms for 1024
      java.util.HashSet took 0 ms for 1024
      java.util.TreeSet took 0 ms for 1024
      Test$FastSet took 0 ms for 1024
      java.util.ArrayList took 47 ms for 2048
      java.util.LinkedList took 15 ms for 2048
      java.util.HashSet took 16 ms for 2048
      java.util.TreeSet took 31 ms for 2048
      Test$FastSet took 16 ms for 2048
      java.util.ArrayList took 63 ms for 4096
      java.util.LinkedList took 31 ms for 4096
      java.util.HashSet took 47 ms for 4096
      java.util.TreeSet took 47 ms for 4096
      Test$FastSet took 31 ms for 4096
      java.util.ArrayList took 203 ms for 8192
      java.util.LinkedList took 156 ms for 8192
      java.util.HashSet took 156 ms for 8192
      java.util.TreeSet took 141 ms for 8192
      Test$FastSet took 0 ms for 8192
      java.util.ArrayList took 891 ms for 16384
      java.util.LinkedList took 703 ms for 16384
      java.util.HashSet took 719 ms for 16384
      java.util.TreeSet took 719 ms for 16384
      Test$FastSet took 16 ms for 16384
      java.util.ArrayList took 11515 ms for 32768
      java.util.LinkedList took 11047 ms for 32768
      java.util.HashSet took 10031 ms for 32768
      java.util.TreeSet took 9281 ms for 32768
      Test$FastSet took 62 ms for 32768
      java.util.ArrayList took 98015 ms for 65536
      java.util.LinkedList took 92234 ms for 65536
      java.util.HashSet took 91921 ms for 65536
      java.util.TreeSet took 94109 ms for 65536
      Test$FastSet took 78 ms for 65536
      java.util.ArrayList took 492825 ms for 131072
      java.util.LinkedList took 475794 ms for 131072
      java.util.HashSet took 476044 ms for 131072
      java.util.TreeSet took 473700 ms for 131072
      Test$FastSet took 281 ms for 131072
      ...

      REPRODUCIBILITY :
      This bug can be reproduced always.

      ---------- BEGIN SOURCE ----------
      import java.util.*;

      public class Test {

      private static class FastSet extends HashSet {

          public FastSet(Collection collection) {
              super(collection);
          }

          /**
           * A very easy way to avoid the worst case performance.
           * There is still <b>lots</b> of room for improvement here.
           */
          public boolean retainAll(Collection collection) {
              if (collection.size() > 1000)
      collection = new HashSet(collection);
      return super.retainAll(collection);
          }
      } // Fast Set

      private static void testRetainAll(Collection full, Collection half) {
          int size = full.size();
          long start = System.currentTimeMillis();
          full.retainAll(half);
          long end = System.currentTimeMillis();
          long time = end - start;
          String name = full.getClass().getName();
          System.out.println(name + " took " + time + " ms " + " for " + size);
      }

      private static void testRetainAll(int size) {

          // create a list of size integers
          List full = new ArrayList(size);
          for (int i=0; i<size; i++)
              full.add(new Integer(i));

          // create a random list half the size of the full set
          List half = new ArrayList(full);
          java.util.Collections.shuffle(half);
          while (half.size() > full.size() / 2 )
              half.remove(0);

          testRetainAll(new ArrayList(full),half);
          testRetainAll(new LinkedList(full),half);
          testRetainAll(new HashSet(full),half);
          testRetainAll(new TreeSet(full),half);
          testRetainAll(new FastSet(full),half);
      }

      public static void main(String[] args) {
          int x=512;
          for (int i=0; i<28; i++) {
              testRetainAll(x);
              x *= 2;
          }
      }

      }
      ---------- END SOURCE ----------

      CUSTOMER SUBMITTED WORKAROUND :
      Workarounds:

      a) Extend HashSet, TreeSet, etc.. to avoid the problem, and use the replacements, instead.
      b) Make sure the collection arguments are transformed, when needed. In other words, use something like this:
        x.retainAll(new HashSet(y));
      c) Write an efficient library method like:
        public static Collection retainAll(Collection a, Collection b);
      (Incident Review ID: 227973)
      ======================================================================

            smarks Stuart Marks
            jleesunw Jon Lee (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

              Created:
              Updated:
              Imported:
              Indexed: