Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-8143577

optimize ArrayList.removeIf

XMLWordPrintable

    • Icon: Enhancement Enhancement
    • Resolution: Fixed
    • Icon: P5 P5
    • 9
    • None
    • core-libs
    • None

      A straightforward iterator loop removing selected elements from an ArrayList is O(n^2). (More precisely it's O(m*n) where m is the number of deletions and n is the number of elements.) This is because each removal causes the right-hand tail of the array to be copied one space to the left, and elements farther to the right in the array are copied multiple times.

      The current implementation of ArrayList.removeIf avoids this behavior by using a two-pass algorithm: it calls the predicate on each element, recording the result in a BitSet. It then scans the BitSet and copies elements once directly into the correct location. This is O(2n) time -- really O(n). However, it's O(n) space for the BitSet, admittedly more like O(n/32) because of the space efficiency of BitSet.

      An alternative algorithm could be implemented that is O(n) and avoids allocating any space. It involves walking two pointers (really, indexes) into the array from left to right. The leading pointer keeps track of calling the predicate on each element. The trailing pointer keeps track of the destination location to which an element is be copied if it is not to be removed. Elements to be removed are simply skipped. When the leading pointer has passed the last element, the array slots between the leading and trailing pointers are nulled out.

      This should be faster and should allocate less space than the BitSet approach. However, the BitSet approach has the characteristic that if the predicate throws an exception, the list remains unchanged. With the two-pointer approach, if the predicate throws an exception, the list becomes corrupted.

      Bulk operations don't generally have all-or-nothing semantics. For example, consider list.forEach(consumer) where the consumer has a side effect of modifying each element. If the consumer were to throw an exception, some elements would be modified and some would not be. It is essentially impossible to recover from this. So, while the all-or-nothing characteristic of the current ArrayList.removeIf implementation seems attractive, it's basically unnecessary.

            martin Martin Buchholz
            smarks Stuart Marks
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

              Created:
              Updated:
              Resolved: