-
Enhancement
-
Resolution: Fixed
-
P5
-
6
-
b128
-
x86
-
windows_xp
-
Verified
A DESCRIPTION OF THE REQUEST :
Currently, the disjoint function heuristics can maximize the effort of the disjoint operation in some cases.
JUSTIFICATION :
This problems may lead to a very inneficient behavior in some cases. Such situations must be avoided.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
A better behavior whould be
1. Test if C2 is a set. If it is, ok, sets has a very fast contains method;
2. ONLY IF NOT:
2.1. Swap collections if c1 is a set or;
2.2. Swap collections if c1 < c2;
ACTUAL -
The disjoint method contains an heuristic function that tries to minimize the effort of the disjoint operation. It's currently implemented as follows:
if ((c1 instanceof Set) && !(c2 instanceof Set) ||
(c1.size() > c2.size())) {
//Swap both collections.
}
//For each element in c1, verifyies if the element is in c2 using contains
This function has a big problem. If c1 is a List and c2 is already a set, but c1 is much bigger than c2, the collections will be swapped.
Notice that this is the worst expected behavior, since the contains method of the bigger list, not the smaller set, will be called.
---------- BEGIN SOURCE ----------
/*
You may compare the following function with the one presented in the workaround. The time difference in my machine is 12125 millis for the current disjoint function and negligent time for the workaround.
Notice that if we impose a swapping (by proving the collections in the reverse order) we also get a very good time
*/
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
public class Main {
private static final int AMOUNT = 100000;
private static List<String> c1 = new ArrayList<String>();
private static Set<String> c2 = new HashSet<String>();
public static void main(String[] args) {
for (int i = 0; i < AMOUNT; i++) {
c1.add("0x" + Integer.toHexString(i));
c2.add("0x" + i);
}
for (int i = AMOUNT; i < AMOUNT * 1.5; i++) {
c1.add("0x" + Integer.toHexString(i));
}
System.out.println("Running disjoint");
long time = System.currentTimeMillis();
System.out.println(Collections.disjoint(c1, c2));
System.out.printf("Total time: %d millis", System.currentTimeMillis() - time);
}
}
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
The following disjoint function implements the expected behavior:
public static boolean disjoint(Collection<?> c1, Collection<?> c2) {
if (!(c2 instanceof Set) && (c1 instanceof Set || c1.size() < c2.size())) {
Collection<?> tmp = c1;
c1 = c2;
c2 = tmp;
System.out.println("Swapping collections.");
}
for (Object e : c1) {
if (c2.contains(e)) {
return false;
}
}
return true;
}
Currently, the disjoint function heuristics can maximize the effort of the disjoint operation in some cases.
JUSTIFICATION :
This problems may lead to a very inneficient behavior in some cases. Such situations must be avoided.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
A better behavior whould be
1. Test if C2 is a set. If it is, ok, sets has a very fast contains method;
2. ONLY IF NOT:
2.1. Swap collections if c1 is a set or;
2.2. Swap collections if c1 < c2;
ACTUAL -
The disjoint method contains an heuristic function that tries to minimize the effort of the disjoint operation. It's currently implemented as follows:
if ((c1 instanceof Set) && !(c2 instanceof Set) ||
(c1.size() > c2.size())) {
//Swap both collections.
}
//For each element in c1, verifyies if the element is in c2 using contains
This function has a big problem. If c1 is a List and c2 is already a set, but c1 is much bigger than c2, the collections will be swapped.
Notice that this is the worst expected behavior, since the contains method of the bigger list, not the smaller set, will be called.
---------- BEGIN SOURCE ----------
/*
You may compare the following function with the one presented in the workaround. The time difference in my machine is 12125 millis for the current disjoint function and negligent time for the workaround.
Notice that if we impose a swapping (by proving the collections in the reverse order) we also get a very good time
*/
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
public class Main {
private static final int AMOUNT = 100000;
private static List<String> c1 = new ArrayList<String>();
private static Set<String> c2 = new HashSet<String>();
public static void main(String[] args) {
for (int i = 0; i < AMOUNT; i++) {
c1.add("0x" + Integer.toHexString(i));
c2.add("0x" + i);
}
for (int i = AMOUNT; i < AMOUNT * 1.5; i++) {
c1.add("0x" + Integer.toHexString(i));
}
System.out.println("Running disjoint");
long time = System.currentTimeMillis();
System.out.println(Collections.disjoint(c1, c2));
System.out.printf("Total time: %d millis", System.currentTimeMillis() - time);
}
}
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
The following disjoint function implements the expected behavior:
public static boolean disjoint(Collection<?> c1, Collection<?> c2) {
if (!(c2 instanceof Set) && (c1 instanceof Set || c1.size() < c2.size())) {
Collection<?> tmp = c1;
c1 = c2;
c2 = tmp;
System.out.println("Swapping collections.");
}
for (Object e : c1) {
if (c2.contains(e)) {
return false;
}
}
return true;
}
- relates to
-
JDK-6394757 (coll) AbstractSet.removeAll semantics are surprisingly dependent on relative sizes
- Open