Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-8223933

Stream.distinct() sometimes allows duplicates when the stream is sorted

XMLWordPrintable

      A DESCRIPTION OF THE PROBLEM :
      java.util.Stream.distinct documentation states in documentation:
      > Returns a stream consisting of the distinct elements (according to Object.equals(Object)) of this stream.

      Internally used java.util.DistinctOps also states in documentation:
      > Factory methods for transforming streams into duplicate-free streams, using Object.equals(Object) to determine equality.

      These documentations mean that there should never be two elements sent downstream with java.util.Stream.distinct for which Object.equals(Object) == true

      However, this is not always guaranteed.

      The problem start with sorted streams (StreamOpFlag.SORTED). If the stream is sorted, java.util.DistinctOps.makeRef.opWrapSink checks the equality of the element only against the lastSeen element in an optimisation effort. However, this algorithm is not always correct since there is no strict contract between java.lang.Comparable.compareTo(T) and java.lang.Object.equals(Object).

      java.lang.Comparable.compareTo(T) states in the documentation:
      > It is strongly recommended, but not strictly required that (x.compareTo(y)==0) == (x.equals(y)).

      So, a sorted stream may have elements that are not consecutive, which results in java.util.Stream.distinct passing duplicate elements downstream.

      Conclusion:
      Either the java.util.Stream.distinct should clearly state uniqueness of elements sent downstream depends on java.lang.Comparable.compareTo(T) implementation, or the sorted streams should also be treated as non sorted streams (adding to a Set) by DistinctOps.

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      Create a class where equals(Object) == false and compareTo(T) == 0 is possible.

      Stream the objects with the only requirement that the objects for which "equals(Object) == false and compareTo(T) == 0" condition holds are not consecutive. Then call sorted() and distinct() in order. And then collect to a List.

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      The List should not contain duplicate elements according to Object.equals(Object).
      ACTUAL -
      The List does contain duplicate elements according to Object.equals(Object).

      ---------- BEGIN SOURCE ----------
      import java.util.HashSet;
      import java.util.List;
      import java.util.Objects;
      import java.util.Set;
      import java.util.stream.Collectors;
      import java.util.stream.Stream;

      public class Element implements Comparable<Element> {

          private final int x;
          private final int y;

          Element(int x, int y) {
              this.x = x;
              this.y = y;
          }

          @Override
          public boolean equals(Object o) {
              if (this == o) {
                  return true;
              }
              if (o == null || getClass() != o.getClass()) {
                  return false;
              }
              Element element = (Element) o;
              return x == element.x && y == element.y;
          }

          @Override
          public int hashCode() {
              return Objects.hash(x, y);
          }

          @Override
          public int compareTo(Element o) {
              return Integer.compare(x, o.x); // equals == false and compareTo == 0 is possible
          }

          @Override
          public String toString() {
              return "Element{" + "x=" + x + ", y=" + y + '}';
          }

          public static void main(String[] args) {
              Element element1 = new Element(1, 1);
              Element element2 = new Element(1, 2);
              Element element3 = new Element(2, 1); // same as element5
              Element element4 = new Element(2, 2);
              Element element5 = new Element(2, 1); // same as element3

              List<Element> sortedDistinctElementList = Stream.of(element1, element2, element3, element4, element5).sorted().distinct().collect(Collectors.toList());


              System.out.println("sortedDistinctElementList");
              System.out.println(sortedDistinctElementList.size());
              System.out.println(sortedDistinctElementList);

              Set<Element> distinctElementSet = new HashSet<>(sortedDistinctElementList);

              System.out.println("distinctElementSet");
              System.out.println(distinctElementSet.size());
              System.out.println(distinctElementSet);
          }
      }
      ---------- END SOURCE ----------

      FREQUENCY : always


            smarks Stuart Marks
            webbuggrp Webbug Group
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

              Created:
              Updated: