Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-8153332

Fuse sorted().limit(n) into single operation

XMLWordPrintable

    • Icon: Enhancement Enhancement
    • Resolution: Unresolved
    • Icon: P4 P4
    • None
    • None
    • core-libs
    • None

      It's quite common demand to find n greatest or n least elements from the input according to some comparator or natural order. Using Stream API it's quite natural to solve this problem in the following way:

      stream.sorted().limit(n)...
      stream.sorted(Comparator.reverseOrder()).limit(n)...
      stream.sorted(comparator).limit(n)...
      stream.sorted(comparator.reversed()).limit(n)...

      Usually requested number n is quite small while the input could be very big. Currently Stream API will buffer the whole input into intermediate array, sort it and pass the output to the limit() operation which will select n first elements. It's possible for small n to fuse both operations together selecting n least elements in more efficient way.

      I propose the following algorithm:

      == First stage ==
      - Buffer input elements until 2*n elements are collected
      - Sort them normally and keep only n least elements
      - Switch to the second stage

      == Second stage ==
      - When new elements arrive, keep only those which are less than the n-th element found so far, collecting up to n additional (unsorted) elements
      - When n additional elements are collected, sort them normally and merge with the already sorted part, keeping only n least element
      - Continue doing the second stage

      Such implementation will need much less additional memory and could be in order of magnitude faster compared to the existing implementation (it's especially good for random input). A regression is noted on the descending input as every element is compared twice. Such regression seems tolerable as we still reduce memory consumption and the performance is still much better than for random input sorting.

      This enhancement covers object Stream only. Similar optimization is possible for primitive streams, but it's less useful, because custom comparator cannot be used for primitive stream, so currently it's not possible to select n greatest values using sorted().limit(n) without boxing.

            tvaleev Tagir Valeev
            tvaleev Tagir Valeev
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

              Created:
              Updated: