Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-8346225

Add APIs for random sampling

XMLWordPrintable

    • Icon: Enhancement Enhancement
    • Resolution: Unresolved
    • Icon: P4 P4
    • tbd
    • None
    • core-libs
    • None

      Add APIs for selecting a specific number of random samples, without replacement, from an input consisting of a known or unknown number of elements.

      If the input size is known, one can use Algorithm S from Knuth, TAOCP Vol 2, sec 3.4.2, Random Sampling and Shuffling. See also https://stackoverflow.com/a/28655112/1441122 . This could be implemented with a method in the Collections class that takes a Collection, the sample count, and optionally a source of randomness.

      If the input size is not known, one can use a "reservoir sampling" technique. See https://en.wikipedia.org/wiki/Reservoir_sampling . A reservoir sampling method could be implemented as a Gatherer for a Stream. It would also take a sample count and optionally a source of randomness.

      It might be possible for the Gatherer-based API to be adaptive, depending upon whether the stream is SIZED. However, it doesn't appear that the Gatherer has access to the stream's size even if the stream is SIZED.

      An edge case is when the requested sample size is smaller than the actual population. Options are to return a short Stream or List, or to throw an exception.

            smarks Stuart Marks
            smarks Stuart Marks
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

              Created:
              Updated: