Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-8277095

Empty streams create too many objects

XMLWordPrintable

      A DESCRIPTION OF THE PROBLEM :
      In the past, we used for-loops to navigate collections. In the case that the collection was empty, we would immediately exit from the loop. This was fast. For example, here is a rather complicated loop that takes almost no time (and creates no objects) when we use a for-loop:

          private static OptionalInt oldLoop(List<String> of) {
              Integer max = null;
              for (String s : of) {
                  if (Objects.nonNull(s) && s.length() > 0) {
                      int val = Integer.parseInt(s);
                      if (max == null) max = val;
                      else max = Integer.max(max, val);
                  }
              }
              return max == null ? OptionalInt.empty() :
                      OptionalInt.of(max);
          }

      Using a stream is far more readable:

          private static OptionalInt newLoop1(List<String> of) {
              return of.stream()
                      .filter(Objects::nonNull)
                      .filter(s -> s.length() > 0)
                      .mapToInt(Integer::parseInt)
                      .max();
          }

      However, the stream version has to create the entire stream pipeline which allocates 400 bytes. The pipeline is then thrown away when we discover that there are no elements in the stream.

      Our idea is to change Collection#stream() and the various Arrays.stream() methods to check whether the stream is empty and if so, to return an specialized instance of Stream / IntStream / LongStream / DoubleStream to minimize object allocation and improve performance. Our benchmarks have indicated that most of the object allocation in that case can be eliminated through escape analysis and that the performance of these empty streams is between 2x and 9x faster than before, depending on the complexity of the stream pipeline.

      We have created some proof of concept classes, together with extensive unit tests to verify that the behaviour of the empty streams mirrors that of the old streams. For example, the characteristics remain the same and the behavior in case of calls to unordered(), distinct(), etc. also are consistent with previous behaviors.

      There are two remaining unsolved issues. The first is that under the current system, it is possible to create a stream from a collection, then change the collection, and the stream will lazily see the elements. For example:

      jshell> var list = new ArrayList<Integer>()
      list ==> []

      jshell> var stream = list.stream()
      stream ==> java.util.stream.ReferencePipeline$Head@7791a895

      jshell> Collections.addAll(list, 1, 2, 3)
      $6 ==> true

      jshell> stream.forEach(System.out::print)
      123

      With our current EmptyStream implementation, the forEach would not see the elements.

      This lazy creation of the stream does not happen with concurrent and immutable streams.

      The second issue is that the current stream test suite in the OpenJDK makes assumptions about the way that the Streams are implemented. They have a lot of white-box tests that currently make down-casts to the AbstractPipeline. These tests will have to be adapted to also work for EmptyStream.

      Here is the current PR that shows the work-in-progress. Most of the code is test code (2500LOC), followed by the actual EmptyStream implementations (1500LOC) and then a small but extensive JMH microbenchmark (150LOC).

      https://github.com/openjdk/jdk/pull/6275

      https://github.com/openjdk/jdk/pull/6275


            Unassigned Unassigned
            webbuggrp Webbug Group
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

              Created:
              Updated: