JDK-8261529

Universal Generics (Preview)

    • JEP
    • Resolution: Withdrawn
    • P4
    • tools
    • Feature
    • Open
    • SE
    • valhalla dash dev at openjdk dot java dot net
    • XL
    • XL

      Summary

      Unify the treatment of reference and primitive types in generic code by allowing Java type variables to range over both kinds of types. Produce new warnings to maintain the safety guarantees of generic code. This is a preview language feature.

      Non-Goals

      The core primitive class types feature is introduced by JEP 401 (Primitive Classes). This JEP is only concerned with supporting primitive class types as type arguments.

      In the future (see Dependencies), we expect the JVM to optimize the performance of primitive type parameterizations, with help from the Java compiler. But for now, generics continue to be implemented via erasure.

      Significant adjustments to generic standard library code are expected in response to new warnings, but those adjustments will be pursued in a separate JEP. Future work may also refactor implementations of hand-specialized primitive code.

      Motivation

      A common programming task is to take code that solves a problem for values of a particular type and extend that code to work on values of other types. Java developers can use three different strategies to perform this task:

      • Hand-specialized code: Rewrite the same code multiple times (perhaps with copy and paste), using different types each time.

      • Subtype polymorphism: Change the types in the solution to be a common supertype of all anticipated operand types.

      • Parametric polymorphism: Replace the types in the solution with type variables, instantiated by callers with whatever types they need to operate on.

      The java.util.Arrays.binarySearch methods are a good illustration of all three strategies:

      static int binarySearch(Object[] a, Object key)
      static <T> int binarySearch(T[] a, T key, Comparator<? super T> c)
      static int binarySearch(char[] a, char key)
      static int binarySearch(byte[] a, byte key)
      static int binarySearch(short[] a, short key)
      static int binarySearch(int[] a, int key)
      static int binarySearch(long[] a, long key)
      static int binarySearch(float[] a, float key)
      static int binarySearch(double[] a, double key)

      The first variant uses subtype polymorphism. It works for all arrays of reference types, which share the common supertype Object[]. The search key can, similarly, be any object. The behavior of the method depends on dynamic properties of the arguments—at run time, do the array components and key support comparison to each other?

      The second variant uses parametric polymorphism. It also works for all arrays of reference types, but asks the caller to provide a comparison function. The parameterized method signature ensures at compile time, for each call site, that the array components and key are of types supported by the provided comparison function.

      Additional variants use hand specialization. These work on arrays of basic primitive types, which do not have a useful common supertype. Unfortunately, this means there are seven copies of a nearly identical method, which complicates the API and violates the DRY (don't repeat yourself) principle.
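
      The three strategies can be exercised side by side in current Java. The following is a minimal sketch; the class and method names are illustrative assumptions, and only the java.util.Arrays.binarySearch overloads listed above are real API. The last method shows the run-time dependence of the subtype-polymorphic variant: a key that cannot be compared with the array components is rejected only when the search executes.

```java
import java.util.Arrays;
import java.util.Comparator;

// Illustrative wrapper class; the name is an assumption, not part of the JDK.
class BinarySearchStrategies {

    // Subtype polymorphism: any reference array is an Object[]; whether the
    // components and the key can be compared is only known at run time.
    static int searchBySubtyping(Object[] sorted, Object key) {
        return Arrays.binarySearch(sorted, key);
    }

    // Parametric polymorphism: the compiler checks that the array, the key,
    // and the comparator agree on T.
    static <T> int searchByTypeVariable(T[] sorted, T key, Comparator<? super T> order) {
        return Arrays.binarySearch(sorted, key, order);
    }

    // Hand specialization: one overload per basic primitive type (int shown).
    static int searchIntArray(int[] sorted, int key) {
        return Arrays.binarySearch(sorted, key);
    }

    // Returns true if a mismatched key is rejected only at run time.
    static boolean mismatchFailsAtRunTime() {
        try {
            searchBySubtyping(new Object[] {"a", "b"}, 1); // String vs. Integer
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }
}
```

      The same mismatch passed to searchByTypeVariable with a Comparator<String> would be rejected by the compiler, which is the compile-time guarantee described for the second variant.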

      Primitive class types, introduced in JEP 401, are a new kind of type, allowing developers to operate directly on custom-defined primitives. Primitive values have lightweight conversions to reference types, and thus can participate in subtyping relationships. Arrays of primitive values also support these conversions (e.g., an array of values can be treated as an Object[]). Thus, primitive class types work out of the box with APIs that rely on subtype polymorphism, such as the Object[] variant of binarySearch.

      With JEP 402 (Classes for the Basic Primitives), we will update the language to treat the basic primitive types (int, double, etc.) as primitive class types, and primitive arrays such as int[] as subtypes of Object[]. This will eliminate the need for overloads like those of binarySearch that were hand-specialized across basic primitive types. (Existing APIs may have binary compatibility obligations, of course, and subtype polymorphism may not be acceptable for performance-critical code.)

      The third option, parametric polymorphism, was unfortunately designed for reference types only. Under current rules, no primitive type can be a type argument. Instead, for a primitive class type Point, searching an array of Points with a comparison function requires choosing a reference type as the instantiation of T, and then providing a comparison function that works on all values of that reference type.

      Primitive classes do come with a sharp companion reference type—in this case, Point.ref—but using such a type as a type argument is a poor solution. For one thing, everything that interacts with the type, such as the comparison function, would also have to work with Point.ref, producing a lot of extra noise in the source code. For another, in the future, we would like to optimize calls to the comparison function by passing inlined Point values directly. But the reference type Point.ref cannot be inlined as efficiently.
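
      Today's basic primitives show the same friction. In the sketch below, Integer plays the role that Point.ref would play for Point; that correspondence is an analogy for illustration, not a statement about how primitive classes are implemented. The class and method names are hypothetical. To search an int[] with a custom ordering, the caller has to copy the array into an Integer[] and supply a comparator over Integer.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper illustrating the cost of a reference-only type argument.
class BoxedSearchSketch {

    // Searches an int[] sorted in descending order. T cannot be int today,
    // so the array is first boxed into Integer[] (an extra copy), and the
    // comparator is written against Integer rather than int.
    static int searchDescending(int[] sortedDescending, int key) {
        Integer[] boxed = Arrays.stream(sortedDescending).boxed().toArray(Integer[]::new);
        Comparator<Integer> descending = Comparator.reverseOrder();
        return Arrays.binarySearch(boxed, key, descending);
    }
}
```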

      A much better solution is for generic APIs to support primitive class types directly, in addition to reference types. Ideally, this should be the default behavior of Java's generics, so that primitive class types can participate fully in the Java ecosystem.

      The language can achieve this by relaxing the requirement that type arguments must be reference types, and then adjusting the treatment of type variables, bounds, and inference accordingly.

      A significant implication that developers will need to account for is that a universal type variable might now represent a type that does not permit null. Java compilers can produce warnings, much like the unchecked warnings introduced in Java 5, to alert developers to this possibility. Then developers can make use of some new language features to address the warnings.

      Description

      The features described below are preview features, enabled with the --enable-preview compile-time and runtime flags.

      Type variables and bounds

      Previously, Java's type variable bounds were interpreted according to the language's subtyping relation. We now say that a type S is bounded by a type T if any of the following is true:

      • S is a subtype of T (where every type is a subtype of itself, and reference types are subtypes of many other types, per their class declarations and other subtyping rules); or

      • S is a primitive class type whose corresponding reference type is bounded by T; or

      • S is a type variable with an upper bound that is bounded by T, or T is a type variable with a lower bound and S is bounded by the lower bound of T.

      As usual, type variables are declared with upper bounds, and those declared without bounds (<T>) implicitly have upper bound Object (<T extends Object>). Any type may act as an upper bound.

      At a use site, any type may be provided as a type argument instantiating a type variable, as long as the type argument is bounded by the type variable's upper bounds. For example, if Point is a primitive class type, the type List<Point> is valid because Point is bounded by Object.

      Type variables can thus range over any type, and are no longer assumed to represent a reference type.

      Wildcards also have bounds, which again may be any type. Similar bounds checks are performed when testing that one parameterized type is a subtype of another. For example, if the primitive class Point implements Shape, the type List<Point> is a subtype of List<? extends Shape>, and the type List<Shape> is a subtype of List<? super Point>, because Point is bounded by Shape.
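
      The two wildcard relationships already hold for reference types in current Java, so they can be tried out with an ordinary class standing in for the primitive class Point. In this sketch, Dot, Shape, and the method names are assumptions made for illustration; under this JEP the same declarations would be expected to accept a primitive class in place of Dot.

```java
import java.util.ArrayList;
import java.util.List;

class WildcardBoundsSketch {

    interface Shape { double area(); }

    // An ordinary identity class stands in for the primitive class Point.
    static final class Dot implements Shape {
        final double x, y;
        Dot(double x, double y) { this.x = x; this.y = y; }
        @Override public double area() { return 0.0; }
    }

    // Accepts List<Dot> because Dot is bounded by Shape.
    static double totalArea(List<? extends Shape> shapes) {
        double sum = 0.0;
        for (Shape s : shapes) sum += s.area();
        return sum;
    }

    // Accepts List<Shape> (and List<Dot>) because Dot is bounded by Shape.
    static void addOrigin(List<? super Dot> sink) {
        sink.add(new Dot(0.0, 0.0));
    }

    static int demo() {
        List<Dot> dots = new ArrayList<>();
        List<Shape> shapes = new ArrayList<>();
        addOrigin(dots);
        addOrigin(shapes);   // List<Shape> used as List<? super Dot>
        totalArea(dots);     // List<Dot> used as List<? extends Shape>
        return dots.size() + shapes.size();
    }
}
```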

      Type argument inference is enhanced to support inferring primitive class types. When an inference variable has a primitive class type as its lower bound, that type may become the inference result. For example, the invocation List.of(new Point(3.0, -1.0)) typically has inferred type List<Point>. If it occurs in an assignment context, with target type Collection<Point.ref>, it has inferred type List<Point.ref>. (To do: since these inference variables can range over both primitive and reference types, "strict" applicability testing can no longer be sure that an inferred parameterization won't end up requiring a conversion. Unexpected errors or incorrect overload resolution results may occur.)
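
      The influence of the target type on inference can be observed in current Java with Integer and Number. This is only an analogy: Integer and Number stand in for Point and Point.ref, and the class and method names are hypothetical. The same List.of invocation is inferred differently depending on whether an assignment context supplies a target type.

```java
import java.util.Collection;
import java.util.List;

class InferenceTargetSketch {

    // No target type: the element type is inferred from the argument alone.
    static List<Integer> standalone() {
        var inferred = List.of(42); // inferred as List<Integer>
        return inferred;
    }

    // Assignment context: the target type Collection<Number> drives inference,
    // so the invocation is typed as List<Number>.
    static Collection<Number> withTarget() {
        Collection<Number> numbers = List.of(42);
        return numbers;
    }
}
```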

      These changes to type variables, bounds checking, and inference are applied automatically to existing code. Many existing generic APIs will smoothly handle primitive class types without any modification.

      Interaction with JEP 402: Wherever the phrase "any type" is used above, it describes the state of things after JEP 402 has been completed. Until that point, universal type variables will be not-quite-universal since they will be instantiable by reference types and primitive class types but not by the basic primitive types. While JEP 402 is not a prerequisite to this JEP (see Dependencies), it is expected to be completed in a similar timeframe.

      With JEP 402, there is some source compatibility risk due to type inference preferring int over Integer in existing code. This requires further exploration.

      Null pollution and null warnings

      References can be null, but primitive class types are not reference types, so JEP 401 prohibits assigning null to them.

      Point p = null; // error

      By allowing type variables to range over a wider set of types, we must ask developers to make fewer assumptions about their instantiations. Specifically, it is usually improper to assign null to a variable with a type-variable type, because the type variable may be instantiated by a primitive class type.

      class C<T> { T x = null; /* shouldn't do this */ }
      C<Point> c = new C<Point>();
      Point p = c.x; // run-time error: null is not a Point

      In this example, the type of the field x is erased to Object, so at run time a C<Point> will happily store a null, even though this violates the expectations of the compile-time type. This scenario is an example of null pollution, a new kind of heap pollution. Like other forms of heap pollution, the problem is detected at run time when the program attempts to assign a value to a variable whose erased type does not support it—in this case, the assignment to p.
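
      The delayed detection described here works the same way as the heap pollution that current Java already allows through unchecked casts. The following sketch uses only existing Java; it is an analogue of null pollution rather than an instance of it, and the class and method names are assumptions. The bad value is stored without complaint because the erased element type is Object, and the failure appears later, at the assignment where the compiler inserted a cast.

```java
import java.util.ArrayList;
import java.util.List;

class HeapPollutionSketch {

    // Stores a String in a list whose compile-time type is List<Integer>.
    @SuppressWarnings("unchecked")
    static List<Integer> pollutedList() {
        List<Integer> numbers = new ArrayList<>();
        List<Object> alias = (List<Object>) (List<?>) numbers; // unchecked cast
        alias.add("not a number"); // accepted: the erased element type is Object
        return numbers;
    }

    // Returns true if the pollution is detected only at the later typed read.
    static boolean failsAtUse() {
        List<Integer> numbers = pollutedList(); // no failure yet
        try {
            Integer first = numbers.get(0); // compiler-inserted cast fails here
            return first == null;           // not reached
        } catch (ClassCastException e) {
            return true;
        }
    }
}
```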

      As for other forms of heap pollution, the compiler produces null warnings to discourage null pollution:

      • A warning is issued when a null literal is assigned to a universal type-variable type.

      • A warning is issued when a non-final field with a universal type-variable type is left uninitialized by a constructor.

      (There are also null warnings for certain conversions, discussed in a later section.)

      class Box<T> {
      
          T x;
      
          public Box() {} // warning: uninitialized field
      
          T get() {
              return x;
          }
      
          void set(T newX) {
              x = newX;
          }
      
          void clear() {
              x = null; // warning: null assignment
          }
      
          T swap(T oldX, T newX) {
              T currentX = x;
              if (currentX != oldX)
                  return null; // warning: null assignment
              x = newX;
              return oldX;
          }
      
      }

      As with unchecked warnings, null warnings alert programmers to the risk of heap pollution, which can lead to unexpected runtime exceptions in downstream assignments. Code that compiles without warnings will not throw these exceptions, and can safely be instantiated with primitive class types.

      A significant amount of existing generic code produces null warnings, having been written with the assumption that type variables are reference types. We encourage developers, as they are able, to update their code to eliminate sources of null pollution.
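
      One way to remove the sources of null pollution from a class like Box is to stop using null as a sentinel. The sketch below is written in current Java and is only one possible design: the class name is hypothetical, the constructor requires an initial value, clear() is dropped, and swap reports failure with an empty Optional. It also compares with equals rather than the identity comparison used above. Whether a preview compiler would accept it without null warnings is an assumption based on the two warning rules listed earlier.

```java
import java.util.Objects;
import java.util.Optional;

// Hypothetical null-free variant of Box.
class NullFreeBox<T> {

    private T x;

    // Every constructor initializes x, so the field is never left at its default.
    NullFreeBox(T initial) {
        x = Objects.requireNonNull(initial);
    }

    T get() {
        return x;
    }

    void set(T newX) {
        x = Objects.requireNonNull(newX);
    }

    // Replaces the value if it currently equals oldX. An empty Optional
    // takes the place of the null result used to signal failure.
    Optional<T> swap(T oldX, T newX) {
        if (!x.equals(oldX))
            return Optional.empty();
        x = Objects.requireNonNull(newX);
        return Optional.of(oldX);
    }
}
```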

      In a future release (see Dependencies), the physical layout of generic code may be specialized for each primitive class type. At that point null pollution will be detected earlier, and code that has failed to address the warnings may become unusable. Code that has addressed the warnings is specialization-ready, meaning that future JVM enhancements will not disrupt its functionality.

      Reference type-variable types

      When generic code needs to work with null, the language offers a few special features to ensure that a type-variable type is a (null-friendly) reference type.

      • A type variable that is bounded by IdentityObject (either directly or via an identity class bound) is always a reference type.

        class C<T extends Reader> { T x = null; /* ok */ }
        
        FileReader r = new C<FileReader>().x;

      • A type variable whose declaration is modified by the contextual keyword ref prohibits primitive type arguments, and thus is always a reference type.

        class C<ref T> { T x = null; /* ok */ }
        
        FileReader r = new C<FileReader>().x;
        Point.ref p = new C<Point.ref>().x;

      • A type variable use may be modified by the syntax .ref, which represents a mapping from the instantiating type to its tightest bounding reference type (e.g., Point maps to Point.ref, while FileReader maps to FileReader).

        class C<T> { T.ref x = null; /* ok */ }
        
        FileReader r = new C<FileReader>().x;
        Point.ref p = new C<Point.ref>().x;
        Point.ref p2 = new C<Point>().x;

      (The new syntax above is subject to change.)

      In the last case, the types T and T.ref are two distinct type-variable types. Assignments between the two types are allowed, as a form of value object conversion or primitive value conversion.

      class C<T> {
          T.ref x = null;
          void set(T arg) { x = arg; /* ok */ }
      }

      A type variable that is bounded by IdentityObject or declared with the ref modifier is a reference type variable. All other type variables are called universal type variables.

      Similarly, a type that names a reference type variable or has the form T.ref is called a reference type-variable type, while a type that names a universal type variable without .ref is called a universal type-variable type.

      Warnings on value conversion

      Primitive value conversions allow a value object to be converted to a primitive value of the same class. Per JEP 401, if the reference is null, the conversion fails at run time.

      Point.ref pr = null;
      Point p = pr; // NullPointerException

      When primitive value conversion is applied to a type-variable type there is no runtime check, but the conversion may be a source of null pollution.

      T.ref tr = null;
      T t = tr; // t is polluted

      To help prevent both NullPointerExceptions and null pollution, primitive value conversions produce null warnings unless the compiler can prove that the reference being converted is non-null.

      class C<T> {
          T.ref x = null;
          T get() { return x; } // warning: possible null in conversion
          T.ref getRef() { return x; }
      }
      
      C<Point> c = new C<>();
      Point p1 = c.get();
      Point p2 = c.getRef(); // warning: possible null in conversion

      If a parameter, local variable, or final field has a reference type-variable type then the compiler may be able to prove, at certain usages, that the variable's value is non-null. In that case, primitive value conversion may occur without a null warning. The details and limitations of this analysis require further exploration, but might be similar to the control-flow analysis that determines whether a variable has been initialized before use.

      <T> T deref(T.ref val, T alternate) {
          if (val == null) return alternate;
          return val; // no warning
      }
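
      Unboxing in current Java behaves much like the primitive value conversion of a null Point.ref, and it can be used to try out the null-check pattern of deref. In this sketch, Integer and int stand in for T.ref and T; that mapping is an analogy, and the class and method names are hypothetical.

```java
class UnboxingNullSketch {

    // Converting a null Integer to int fails at run time, as the
    // conversion from a null Point.ref to Point is described to do.
    static boolean unboxingNullThrows() {
        Integer ref = null;
        try {
            int value = ref; // unboxing conversion throws NullPointerException
            return false;    // not reached
        } catch (NullPointerException e) {
            return true;
        }
    }

    // Testing for null before the conversion makes it safe on every path.
    static int deref(Integer ref, int alternate) {
        if (ref == null) return alternate;
        return ref; // ref is known to be non-null here
    }
}
```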

      Similar to assignment, overrides that involve adding or removing .ref to a type-variable type are allowed, but prompt a null warning.

      interface Sup<T> {
          void put(T.ref arg);
      }
      
      interface Sub<T> extends Sup<T> {
          void put(T arg); // null warning
      }

      Parameterized type conversions

      Unchecked conversions traditionally allow a raw type to be converted to a parameterization of the same class. These conversions are unsound, and are thus accompanied by unchecked warnings.

      As developers make changes such as applying .ref to certain type variable uses, they may end up with parameterized types (e.g., List<T.ref>) in API signatures that are out of sync with other code. To ease migration, the allowed set of unchecked conversions is expanded to include the following parameterized-to-parameterized conversions:

      • Changing a type argument of a parameterized type from a universal type-variable type (T) to its reference type (T.ref), or vice versa:

        List<T.ref> newList() { return Arrays.asList(null, null); }
        List<T> list = newList(); // unchecked warning

      • Changing a type argument of a parameterized type from a primitive class type (Point) to its reference type (Point.ref), or vice versa:

        void plot(Function<Point.ref, Color> f) { ... }
        Function<Point, Color> gradient = p -> Color.gray(p.x());
        plot(gradient); // unchecked warning

      • Changing a wildcard bound in a parameterized type from a universal type-variable type (T) or a primitive class type (Point) to its reference type (T.ref, Point.ref), or vice versa (where the conversion is not already allowed by subtyping):

        Supplier<? extends T.ref> nullFactory() { return () -> null; }
        Supplier<? extends T> factory = nullFactory(); // unchecked warning

      • Recursively applying an unchecked conversion to any type argument or wildcard bound of a parameterized type:

        Set<Map.Entry<String, T>> allEntries() { ... }
        Set<Map.Entry<String, T.ref>> entries = allEntries(); // unchecked warning

      These unchecked conversions may seem easily avoidable in small code snippets, but the flexibility they offer will significantly ease migration as different program components or libraries adopt universal generics at different times.

      In addition to unchecked assignments, these conversions can be used by unchecked casts and method overrides:

      interface Calendar<T> {
          Set<T> get(Set<LocalDate> dates);
      }
      
      class CalendarImpl<T> implements Calendar<T> {
          Set<T.ref> get(Set<LocalDate> dates) { ... } // unchecked warning
      }

      Compiling to class files

      Generic classes and methods will continue to be implemented via erasure, replacing type variables with their erased bounds in generated bytecode. Within generic APIs, primitive objects will therefore generally be operated on as references.

      The usual rules for detecting heap pollution apply: Casts are inserted at certain program points to assert that a value has the expected runtime type. In the case of primitive class types, this includes checking that the value is non-null.

      We extend the Signature attribute to encode additional forms of compile-time type information:

      • Type variables declared as ref T,
      • Type variable uses of the form T.ref, and
      • Primitive class types appearing as type arguments and type variable/wildcard bounds.
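
      Both halves of this scheme, erased types in the bytecode and compile-time types recorded in the Signature attribute, are visible through reflection in current Java. The sketch below uses only existing reflection API; the class names are assumptions. It reads the erased type of a field and then the generic type recovered from the Signature attribute. The new forms listed above (ref T, T.ref, primitive class type arguments) are not part of today's Signature format and are not shown.

```java
import java.io.Reader;
import java.lang.reflect.Field;
import java.lang.reflect.Type;
import java.lang.reflect.TypeVariable;

class ErasureSketch {

    static class Unbounded<T> { T x; }
    static class Bounded<T extends Reader> { T x; }

    private static Field fieldX(Class<?> c) {
        try {
            return c.getDeclaredField("x");
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    // The erased type: what the field descriptor in the class file says.
    static Class<?> erasedType(Class<?> c) {
        return fieldX(c).getType();
    }

    // The compile-time type: recovered from the Signature attribute.
    static String declaredTypeName(Class<?> c) {
        Type t = fieldX(c).getGenericType();
        if (t instanceof TypeVariable) {
            return ((TypeVariable<?>) t).getName();
        }
        return t.getTypeName();
    }
}
```

      For Unbounded the erased type is Object, for Bounded it is Reader, and in both cases the declared type is still reported as the type variable T.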

      Alternatives

      We could ask developers to always use primitive reference types when making use of generic APIs. This is not a very good solution, as argued in the Motivation section.

      We could also ask API authors to opt in to universal type variables, rather than making type variables universal by default. But the goal is for universal generics to be the norm, and in practice there is no reason most type variables cannot be universal. An opt-in would introduce too much friction and lead to a fragmented Java ecosystem.

      As noted, the erasure-based compilation strategy does not allow for the performance we might hope for from generic APIs operating on primitive values. In the future (see Dependencies) we expect to enhance the JVM to allow for compilation that produces heterogeneous classes specialized to different type arguments. With the language changes in this JEP developers can write more expressive code now and make their generic APIs specialization-ready, in anticipation of performance improvements in the future.

      We could avoid introducing new warnings and accept null pollution as a routine fact of programming with primitive class types. This would make for a cleaner compilation experience, but the unpredictability of generic APIs at run time would not be pleasant. Ultimately, we want developers who use null in generic APIs to notice and think carefully about how their usage interacts with primitive types.

      At the other extreme, we could treat some or all of the warnings as errors. But we do not want to introduce source and migration incompatibilities. Legacy code and uses of legacy APIs should still successfully compile, even if there are new warnings.

      Risks and Assumptions

      The success of these features depends on Java developers learning about and adopting an updated model for the interaction of type variables with null. The new warnings will be highly visible, and they will need to be understood and appreciated—not ignored—for them to have their desired effect.

      Making these features available before specialized generics presents some challenges. Some developers may be dissatisfied with the performance (e.g., comparing ArrayList<Point> to Point[]) and develop incorrect long-term intuitions about the costs of using generics for primitive class types. Other developers may make suboptimal choices when applying .ref, not noticing any ill effects until running on a specialization-supporting VM, long after the code has been changed.

      Dependencies

      JEP 401 (Primitive Classes) is a prerequisite.

      We expect JEP 402 (Classes for the Basic Primitives) to proceed concurrently with this JEP; together, they provide support for basic primitive types as type arguments and bounds.

      A follow-up JEP will update the standard libraries, addressing null warnings and making the libraries specialization-ready.

      Another follow-up JEP will introduce runtime specialization of generic APIs in the JVM.

            dlsmith Dan Smith
            Brian Goetz