Summary
Unify the treatment of reference and primitive types in generic code by allowing Java type variables to range over both kinds of types. Produce new warnings to maintain the safety guarantees of generic code. This is a preview language feature.
Non-Goals
The core primitive class types feature is introduced by JEP 401 (Primitive Classes). This JEP is only concerned with supporting primitive class types as type arguments.
In the future (see Dependencies), we expect the JVM to optimize the performance of primitive type parameterizations, with help from the Java compiler. But for now, generics continue to be implemented via erasure.
Significant adjustments to generic standard library code are expected in response to new warnings, but those adjustments will be pursued in a separate JEP. Future work may also refactor implementations of hand-specialized primitive code.
Motivation
A common programming task is to take code that solves a problem for values of a particular type and extend that code to work on values of other types. Java developers can use three different strategies to perform this task:
Hand-specialized code: Rewrite the same code multiple times (perhaps with copy and paste), using different types each time.
Subtype polymorphism: Change the types in the solution to be a common supertype of all anticipated operand types.
Parametric polymorphism: Replace the types in the solution with type variables, instantiated by callers with whatever types they need to operate on.
The java.util.Arrays.binarySearch methods are a good illustration of all three strategies:
static int binarySearch(Object[] a, Object key)
static <T> int binarySearch(T[] a, T key, Comparator<? super T> c)
static int binarySearch(char[] a, char key)
static int binarySearch(byte[] a, byte key)
static int binarySearch(short[] a, short key)
static int binarySearch(int[] a, int key)
static int binarySearch(long[] a, long key)
static int binarySearch(float[] a, float key)
static int binarySearch(double[] a, double key)
The first variant uses subtype polymorphism. It works for all arrays of reference types, which share the common supertype Object[]. The search key can, similarly, be any object. The behavior of the method depends on dynamic properties of the arguments: at run time, do the array components and key support comparison to each other?
The second variant uses parametric polymorphism. It also works for all arrays of reference types, but asks the caller to provide a comparison function. The parameterized method signature ensures at compile time, for each call site, that the array components and key are of types supported by the provided comparison function.
Additional variants use hand specialization. These work on arrays of basic primitive types, which do not have a useful common supertype. Unfortunately, this means there are seven different copies of a nearly-identical method, adding a lot of complexity to the API and violating the DRY principle.
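For reference, the three kinds of overload can be exercised side by side with the real java.util.Arrays API. This is an illustrative sketch only; the class name BinarySearchStrategies and the sample data are hypothetical, and every array is already sorted as binarySearch requires.

```java
import java.util.Arrays;
import java.util.Comparator;

// Illustrative sketch: one call per strategy, each against sorted data.
class BinarySearchStrategies {

    // Subtype polymorphism: Object[] and Object. Whether the elements can be
    // compared is only discovered at run time (via Comparable casts).
    static int bySubtyping() {
        Object[] words = {"apple", "banana", "cherry"};
        return Arrays.binarySearch(words, "banana");
    }

    // Parametric polymorphism: T is inferred as String, and the compiler checks
    // that the Comparator accepts the array components and the key.
    static int byParameterization() {
        String[] words = {"cherry", "banana", "apple"}; // sorted in reverse order
        return Arrays.binarySearch(words, "banana", Comparator.reverseOrder());
    }

    // Hand specialization: a dedicated overload for int[] and int.
    static int bySpecialization() {
        int[] numbers = {2, 3, 5, 7, 11};
        return Arrays.binarySearch(numbers, 7);
    }

    public static void main(String[] args) {
        System.out.println(bySubtyping());        // 1
        System.out.println(byParameterization()); // 1
        System.out.println(bySpecialization());   // 3
    }
}
```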
Primitive class types, introduced in JEP 401, are a new kind of type, allowing developers to operate directly on custom-defined primitives. Primitive values have lightweight conversions to reference types, and thus can participate in subtyping relationships. Arrays of primitive values also support these conversions (e.g., an array of values can be treated as an Object[]). Thus, primitive class types work out-of-the-box with APIs that rely on subtype polymorphism, such as the Object[] variant of binarySearch.
With JEP 402 (Classes for the Basic Primitives), we will update the language to treat the basic primitive types (int, double, etc.) as primitive class types, and primitive arrays such as int[] as subtypes of Object[]. This will eliminate the need for overloads like those of binarySearch that were hand-specialized across basic primitive types. (Existing APIs may have binary compatibility obligations, of course, and subtype polymorphism may not be acceptable for performance-critical code.)
The third option, parametric polymorphism, was unfortunately designed for reference types only. Under current rules, no primitive type can be a type argument. Instead, for a primitive class type Point, sorting an array of Points with a comparison function requires choosing a reference type as the instantiation of T, and then providing a comparison function that works on all values of that reference type.
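Today's Java shows the same cost with a basic primitive. Because int cannot be a type argument, using the Comparator-based overload means boxing into Integer, which plays the role that Point.ref plays above. A minimal sketch; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

// Sketch in today's Java: Integer stands in for Point.ref.
class BoxedGenericSearch {

    static int searchDescending(int[] sortedDescending, int key) {
        // T cannot be int, so the values must first be copied into boxed Integers...
        Integer[] boxed = Arrays.stream(sortedDescending).boxed().toArray(Integer[]::new);
        // ...and the comparison function must be written against Integer, not int.
        Comparator<Integer> descending = (a, b) -> Integer.compare(b, a);
        return Arrays.binarySearch(boxed, key, descending); // key is boxed as well
    }

    public static void main(String[] args) {
        System.out.println(searchDescending(new int[] {11, 7, 5, 3, 2}, 5)); // 2
    }
}
```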
Primitive classes do come with a companion reference type (in this case, Point.ref), but using such a type as a type argument is a poor solution. For one thing, everything that interacts with the type, such as the comparison function, would also have to work with Point.ref, producing a lot of extra noise in the source code. For another, in the future, we would like to optimize calls to the comparison function by passing inlined Point values directly. But the reference type Point.ref cannot be inlined as efficiently.
A much better solution is for generic APIs to support primitive class types directly, in addition to reference types. Ideally, this should be the default behavior of Java's generics, so that primitive class types can participate fully in the Java ecosystem.
The language can achieve this by relaxing the requirement that type arguments must be reference types, and then adjusting the treatment of type variables, bounds, and inference accordingly.
A significant implication that developers will need to account for is that a universal type variable might now represent a type that does not permit null.
Java compilers can produce warnings, much like the unchecked warnings introduced in Java 5, to alert developers to this possibility. Then developers can make use of some new language features to address the warnings.
Description
The features described below are preview features, enabled with the --enable-preview compile-time and runtime flags.
Type variables and bounds
Previously, Java's type variable bounds were interpreted according to the language's subtyping relation. We now say that a type S is bounded by a type T if any of the following is true:
S is a subtype of T (where every type is a subtype of itself, and reference types are subtypes of many other types, per their class declarations and other subtyping rules); or
S is a primitive class type whose corresponding reference type is bounded by T; or
S is a type variable with an upper bound that is bounded by T, or T is a type variable with a lower bound and S is bounded by the lower bound of T.
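The three rules can be read as a recursive check. The following is a toy model under stated assumptions, not javac's implementation: types are represented by hypothetical records (ClassType, PrimitiveClassType, TypeVar), subtyping is reduced to reflexive, transitive declared supertypes, and type arguments inside bounds are ignored.

```java
import java.util.List;

// Hypothetical toy model of the "bounded by" relation; none of these names are javac APIs.
class BoundedBy {
    sealed interface Type permits ClassType, PrimitiveClassType, TypeVar {}

    // A reference class or interface type with its declared direct supertypes.
    record ClassType(String name, List<ClassType> supertypes) implements Type {}

    // A primitive class type such as Point, with its companion reference type (Point.ref).
    record PrimitiveClassType(String name, ClassType refType) implements Type {}

    // A type variable with an upper bound and an optional lower bound (null if absent).
    record TypeVar(String name, Type upper, Type lower) implements Type {}

    // Reflexive, transitive subtyping over declared supertypes of reference types.
    static boolean isSubtype(Type s, Type t) {
        if (s.equals(t)) return true;
        if (s instanceof ClassType c) {
            for (ClassType sup : c.supertypes()) {
                if (isSubtype(sup, t)) return true;
            }
        }
        return false;
    }

    static boolean isBoundedBy(Type s, Type t) {
        // Rule 1: S is a subtype of T.
        if (isSubtype(s, t)) return true;
        // Rule 2: S is a primitive class type whose reference type is bounded by T.
        if (s instanceof PrimitiveClassType p && isBoundedBy(p.refType(), t)) return true;
        // Rule 3a: S is a type variable whose upper bound is bounded by T.
        if (s instanceof TypeVar v && isBoundedBy(v.upper(), t)) return true;
        // Rule 3b: T is a type variable with a lower bound, and S is bounded by it.
        return t instanceof TypeVar w && w.lower() != null && isBoundedBy(s, w.lower());
    }

    public static void main(String[] args) {
        ClassType object = new ClassType("Object", List.of());
        ClassType shape = new ClassType("Shape", List.of(object));
        ClassType pointRef = new ClassType("Point.ref", List.of(shape));
        PrimitiveClassType point = new PrimitiveClassType("Point", pointRef);
        System.out.println(isSubtype(point, object));   // false: Point is not a reference type
        System.out.println(isBoundedBy(point, object)); // true: via Point.ref (rule 2)
        TypeVar u = new TypeVar("U", shape, null);
        System.out.println(isBoundedBy(u, object));     // true: via U's upper bound (rule 3a)
    }
}
```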
As usual, type variables are declared with upper bounds, and those declared without bounds (<T>) implicitly have upper bound Object (<T extends Object>). Any type may act as an upper bound.
At a use site, any type may be provided as a type argument instantiating a type variable, as long as the type argument is bounded by the type variable's upper bounds. For example, if Point is a primitive class type, the type List<Point> is valid because Point is bounded by Object.
Type variables can thus range over any type, and are no longer assumed to represent a reference type.
Wildcards also have bounds, which again may be any type. Similar bounds checks are performed when testing that one parameterized type is a subtype of another. For example, if the primitive class Point implements Shape, the type List<Point> is a subtype of List<? extends Shape>, and the type List<Shape> is a subtype of List<? super Point>, because Point is bounded by Shape.
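Today's Java already performs these wildcard checks for reference types. The sketch below uses only standard Java, so a hypothetical record Circle stands in for Point, and Shape is likewise a hypothetical interface.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch in today's Java: the same wildcard bounds checks, with Circle standing in for Point.
class WildcardBounds {
    interface Shape { double area(); }

    record Circle(double r) implements Shape {
        public double area() { return Math.PI * r * r; }
    }

    // Accepts any list whose element type is bounded by Shape, e.g. List<Circle>.
    static double totalArea(List<? extends Shape> shapes) {
        double sum = 0;
        for (Shape s : shapes) sum += s.area();
        return sum;
    }

    // Accepts any list that can hold a Circle, e.g. List<Shape> or List<Object>.
    static void addUnitCircle(List<? super Circle> sink) {
        sink.add(new Circle(1.0));
    }

    public static void main(String[] args) {
        List<Circle> circles = List.of(new Circle(1.0), new Circle(2.0));
        System.out.println(totalArea(circles)); // List<Circle> is a subtype of List<? extends Shape>

        List<Shape> shapes = new ArrayList<>();
        addUnitCircle(shapes);                  // List<Shape> is a subtype of List<? super Circle>
        System.out.println(shapes.size());      // 1
    }
}
```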
Type argument inference is enhanced to support inferring primitive class types. When an inference variable has a primitive type as its lower bound, that type may become the inference result. For example, the invocation List.of(new Point(3.0, -1.0)) typically has inferred type List<Point>. If it occurs in an assignment context, with target type Collection<Point.ref>, it has inferred type List<Point.ref>. (To do: since these inference variables can range over both primitive and reference types, "strict" applicability testing can no longer be sure that an inferred parameterization won't end up requiring a conversion. Unexpected errors or incorrect overload resolution results may occur.)
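Target typing already steers inference this way for reference types. In the sketch below, which uses only standard Java, Integer and Number loosely stand in for Point and Point.ref; the analogy is inexact, since Integer is a subtype of Number whereas Point converts to Point.ref. The class and method names are hypothetical.

```java
import java.util.Collection;
import java.util.List;

// Sketch in today's Java: the same invocation infers different types depending on the target.
class TargetTypedInference {

    static Integer standaloneFirst() {
        var list = List.of(42);     // no target type: E is inferred as Integer (the boxed int)
        List<Integer> check = list; // compiles only because the inferred type is List<Integer>
        return check.get(0);
    }

    static Collection<Number> targeted() {
        // A List<Integer> would not be assignable to Collection<Number>, so the
        // return type acting as the target forces E to be inferred as Number.
        return List.of(42);
    }

    public static void main(String[] args) {
        System.out.println(standaloneFirst() + 1);         // 43
        System.out.println(targeted().iterator().next()); // 42
    }
}
```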
These changes to type variables, bounds checking, and inference are applied automatically to existing code. Many existing generic APIs will smoothly handle primitive class types without any modification.
Interaction with JEP 402: Wherever the phrase "any type" is used above, it describes the state of things after JEP 402 has been completed. Until that point, universal type variables will not be quite universal, since they will be instantiable by reference types and primitive class types but not by the basic primitive types. While JEP 402 is not a prerequisite to this JEP (see Dependencies), it is expected to be completed in a similar timeframe.
With JEP 402, there is some source compatibility risk due to type inference preferring int over Integer in existing code. This requires further exploration.
Null pollution and null warnings
References can be null, but primitive class types are not reference types, so JEP 401 prohibits assigning null to them.
Point p = null; // error
By allowing type variables to range over a wider set of types, we must ask developers to make fewer assumptions about their instantiations. Specifically, it is usually improper to assign null to a variable with a type-variable type, because the type variable may be instantiated by a primitive class type.
class C<T> { T x = null; /* shouldn't do this */ }
C<Point> c = new C<Point>();
Point p = c.x; // error
In this example, the type of the field x is erased to Object, so at run time a C<Point> will happily store a null, even though this violates the expectations of the compile-time type. This scenario is an example of null pollution, a new kind of heap pollution. Like other forms of heap pollution, the problem is detected at run time when the program attempts to assign a value to a variable whose erased type does not support it: in this case, the assignment to p.
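The same "fails later, at the downstream assignment" pattern can be reproduced in today's Java through boxing, with int playing the role of Point and Integer the role of the erased reference. A sketch under that analogy; the class and method names are hypothetical.

```java
// Sketch in today's Java: erasure lets a generic field hold null, and the failure
// only appears where the value is converted to a type that cannot represent null.
class NullPollutionDemo {
    static class C<T> {
        T x = null; // legal today because T is assumed to be a reference type
    }

    static int readAsPrimitive(C<Integer> c) {
        return c.x; // cast to Integer, then unboxing: NullPointerException if c.x is null
    }

    public static void main(String[] args) {
        C<Integer> c = new C<>();
        try {
            int p = readAsPrimitive(c);
            System.out.println(p);
        } catch (NullPointerException e) {
            System.out.println("null detected only at the downstream assignment");
        }
    }
}
```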
As for other forms of heap pollution, the compiler produces null warnings to discourage null pollution:
A warning is issued when a null literal is assigned to a universal type-variable type.
A warning is issued when a non-final field with a universal type-variable type is left uninitialized by a constructor.
(There are also null warnings for certain conversions, discussed in a later section.)
class Box<T> {
T x;
public Box() {} // warning: uninitialized field
T get() {
return x;
}
void set(T newX) {
x = newX;
}
void clear() {
x = null; // warning: null assignment
}
T swap(T oldX, T newX) {
T currentX = x;
if (currentX != oldX)
return null; // warning: null assignment
x = newX;
return oldX;
}
}
As with unchecked warnings, null warnings alert programmers to the risk of heap pollution, which can lead to unexpected runtime exceptions in downstream assignments. Code that compiles without warnings will not throw these exceptions, and can safely be instantiated with primitive class types.
A significant amount of existing generic code produces null warnings, having been written with the assumption that type variables are reference types. We encourage developers, as they are able, to update their code to eliminate sources of null pollution.
In a future release (see Dependencies), the physical layout of generic code may be specialized for each primitive class type. At that point null pollution will be detected earlier, and code that has failed to address the warnings may become unusable. Code that has addressed the warnings is specialization-ready, meaning that future JVM enhancements will not disrupt its functionality.
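One way to make the Box class above specialization-ready is sketched below, assuming an empty state is not needed: every constructor initializes the field, and a failed swap is reported with a boolean rather than a null return. ReadyBox is a hypothetical name, and clear is omitted because resetting x would otherwise require null. The code compiles in today's Java; whether it avoids every warning under the proposed rules is an assumption based on the two warning conditions listed above.

```java
// Hypothetical specialization-ready variant of Box: no null is ever assigned to T.
class ReadyBox<T> {
    private T x;

    ReadyBox(T initial) {
        x = initial; // the constructor always initializes the field
    }

    T get() {
        return x;
    }

    void set(T newX) {
        x = newX;
    }

    // Compare-and-swap that signals failure with false instead of returning null.
    // Uses identity comparison (!=), as Box.swap does.
    boolean swap(T oldX, T newX) {
        if (x != oldX) return false;
        x = newX;
        return true;
    }

    public static void main(String[] args) {
        String a = "a", b = "b";
        ReadyBox<String> box = new ReadyBox<>(a);
        System.out.println(box.swap(b, "c")); // false: the current value is not b
        System.out.println(box.swap(a, b));   // true
        System.out.println(box.get());        // b
    }
}
```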
Reference type-variable types
When generic code needs to work with null, the language offers a few special features to ensure that a type-variable type is a (null-friendly) reference type.
A type variable that is bounded by IdentityObject (either directly or via an identity class bound) is always a reference type.
class C<T extends Reader> { T x = null; /* ok */ }
FileReader r = new C<FileReader>().x;
A type variable whose declaration is modified by the contextual keyword ref prohibits primitive type arguments, and thus is always a reference type.
class C<ref T> { T x = null; /* ok */ }
FileReader r = new C<FileReader>().x;
Point.ref p = new C<Point.ref>().x;
A type variable use may be modified by the syntax .ref, which represents a mapping from the instantiating type to its tightest bounding reference type (e.g., Point maps to Point.ref, while FileReader maps to FileReader).
class C<T> { T.ref x = null; /* ok */ }
FileReader r = new C<FileReader>().x;
Point.ref p = new C<Point.ref>().x;
Point.ref p2 = new C<Point>().x;
(The new syntax above is subject to change.)
In the last case, the types T and T.ref are two distinct type-variable types. Assignments between the two types are allowed, as a form of value object conversion or primitive value conversion.
class C<T> {
T.ref x = null;
void set(T arg) { x = arg; /* ok */ }
}
A type variable that is bounded by IdentityObject or declared with the ref modifier is a reference type variable. All other type variables are called universal type variables.
Similarly, a type that names a reference type variable or has the form T.ref is called a reference type-variable type, while a type that names a universal type variable without .ref is called a universal type-variable type.
Warnings on value conversion
Primitive value conversions allow a value object to be converted to a primitive value of the same class. Per JEP 401, if the reference is null, the conversion fails at run time.
Point.ref pr = null;
Point p = pr; // NullPointerException
When primitive value conversion is applied to a type-variable type there is no runtime check, but the conversion may be a source of null pollution.
T.ref tr = null;
T t = tr; // t is polluted
To help prevent both NullPointerExceptions and null pollution, primitive value conversions produce null warnings unless the compiler can prove that the reference being converted is non-null.
class C<T> {
T.ref x = null;
T get() { return x; } // warning: possible null in conversion
T.ref getRef() { return x; }
}
C<Point> c = new C<>();
Point p1 = c.get();
Point p2 = c.getRef(); // warning: possible null in conversion
If a parameter, local variable, or final field has a reference type-variable type, then the compiler may be able to prove, at certain usages, that the variable's value is non-null. In that case, primitive value conversion may occur without a null warning. The details and limitations of this analysis require further exploration, but might be similar to the control-flow analysis that determines whether a variable has been initialized before use.
<T> T deref(T.ref val, T alternate) {
if (val == null) return alternate;
return val; // no warning
}
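Today's unboxing conversion behaves much like the primitive value conversion described here, with Integer standing in for T.ref and int for T. The sketch shows both the failing conversion and the null-checked pattern of deref; it illustrates the idea only and does not model the proposed warning analysis. Class and method names are hypothetical.

```java
// Sketch in today's Java: Integer plays the nullable T.ref, int plays the non-null T.
class NullCheckedConversion {

    // Unchecked path: unboxing a null Integer throws NullPointerException.
    static int unsafe(Integer val) {
        return val;
    }

    // Checked path: on the second return, val is known to be non-null,
    // so the unboxing conversion cannot fail.
    static int deref(Integer val, int alternate) {
        if (val == null) return alternate;
        return val;
    }

    public static void main(String[] args) {
        System.out.println(deref(null, 7)); // 7
        System.out.println(deref(3, 7));    // 3
    }
}
```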
Similar to assignment, overrides that add .ref to a type-variable type, or remove it, are allowed but prompt a null warning.
interface Sup<T> {
void put(T.ref arg);
}
interface Sub<T> extends Sup<T> {
void put(T arg); // null warning
}
Parameterized type conversions
Unchecked conversions traditionally allow a raw type to be converted to a parameterization of the same class. These conversions are unsound, and are thus accompanied by unchecked warnings.
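For comparison, here is the traditional raw-to-parameterized unchecked conversion in today's Java. javac reports unchecked warnings on the marked lines (shown in detail with -Xlint:unchecked), and the pollution surfaces as a ClassCastException at the cast the compiler inserts when reading. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch in today's Java: an unchecked conversion creates heap pollution.
class UncheckedConversionDemo {

    // Returns a List<String> whose only element is actually an Integer.
    static List<String> pollute() {
        List raw = new ArrayList();  // raw type
        raw.add(42);                 // unchecked call on a raw type
        List<String> strings = raw;  // unchecked conversion: List to List<String>
        return strings;
    }

    public static void main(String[] args) {
        List<String> strings = pollute();
        try {
            String s = strings.get(0); // javac inserts a cast to String here
            System.out.println(s);
        } catch (ClassCastException e) {
            System.out.println("heap pollution detected at the read");
        }
    }
}
```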
As developers make changes such as applying .ref to certain type variable uses, they may end up with parameterized types (e.g., List<T.ref>) in API signatures that are out of sync with other code. To ease migration, the allowed set of unchecked conversions is expanded to include the following parameterized-to-parameterized conversions:
Changing a type argument of a parameterized type from a universal type-variable type (T) to its reference type (T.ref), or vice versa:
List<T.ref> newList() { return Arrays.asList(null, null); }
List<T> list = newList(); // unchecked warning
Changing a type argument of a parameterized type from a primitive class type (Point) to its reference type (Point.ref), or vice versa:
void plot(Function<Point.ref, Color> f) { ... }
Function<Point, Color> gradient = p -> Color.gray(p.x());
plot(gradient); // unchecked warning
Changing a wildcard bound in a parameterized type from a universal type-variable type (T) or a primitive class type (Point) to its reference type (T.ref, Point.ref), or vice versa (where the conversion is not already allowed by subtyping):
Supplier<? extends T.ref> nullFactory() { return () -> null; }
Supplier<? extends T> factory = nullFactory(); // unchecked warning
Recursively applying an unchecked conversion to any type argument or wildcard bound of a parameterized type:
Set<Map.Entry<String, T>> allEntries() { ... }
Set<Map.Entry<String, T.ref>> entries = allEntries(); // unchecked warning
These unchecked conversions may seem easily avoidable in small code snippets, but the flexibility they offer will significantly ease migration as different program components or libraries adopt universal generics at different times.
In addition to unchecked assignments, these conversions can be used by unchecked casts and method overrides:
interface Calendar<T> {
Set<T> get(Set<LocalDate> dates);
}
class CalendarImpl<T> implements Calendar<T> {
public Set<T.ref> get(Set<LocalDate> dates) { ... } // unchecked warning
}
Compiling to class files
Generic classes and methods will continue to be implemented via erasure, replacing type variables with their erased bounds in generated bytecode. Within generic APIs, primitive objects will therefore generally be operated on as references.
The usual rules for detecting heap pollution apply: Casts are inserted at certain program points to assert that a value has the expected runtime type. In the case of primitive class types, this includes checking that the value is non-null.
We extend the Signature attribute to encode additional forms of compile-time type information:
- Type variables declared as ref T,
- Type variable uses of the form T.ref, and
- Primitive class types appearing as type arguments and type variable/wildcard bounds.
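The two views of an erased generic field are visible through standard reflection: Field.getType() reports the erased type from the field descriptor, while Field.getGenericType() is derived from the Signature attribute. This sketch shows only what today's Signature attribute records; the additional forms listed above are not represented. The class names are hypothetical.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Type;
import java.lang.reflect.TypeVariable;

// Sketch in today's Java: erased type vs. generic signature of the same field.
class ErasureDemo {
    static class C<T> {
        T x;
    }

    private static Field field() {
        try {
            return C.class.getDeclaredField("x");
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    // From the field descriptor: T has been erased to its bound, Object.
    static Class<?> erasedType() {
        return field().getType();
    }

    // From the Signature attribute: the type variable T is still recorded.
    static Type genericType() {
        return field().getGenericType();
    }

    public static void main(String[] args) {
        System.out.println(erasedType()); // class java.lang.Object
        TypeVariable<?> tv = (TypeVariable<?>) genericType();
        System.out.println(tv.getName()); // T
    }
}
```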
Alternatives
We could ask developers to always use the reference types of primitive classes (such as Point.ref) when making use of generic APIs. This is not a very good solution, as argued in the Motivation section.
We could also ask API authors to opt-in to universal type variables, rather than making type variables universal by default. But the goal is for universal generics to be the norm, and in practice there is no reason most type variables cannot be universal. An opt-in would introduce too much friction and lead to a fragmented Java ecosystem.
As noted, the erasure-based compilation strategy does not allow for the performance we might hope for from generic APIs operating on primitive values. In the future (see Dependencies) we expect to enhance the JVM to allow for compilation that produces heterogeneous classes specialized to different type arguments. With the language changes in this JEP, developers can write more expressive code now and make their generic APIs specialization-ready, in anticipation of performance improvements in the future.
We could avoid introducing new warnings and accept null pollution as a routine fact of programming with primitive class types. This would make for a cleaner compilation experience, but the unpredictability of generic APIs at run time would not be pleasant. Ultimately, we want developers who use null in generic APIs to notice and think carefully about how their usage interacts with primitive types.
At the other extreme, we could treat some or all of the warnings as errors. But we do not want to introduce source and migration incompatibilities. Legacy code and uses of legacy APIs should still successfully compile, even if there are new warnings.
Risks and Assumptions
The success of these features depends on Java developers learning about and adopting an updated model for the interaction of type variables with null. The new warnings will be highly visible, and they will need to be understood and appreciated, not ignored, for them to have their desired effect.
Making these features available before specialized generics presents some challenges. Some developers may be dissatisfied with the performance (e.g., comparing ArrayList<Point> to Point[]) and develop incorrect long-term intuitions about the costs of using generics for primitive class types. Other developers may make suboptimal choices when applying .ref, not noticing any ill effects until running on a specialization-supporting VM, long after the code has been changed.
Dependencies
JEP 401 (Primitive Classes) is a prerequisite.
We expect JEP 402 (Classes for the Basic Primitives) to proceed concurrently with this JEP; together, they provide support for basic primitive types as type arguments and bounds.
A followup JEP will update the standard libraries, addressing null warnings and making the libraries specialization-ready.
Another followup JEP will introduce runtime specialization of generic APIs in the JVM.