Invisible conversions everywhere dense in your classfile...
In the Java language, the Valhalla project unifies the type system so that all values (except possibly
null) are assigned subclasses of Object, including values of
double and other legacy primitive types (also known as the non-reference types in Java before Valhalla). This unification is done by more closely identifying, as "the same thing", both the boxed and the native forms of any given primitive value. This allows the legacy primitives to work in generic classes and methods (with suitably permissive type variable bounds).
But the JVM knows different: The native form of a
double occupies 64 bits and two stack slots, under the descriptor
"D", while the boxed form stores the 64-bit value in one or more buffer objects in the heap, and its reference occupies just one stack slot, under the descriptor
"Ljava/lang/Double;". There is no way, within the JVM, to retroactively unify those two representations of "the same" double value object.
Instead, the Java static compiler (e.g.,
javac) needs to juggle at least two representations for
double values, the two-slot native and the one-slot boxed reference. It must supply the correct format to each operation: for native arithmetic (like the
dadd bytecode) the value must be in the native
"D" format, and for communicating with generic code or storing into an array of
Number, it must be in the boxed reference format.
The Java static compiler has historically done such juggling in order to implement language rules for implicit auto-boxing and auto-unboxing. To do this, it makes implicit calls (not seen in the source code) to well-defined API points such as
Double::valueOf (for boxing) and
Double::doubleValue (for unboxing).
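As a rough illustration of that juggling, the sketch below writes the same round trip twice: once relying on the implicit conversions, and once with the calls to Double.valueOf and doubleValue spelled out by hand. The class and method names are made up for this example; only the two library calls come from the text above.

```java
final class BoxingSketch {
    // Implicit form: javac inserts the boxing and unboxing calls.
    static double implicitRoundTrip(double d) {
        Number[] slots = new Number[1];
        slots[0] = d;                      // auto-boxing when storing into a Number array
        Double boxed = (Double) slots[0];
        return boxed + 1.0;                // auto-unboxing before the native addition
    }

    // Explicit form: the same conversions written out in source.
    static double explicitRoundTrip(double d) {
        Number[] slots = new Number[1];
        slots[0] = Double.valueOf(d);      // boxing
        Double boxed = (Double) slots[0];
        return boxed.doubleValue() + 1.0;  // unboxing
    }

    public static void main(String[] args) {
        System.out.println(implicitRoundTrip(2.5) == explicitRoundTrip(2.5));
    }
}
```

Both methods should compile to essentially the same sequence of invocations, which can be checked by disassembling the class with javap -c.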
Note also that the static compiler often implicitly changes the reference type of values around the boundaries of (erased) Java generic APIs, for example quietly casting the result of a call from
Object (the erased type variable bound, returned from the
get method) to
String (the type known at compile time). To do that task, it emits implicit uses of the checkcast instruction.
In the case of a generic API point that yields a double, the
Object value returned from the method must be implicitly retyped as
double in two operational steps: cast to
Double as a reference, and then unboxed to a native two-slot double with Double::doubleValue.
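The two retyping cases can be sketched in today's Java as follows. This is only an approximation: it uses List<Double> because current Java cannot instantiate a generic type over a primitive, and the class and method names are invented for the example.

```java
import java.util.List;

final class ErasureSketch {
    // javac quietly emits a checkcast to String after the erased get call.
    static String firstString(List<String> list) {
        return list.get(0);
    }

    // Implicit form: checkcast to Double, then a call to Double.doubleValue.
    static double firstDouble(List<Double> list) {
        return list.get(0);
    }

    // Explicit form: the same two operational steps written out.
    static double firstDoubleExplicit(List<Double> list) {
        List<?> erasedView = list;
        Object erased = erasedView.get(0);  // what the erased API actually returns
        Double boxed = (Double) erased;     // step 1: reference cast
        return boxed.doubleValue();         // step 2: unbox to a native double
    }

    public static void main(String[] args) {
        System.out.println(firstString(List.of("a")));
        System.out.println(firstDouble(List.of(1.5)) == firstDoubleExplicit(List.of(1.5)));
    }
}
```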
For Valhalla user-defined primitives, the representations for both boxed and unboxed values are the same, but there are still going to be implicit changes in their types, and those changes will (probably, in the unboxing direction only) be reflected by operational null checks, using
Objects::requireNonNull or the equivalent.
With Valhalla, we expect that the frequency (or density) of such conversions and checks may increase in some codes, as users enjoy freedom from worry about whether their values are boxed or not. But the JVM will have to worry all the more, especially for legacy primitives. Also, the static compiler will have to send the right guidance to it, in the form of implicit bytecode instructions to manage the implicit boxing and unboxing.
Valhalla does not plan to enhance the verifier type system beyond where it is today. In particular, we do not plan to propagate the results of
null checks in the verifier type system. This means that if javac forgets to put in a call to Objects::requireNonNull,
nulls will be checked later if at all, when a variable is reached that is positively
null-rejecting. (The receiver of an unboxing method such as Double::doubleValue is
null-rejecting in that way, since a
null receiver elicits a NullPointerException.)
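A small sketch of where a null is and is not rejected in current Java may help here. It shows that a reference cast lets null through, while unboxing rejects it because the unboxing call has a null receiver. The class and method names are hypothetical.

```java
final class NullUnboxSketch {
    // A reference cast (checkcast) accepts null without complaint.
    static boolean castAcceptsNull() {
        Object o = null;
        Double boxed = (Double) o;
        return boxed == null;
    }

    // Unboxing calls doubleValue on the reference, so a null receiver throws.
    static boolean unboxingRejectsNull() {
        Double boxed = null;
        try {
            double d = boxed;  // implicit call to boxed.doubleValue()
            return false;      // not reached for a null box
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(castAcceptsNull() + " " + unboxingRejectsNull());
    }
}
```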
In summary, this means that the following operations will effectively be used as virtual machine instructions for managing low-level type changes in code generated by javac:
- Double::doubleValue -- for unboxing
- Double::valueOf -- for boxing
- <Primtype>::<primtype>Value -- likewise for unboxing any legacy primitive type
- <Primtype>::valueOf -- likewise for boxing any legacy primitive type
- Objects::requireNonNull -- for unboxing any user-defined primitive type
- (no code) -- for boxing any user-defined primitive type (verifier sees no type change)
In addition, calls to
requireNonNull will, in many cases, need to be followed by a
checkcast to reassure the verifier that what came out of the method is the same type (in fact, the same reference) as what went in. (This effect is not visible at the language level, since
requireNonNull generically returns its input type. But the JVM requires a checkcast, because the erased method returns Object.)
Additionally, calls to
doubleValue (or any
<primtype>Value) will, in many cases, need to be preceded by a
checkcast to the box type. (These cases are either explicit user casts from a supertype like
Object, or implicit casts inserted around a generic API point.)
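The following sketch illustrates both pairings in current Java. It uses reflection to show that the single-argument requireNonNull is erased to return Object (which is why a checkcast has to follow it in bytecode), and it writes out the cast that precedes doubleValue when the value arrives as Object. The class and method names are invented for the example.

```java
import java.lang.reflect.Method;
import java.util.Objects;

final class RequireNonNullSketch {
    // The erased return type of Objects.requireNonNull(T), as the JVM sees it.
    static Class<?> erasedReturnType() {
        try {
            Method m = Objects.class.getMethod("requireNonNull", Object.class);
            return m.getReturnType();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    // No cast appears in source, but javac follows the call with a checkcast to Double.
    static Double nullChecked(Double boxed) {
        return Objects.requireNonNull(boxed);
    }

    // An explicit cast from a supertype precedes the unboxing call.
    static double unboxFromObject(Object o) {
        Double boxed = (Double) o;   // checkcast to the box type
        return boxed.doubleValue();  // unbox
    }

    public static void main(String[] args) {
        System.out.println(erasedReturnType());
        System.out.println(unboxFromObject(nullChecked(4.0)));
    }
}
```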
This prospect of a much greater volume of implicit conversion bytecodes, or pairs of such conversions, suggests that perhaps the translation strategy for Valhalla would benefit from new support in the JVM bytecode instruction set for expressing those conversions more simply.
(That is a big "perhaps"; it is expensive to add new bytecodes. This memo explores that expensive option. The fallback position, and plan of record at the moment, is to use as many library routine calls as it takes to get the job done, and call it a day.)
The proposal explored here enhances the checkcast instruction in three directions:
- polymorphically produce legacy primitives as well as references (cf. getstatic for a precedent)
- polymorphically consume legacy primitives as well as references (cf. putstatic for a precedent)
- optionally perform null check operations
All three enhancements are enabled by a condition which previously has been illegal. That condition holds when the
checkcast instruction operand field, an index into the constant pool, refers to a
CONSTANT_Utf8 item, rather than a
CONSTANT_Class item, as is already legal.
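The spelling of the CONSTANT_Utf8 item selects the function:
- the ">x" spelling for double takes an Object reference, casts to Double and then calls Double::doubleValue
- the ">x" spelling for int takes an Object reference, casts to Integer and then calls Integer::intValue
- the "<x" spelling for double takes a double (two slots) and calls Double::valueOf
- (and so on for other legacy primitive types)
- "!" peeks at the value on the stack and throws NPE if it is null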
Any other operand (any other spelling or other constant pool entry type) will fail verification of the
checkcast instruction, and is thereby reserved for future use.
The descriptions above are carefully crafted to imply the following interactions with the verifier type system:
- ">x" requires a reference (Object) on the stack and leaves a primitive (x) on the stack
- "<x" requires a primitive (x) on the stack and leaves its box type (not merely Object) on the stack
- "!" requires a reference on the stack and leaves that reference alone, with the same verifier type
An efficient interpreter would probably choose to rewrite these new UTF8-using forms of
checkcast to an internal, otherwise unused bytecode, and use the operand field of that bytecode to efficiently select the required behavior corresponding to the UTF8 string.
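The verifier interactions listed above can be modeled as a small type-transition function. This is a toy model only, not a description of any real verifier: the Form enum, the use of Class objects as stand-ins for verifier types, and all names are assumptions made for this sketch, and it deliberately avoids committing to any operand spelling.

```java
import java.util.Map;

final class CheckboxVerifierSketch {
    // Hypothetical names for the three behaviors; not part of the proposal text.
    enum Form { UNBOX, BOX, NULL_CHECK }

    // Legacy primitive to box type, for a few representative primitives.
    private static final Map<Class<?>, Class<?>> BOX_OF = Map.of(
            double.class, Double.class,
            int.class, Integer.class,
            long.class, Long.class,
            float.class, Float.class);

    // Returns the modeled type left on the stack, or throws if the input is rejected.
    static Class<?> typeAfter(Form form, Class<?> prim, Class<?> onStack) {
        switch (form) {
            case UNBOX:       // requires a reference, leaves the primitive
                if (onStack.isPrimitive()) throw new IllegalStateException("reference required");
                return prim;
            case BOX:         // requires the primitive, leaves its box type
                if (onStack != prim) throw new IllegalStateException(prim + " required");
                return BOX_OF.get(prim);
            case NULL_CHECK:  // requires a reference, leaves the same type
                if (onStack.isPrimitive()) throw new IllegalStateException("reference required");
                return onStack;
            default:
                throw new AssertionError(form);
        }
    }

    public static void main(String[] args) {
        System.out.println(typeAfter(Form.UNBOX, double.class, Object.class));
        System.out.println(typeAfter(Form.BOX, double.class, double.class));
        System.out.println(typeAfter(Form.NULL_CHECK, null, String.class));
    }
}
```

Note that the boxing case returns the specific box type rather than Object, matching the "not merely Object" remark in the list.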
It seems likely that the
javac compiler should choose to emit these new instances of
checkcast in some or all of the cases where it previously has emitted the method calls (whether implicit or explicit in the source code).
For presentations of this bytecode by other low-level tools, it is suggested that the name
checkbox be used instead of
checkcast, and the instruction be presented with its string operand unchanged. But the code point for
checkcast (decimal 192) should be reused (overloaded) for this new purpose, rather than allocating a new code point.