Enhancement
Resolution: Withdrawn
P3
repo-valhalla
## Background
Easy migration is a major goal of Valhalla, which is why we are creating value objects instead of non-object structs or the like. The linkage of value classes is (in most cases) indistinguishable from the linkage of legacy identity classes: both use the same kinds of class names and the same kinds of descriptors, which we call "L-types" in the JVM because they are spelled (in the JVM, not the language) like `Lpkg/Foo;`. To review: L-types are completely abstract and decoupled from the shape or nature of the named class, they do not trigger class loading, and they accommodate nulls.
To give the JVM a clear signal of where value objects might benefit from special processing, we also support a new descriptor for Q-types, like `Qpkg/Foo;`. Again, to review: Q-types are closely coupled to the classes they refer to, they *do* trigger class loading (so as to benefit from the coupling), and they do *not* accommodate nulls. (They *could* do so, if the loaded class gave permission, but such behavior is not supported today.)
Q-types are a two-edged sword. They are useful as clear signals to enable robust optimizations of flattening (in heap variables) and scalarization (on the stack). They are painful obstructions to migration, however. In order to ease migration, we are experimenting with "secret" optimization of value objects passed under L-types, using a `Preload` attribute to give an alternative hint to the VM that a variable can be optimized. But this preload tactic is advisory, not mandatory, which means that the JVM may be presented with inconsistent requests (due to multiple inheritance). Fixing this is a complex and open-ended problem, detailed (for example) in JDK-8301007.
## Proposal: Q-folding during linkage
A possible compromise between the clarity of Q-types and the flexibility (for migration) of L-types is to treat Q-types as annotated L-types, where a mismatch of the annotation does not prevent linkage from occurring. Thus, pre-migration code (which uses no Q-types to refer to migrated objects) can continue to link with post-migration code (which may use a mix of Q-types and L-types in descriptors).
The simplest way to implement this is to declare, in the JVMS, that each field and each method (either a declaration or reference) has two descriptors, an original descriptor and a folded one. The original (or "unfolded") one is whatever occurs in the classfile. The folded one is derived from the unfolded one by converting all Q-types to their corresponding L-types.
The folding would pertain to all Q-types embedded in the descriptor, including array element types. So `([[QFoo;LBar;)QBaz;` folds to `([[LFoo;LBar;)LBaz;`.
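The folding rule can be sketched as a small string transformation. This is only an illustration, assuming well-formed descriptors; the class name `QFold` and its `fold` method are hypothetical helpers, not part of any JDK or JVMS text. The one subtlety it shows is that a `Q` counts as a type tag only at the start of a class type, so a `Q` inside a class name (as in `Lpkg/Queue;`) is left alone.

```java
/** Sketch of Q-folding on descriptor strings (hypothetical helper, not a JDK API). */
final class QFold {
    private QFold() {}

    /** Returns the folded descriptor: every Q-type becomes the corresponding L-type. */
    static String fold(String descriptor) {
        StringBuilder out = new StringBuilder(descriptor.length());
        int i = 0;
        while (i < descriptor.length()) {
            char c = descriptor.charAt(i);
            if (c == 'L' || c == 'Q') {
                // A class type runs from its tag character up to the next ';'.
                int end = descriptor.indexOf(';', i);
                if (end < 0) {
                    throw new IllegalArgumentException("unterminated class type: " + descriptor);
                }
                // Emit an 'L' tag, then copy the class name and ';' unchanged.
                out.append('L').append(descriptor, i + 1, end + 1);
                i = end + 1;
            } else {
                // '(', ')', '[' and primitive tags are copied unchanged.
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fold("([[QFoo;LBar;)QBaz;")); // prints ([[LFoo;LBar;)LBaz;
    }
}
```

Folding is idempotent in this sketch: applying `fold` to an already folded descriptor returns it unchanged.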
Then, most existing operations on descriptors (of fields and methods, declarations and references) would be adjusted to operate on the folded version. Operations on the folded descriptors would include:
- resolution of symbolic references to fields and methods (per 5.4.3.2 & 3)
- duplicate rejection (no `m()LFoo;` and `m()QFoo;` declared in the same class file, per 4.5 & 6)
- override matching (enforcement of `final` per 4.10.1.5, "v-table packing" per 5.4.5)
- derivation of verifier types (actually, unfolded would be OK as well)
- derivation of type mirrors that appear via core reflection (`Class`, `java.lang.reflect`)
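As a rough illustration of the first two items in this list, the sketch below keys a per-class method table on the name plus the *folded* descriptor, while remembering the original descriptor as the stored value. All names here (`FoldedMemberTable`, `declare`, `resolve`) are hypothetical, and a real JVM would report a class format error rather than the `IllegalStateException` used as a stand-in.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: a method table keyed by name and folded descriptor (hypothetical, not JVM code). */
final class FoldedMemberTable {
    /** Lookup key: the method name plus its folded descriptor. */
    private record Key(String name, String foldedDescriptor) {}

    /** Maps each key to the original (unfolded) descriptor of the declaration. */
    private final Map<Key, String> methods = new HashMap<>();

    /** Converts every Q-type in a well-formed descriptor to the corresponding L-type. */
    static String fold(String descriptor) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < descriptor.length()) {
            char c = descriptor.charAt(i);
            if (c == 'L' || c == 'Q') {
                int end = descriptor.indexOf(';', i);
                if (end < 0) throw new IllegalArgumentException(descriptor);
                out.append('L').append(descriptor, i + 1, end + 1);
                i = end + 1;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    /** Duplicate rejection: two declarations that fold to the same descriptor collide. */
    void declare(String name, String originalDescriptor) {
        Key key = new Key(name, fold(originalDescriptor));
        if (methods.putIfAbsent(key, originalDescriptor) != null) {
            throw new IllegalStateException("duplicate method " + name + key.foldedDescriptor());
        }
    }

    /** Resolution: the reference is folded too, so its Q/L polarity does not affect the match. */
    String resolve(String name, String referencedDescriptor) {
        return methods.get(new Key(name, fold(referencedDescriptor)));
    }

    public static void main(String[] args) {
        FoldedMemberTable table = new FoldedMemberTable();
        table.declare("m", "()QFoo;");
        // A pre-migration caller that still says LFoo; finds the migrated declaration.
        System.out.println(table.resolve("m", "()LFoo;"));
    }
}
```

Because the stored value is the original descriptor, the operations that still need the unfolded form remain able to consult it after resolution has matched on the folded form.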
Other uses of descriptors in the JVM's operations would continue to work on the original (unfolded) descriptor:
- initialization of fields (to value class default if Q-type instead of `null`)
- dynamic null-exclusion behavior of field and method *declarations*
- dynamic null-exclusion behavior of field *access* and method *invocation*
- dynamic null-exclusion behavior of dynamic call sites (Q-types in "indy")
- resolution-time null-exclusion of dynamic constants (Q-type "condy")
- dynamic null-exclusion behavior of method handle invocations
- array type descriptors (`[QFoo;` differs from `[LFoo;`)
- operand of `checkcast`, `instanceof` (can be a Q-type, as a special Valhalla feature)
- resolution of `CONSTANT_Class`, `CONSTANT_MethodType`, and `CONSTANT_MethodHandle` constants (Q-type mirrors differ from L-type mirrors)
- derivation of type mirrors that appear via `java.lang.invoke` APIs
Because Q-folding allows migrated and unmigrated code to link together, it follows necessarily that there may be different "Q-polarities" underlying the same folded descriptor in at least three places: declarations, uses, and overrides. The general rule is that a Q-type has to appear in only one of three places to force the JVM to exclude nulls. Those three places are symbolic reference, selected method, and any matching method in any super-class (or super-interface).
- If a declaration's original descriptor has a Q-type, that determines dynamic null-exclusion of the associated values, and typically requests flattened or scalarized representations. In the interpreter, there must be a null check on method entry/exit or field store/load. (If the field is physically null, that may be treated as denoting the value class's default value.)
- If a use's original descriptor has a Q-type, that determines dynamic null-exclusion of the associated values *at the use site only*. In the interpreter, there must be a null check before and/or after the method invocation or field access.
- If a method overrides any method with a Q-type, the overriding method must dynamically exclude nulls, even if it has an L-type at the corresponding point. When the interpreter selects a method, it may need to inject a null check at the invocation, just as if the symbolic method reference mentioned a corresponding Q-type. Likewise, the interpreter may need to inject an extra null check at method entry/exit for a method which overrides a method that mentions a Q-type.
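The "any one of three places" rule for methods amounts to a per-position logical OR over the original descriptors involved. The sketch below is one way to picture it, under these assumptions: all descriptors are well-formed and fold to the same descriptor, positions are the parameters followed by the return type, and an array type such as `[QFoo;` is not itself treated as null-excluding. `NullExclusion` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: which descriptor positions must exclude nulls (hypothetical, not JVM code). */
final class NullExclusion {
    private NullExclusion() {}

    /** Splits a method descriptor into its parameter types followed by its return type. */
    static List<String> components(String methodDescriptor) {
        List<String> parts = new ArrayList<>();
        int i = 0;
        while (i < methodDescriptor.length()) {
            char c = methodDescriptor.charAt(i);
            if (c == '(' || c == ')') {
                i++;
                continue;
            }
            int start = i;
            while (methodDescriptor.charAt(i) == '[') {
                i++; // skip array dimensions
            }
            char tag = methodDescriptor.charAt(i);
            if (tag == 'L' || tag == 'Q') {
                int end = methodDescriptor.indexOf(';', i);
                if (end < 0) throw new IllegalArgumentException(methodDescriptor);
                i = end + 1;
            } else {
                i++; // primitive or void tag
            }
            parts.add(methodDescriptor.substring(start, i));
        }
        return parts;
    }

    /**
     * A position excludes nulls if a Q-type appears there in the symbolic reference,
     * in the selected method, or in any overridden (matching supertype) method.
     */
    static boolean[] mustExcludeNull(String symbolicRef, String selected, List<String> overridden) {
        List<String> sources = new ArrayList<>(overridden);
        sources.add(symbolicRef);
        sources.add(selected);
        boolean[] excluded = new boolean[components(symbolicRef).size()];
        for (String descriptor : sources) {
            List<String> parts = components(descriptor);
            for (int p = 0; p < excluded.length; p++) {
                excluded[p] |= parts.get(p).charAt(0) == 'Q';
            }
        }
        return excluded;
    }

    public static void main(String[] args) {
        // Caller and selected method both use L-types; only an overridden method says QFoo;.
        boolean[] result = mustExcludeNull("(LFoo;I)LBar;", "(LFoo;I)LBar;", List.of("(QFoo;I)LBar;"));
        System.out.println(java.util.Arrays.toString(result));
    }
}
```

In the example in `main`, the first parameter ends up null-excluded even though neither the caller nor the selected method mentions a Q-type, which is the third case described above.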
One subtle detail in this proposal is deciding when and how to distinguish Q-types from L-types in reflective APIs. For maximum compatibility, we propose exposing only folded descriptors via the existing core reflection APIs such as `Method::getReturnType`. A small number of new API points would expose the unfolded original descriptors, such as `Method::getMethodType`, which would return a `MethodType` containing possible Q-type mirrors. The folded version of this would be obtained by `MethodType::fold`, which is a close cousin to the existing `MethodType::wrap`.
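To make the comparison with `MethodType::wrap` concrete, the sketch below shows the existing `wrap` call next to a toy model of folding. Q-type mirrors do not exist in mainline `java.lang.invoke`, so the `Mirror` record (with a `nullFree` flag standing in for "this is a Q-type mirror") and the static `fold` helper are purely hypothetical stand-ins for the proposed API.

```java
import java.lang.invoke.MethodType;
import java.util.List;

/** Sketch comparing the existing MethodType::wrap with a toy model of the proposed fold. */
final class FoldAnalogy {
    private FoldAnalogy() {}

    /** Toy stand-in for a type mirror; nullFree == true models a Q-type mirror. */
    record Mirror(String className, boolean nullFree) {
        /** Folding keeps the class and drops the Q-polarity. */
        Mirror fold() {
            return new Mirror(className, false);
        }
    }

    /** Toy stand-in for the proposed MethodType::fold, applied position by position. */
    static List<Mirror> fold(List<Mirror> types) {
        return types.stream().map(Mirror::fold).toList();
    }

    /** Existing API: wrap() maps every primitive type to its wrapper, position by position. */
    static MethodType wrapExample() {
        return MethodType.methodType(int.class, long.class).wrap();
    }

    public static void main(String[] args) {
        System.out.println(wrapExample());
        System.out.println(fold(List.of(new Mirror("Foo", true), new Mirror("Bar", false))));
    }
}
```

Both operations are uniform, position-wise conversions of a method type to a more general form, which is the sense in which they are cousins.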
The method handle lookup API would accept unfolded type descriptors, but would (necessarily) always fold them, in order to align with the native bytecode resolution behavior. The dynamically apparent `MethodHandle::type` of the result would reflect the unfolded type (and so could demand extra null checks or unboxing). To recover the actually declared folded type, a `revealDirect` operation would be required.
Thus, the Core Reflection API (`Class`, `Method`, `Field`) will tilt towards folding to uniform L-types, while the method handle API (`MethodType`, `Lookup`) will tilt towards tracking Q-types.
## Alternatives
This proposal requires a lot of effort. But perhaps it is the best way to achieve the migration goals we are aiming at. Alternatives include:
- Aim for less migration compatibility.
- Have the Java translation strategy use only L-types but issue `Preload` commands; rely on the JVM to figure out where to "pencil in" the Q-type annotations.
- Discard Q-types and re-engineer all of their use cases with alternative mechanisms.
None of these alternatives seems more cost-effective than Q-folding. As a general principle, the JVM does better with more declarative input; Q-types provide such input. Q-folding makes Q-types harder to act upon profitably, but discarding the distinctions altogether (relying solely on `Preload`) does not make the problem easier, and very likely makes optimization less reliable.
## Future work
A possible follow-up to this work would be investigating remaining uses of Q-types to see if they "pull their weight" in the overall design. It is clear we could always define ad hoc "side channels" to replace any given single use of a Q-type. It is also clear that Q-types clearly and robustly communicate the presence of value types in a simple and unified manner. Time will tell if there is a better solution than Q-types.
Specifically, a side channel containing non-descriptor type restrictions (for any field or method descriptor component) can carry much of the information also carried by Q-types. Should we fast-forward to a type-descriptor design? Will we regret having to support Q-types if we some day also have type descriptors? It seems likely that building out type descriptors now, just to convey the Q-vs-L distinction, would tilt us toward a design which would be under-powered when we want to use type restrictions for reified generics. There are many more degrees of freedom in type restrictions than in the Q/L distinction. If we were building reified generics at the same time as value classes, we might unify the two mechanisms, but that seems impossible to do, without delaying value classes by several years.
- relates to
  - JDK-8303182 compressed Symbol pointers (Closed)
  - JDK-8301007 [lworld] Handle mismatches of the preload attribute in the calling convention (Resolved)