Type: JEP
Resolution: Unresolved
Priority: P4
Fix Version/s: None
Component/s: hotspot
Labels:
None

JEP Type:
Feature
Exposure:
Open
Subcomponent:
runtime
Scope:
SE
Discussion:
valhalla dash dev at openjdk dot org
Effort:
M
Duration:
L

Summary

Support a JVM flag on fields expressing that the field must be properly initialized before it can be read. Enforce this requirement during verification and at run time, and provide optional run-time diagnostics for static fields that have not set the flag.

Goals

Allow class files to opt in to a more structured initialization discipline for fields, ensuring that the fields are always set before they are read and, if final, never modified after they are read.
Enable run-time optimizations for these fields by enforcing the initialization discipline via verification-time and run-time checks. Enhance the StackMapTable attribute as necessary to express field initialization status during construction.
Provide tools to diagnose initialization bugs related to static fields, even when those fields have not opted in to the new discipline.

Non-Goals

It is not a goal to introduce any new Java language features, such as a new modifier for fields. The new JVM flag is intended to be set by compilers, when appropriate, not by programmers.
It is not a goal to change javac compilation strategies in order to adopt the flag in the bytecode of existing Java programs. The new JVM feature is generally meant to be adopted only when future programs make use of new language features.

Motivation

Whenever a class is loaded by the JVM, it needs to be initialized. In bytecode, each class and interface can declare a special class initialization method, named <clinit>, for this purpose. The class initialization method is free to execute arbitrary code, and what constitutes an "initialized" class is up to the discretion of the class author. Usually class initialization includes setting all of the class's static fields to an appropriate initial value.

Similarly, whenever a class instance is created with the new bytecode, that instance needs to be initialized. In bytecode, each class can declare multiple special instance initialization methods, named <init>, for this purpose. These methods are also free to execute arbitrary code, and through a chain of <init> method invocations every class in an inheritance hierarchy can define what constitutes an "initialized" class instance, at the discretion of each class author. Usually instance initialization includes setting all of the instance's non-static fields to an appropriate initial value.

In Java, class initialization methods are not written directly, but are an aggregation of each class's static field initializers and static initializer blocks. Instance initialization methods are mainly expressed with constructors, and delegation between constructors is expressed with super(...) or this(...) calls.

During the execution of an initialization method, the class or instance being initialized should be regarded as being in a temporary larval state, not yet ready to participate fully in its designed functions. For example, if the program attempts to read a field of a larval class or instance, it may encounter an unexpected value.

The longstanding behavior of the JVM is to create all fields with a default value—null for references and 0 for primitives—to be returned if they are read before being properly initialized. Sometimes the default value happens to be appropriate to the design of the class (in which case we may simply regard the author of the class as having implicitly initialized the field to the chosen default value). But all too often, the default value is not appropriate, and a read of the uninitialized field is a bug. Execution will simply proceed with the unwanted default value, leading to unpredictable results, such as the notorious NullPointerException.

To illustrate: given the following classes (styled as Java code for readability), if the class App is initialized first, no exception will be thrown, but log messages will be prefixed with an unexpected identifier, App[0]: .... This is because the invocation of Log.currentPID() implicitly triggers initialization of the Log class, and the <clinit> method of Log then reads the uninitialized appID field, embedding its value 0 into the prefix string. Eventually, the currentPID() call will proceed, and appID will be properly initialized with the current process's ID number; but that will be too late for the prefix field.

class App {

    public static final long appID;

    static void <clinit>() {
        appID = Log.currentPID();
    }
}

class Log {

    private static final String prefix;

    static void <clinit>() {
        prefix = "App[%d]: ".formatted(App.appID);
    }

    public static void log(String msg) {
        System.out.println(prefix + msg);
    }

    public static long currentPID() {
        return ProcessHandle.current().pid();
    }
}

Notice that the circular dependency between classes App and Log is not obvious or essential; if the utility method currentPID were declared in some other class, the dependency would be broken and everything would behave properly. In complex systems, these sorts of bugs can be very hard to recognize and diagnose.

In this example, it is especially bad that the appID field exposes its default value, because it is a final field. If other classes later refer to appID, they'll see a different identifier than the one observed by Log. An apparently-immutable variable has been observed to mutate, with unpredictable results.
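The example above can be reproduced in ordinary Java. The following is a minimal sketch, not part of the proposal: nested classes stand in for the two top-level classes, static initializer blocks stand in for the `<clinit>` methods, and the wrapper class name `InitCycleDemo` is made up for illustration.

```java
// Runnable rendering of the App/Log example. InitCycleDemo is a hypothetical
// wrapper; nested classes are initialized independently, like top-level ones.
class InitCycleDemo {

    static class App {
        static final long appID;

        static {
            // Calling Log.currentPID() triggers initialization of Log
            // while App is still in its larval state.
            appID = Log.currentPID();
        }
    }

    static class Log {
        static final String prefix;

        static {
            // App is larval in the current thread, so this read does not
            // block; it simply observes the default value 0.
            prefix = "App[%d]: ".formatted(App.appID);
        }

        static long currentPID() {
            return ProcessHandle.current().pid();
        }
    }

    public static void main(String[] args) {
        long id = App.appID; // initializes App first, which then initializes Log
        System.out.println(Log.prefix + "started"); // App[0]: started
        // Compare what Log captured with what later readers of appID see.
        System.out.println("prefix matches appID: "
                + Log.prefix.equals("App[" + id + "]: "));
    }
}
```

If `Log` happens to be initialized before `App`, the cycle resolves in the other order and the prefix contains the real process ID, which is part of what makes this kind of bug hard to reproduce.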

In the Java language, there are limited compile-time restrictions on field initialization that prevent some obvious bugs, like a forward reference to a field of the same class, or a failure to assign a value to a final field. javac also provides optional warnings that discourage virtual method calls in constructors. The JVM has its own guardrails, such as preventing concurrent access to a class while it is being initialized.

But the Java language and JVM are not designed to prevent all bugs that could arise from an unwanted interaction with a class or instance that is not fully initialized. A larval class is at risk of misuse once its <clinit> method begins running, either due to explicit delegation to external code, or implicit execution of external code via JVM activities like class loading and initialization. A larval instance is at risk of misuse once an <init> method shares a reference to the instance with external code—and note that individual classes cannot prevent superclasses from doing so.

That said, while we can't prevent all misuses of larval classes and instances, we can improve the situation by focusing on field initialization. Ideally, every class or instance initializer should satisfy two important invariants: first, that its fields have been assigned a value before they can be read; and second, that its final fields will not be subsequently mutated.

Under these rules, the above attempt to read App.appID in its larval state would not be permitted, and developers would realize that they needed to address the failure by restructuring the two classes.

Most programs respect these rules, most of the time. But we should do better, enlisting the JVM to perform automatic checks that ensure that (for participating code) the rules are enforced rigorously.

Future versions of the Java language and VM will depend on rigorous enforcement of these initialization invariants, as they support features like value objects, which depend on reliable final fields, and non-null variables, which are incompatible with a null default value.

Description

In JDK NN, this JEP introduces a new flag for fields in a class file, ACC_STRICT_INIT. This flag is a preview VM feature, disabled by default and only allowed in classes with a preview class file version number (XX.65535). To load preview class files at run time, you must enable preview features:

java --enable-preview Main

The implications of the new flag are described below; but first, the next two sections provide some background on the class and instance initialization processes in the JVM.

Class initialization

A class is initialized by a class initialization method, <clinit>. Class initialization methods typically set up the static fields of the class, and might also interact with other global state. Each class in a hierarchy may have its own <clinit> method, and every superclass must be initialized before executing the <clinit> method of a subclass.

An initialization state is used to keep track of the initialization status of each class at run time. In today's JVM (see JVMS 5.5), a class's initialization state may be any of the following:

Uninitialized: The class is loaded but has not yet attempted initialization.
Larval (within a particular thread): The class is currently being initialized.
Initialized: The class has successfully completed initialization, and can be used without restriction.
Erroneous: The class failed initialization and may not be used.

The <clinit> method executes while the class is in a larval state. The class is not yet initialized at this point, but its fields and methods can be freely accessed by code running in the current thread. If the <clinit> method completes successfully, the class transitions to the initialized state. If an exception occurs, the class transitions to the erroneous state and can never become initialized.

The constraints on class initialization are enforced dynamically, at run time. For example, each getstatic instruction is responsible for checking the initialization state of the resolved field's class. If the class is not initialized, but is in the larval state in another thread, getstatic blocks until initialization completes.
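The transition to the erroneous state can be observed from Java today. The sketch below uses made-up class names (`ErroneousStateDemo`, `Fragile`) and relies only on existing behavior: the first use of a class whose initializer fails reports an `ExceptionInInitializerError`, and every later use reports a `NoClassDefFoundError`.

```java
// Hypothetical demo of the "erroneous" class initialization state.
class ErroneousStateDemo {

    static class Fragile {
        static int counter = 1;

        static {
            // Fails unconditionally. The "if (true)" keeps the initializer
            // legal, since javac rejects one that cannot complete normally.
            if (true) throw new IllegalStateException("initialization failed");
        }
    }

    /** Touches Fragile and returns the simple name of any resulting error. */
    static String touch() {
        try {
            return "read " + Fragile.counter;
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(touch()); // ExceptionInInitializerError
        System.out.println(touch()); // NoClassDefFoundError
    }
}
```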

Instance initialization

An object is initialized by an instance initialization method, <init>. Instance initialization methods typically set up the instance fields of the class, and might also interact with static fields and global state. Each class in a hierarchy has at least one <init> method, and that method must, at some point before it completes, delegate to another <init> method of the current class or its superclass. (This recursion bottoms out at Object.<init>.)

Like classes, objects have an initialization state, although this is expressed only indirectly in the JVM Specification. Today, an object's initialization state may be any of the following:

Uninitialized: The object has been created by new, but has not yet attempted initialization.
Early larval: The object is currently being initialized, and execution has not yet reached Object.<init>.
Late larval: The object is currently being initialized, and the Object.<init> method has been reached.
Initialized: The object has successfully completed initialization, and can be used without restriction.
Erroneous: The object failed initialization and may not be used.

An <init> method begins execution in the early larval state. Most operations, including method invocations, are not allowed on an object in the early larval state, and the object may not be shared with other code. However, its fields may be assigned with putfield. At some point another <init> method is invoked and the initialization process continues recursively, eventually reaching Object.<init>. At that point, the instance transitions to the late larval state and, one by one, the recursively invoked <init> methods complete their execution and return. In the late larval state, the object is not yet fully initialized, but use of its fields and methods is unrestricted. (It may even be shared across threads.) The object is initialized once the outermost <init> method returns successfully. Alternatively, any <init> call in the stack might fail with an exception; in that case, the object transitions to the erroneous state and can never become initialized.

The constraints on instance initialization are enforced statically, by the verifier. Verification does not attempt to distinguish between the late larval and initialized states—both can be considered unrestricted. Every instruction in an <init> method is associated with either the early larval state or unrestricted. (This is tracked through the flagThisUninit flag in the instruction's type state.) The verifier prevents most operations on the current object in the early larval state, and ensures the unrestricted state can only be reached via a chain of recursively delegating <init> calls that eventually reaches Object.<init>. The return instruction, which makes the newly constructed object available to the caller of <init>, is only allowed in the unrestricted state.
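The freedom granted in the late larval state is what allows a field to be read before it is assigned. A small sketch using ordinary Java constructors (the class names are made up for illustration): the superclass constructor runs after Object.<init> has been reached, so it may call an overridable method, and the subclass override then reads a field that the subclass constructor has not yet set.

```java
// Hypothetical demo: a late larval object exposes a default field value.
class LateLarvalDemo {

    static class Base {
        final String description;

        Base() {
            // The implicit call to Object's constructor has already happened,
            // so "this" is late larval and virtual calls are permitted.
            description = describe();
        }

        String describe() {
            return "base";
        }
    }

    static class Named extends Base {
        final String name;

        Named(String name) {
            super();          // Base() runs before the assignment below
            this.name = name;
        }

        @Override
        String describe() {
            return "name=" + name;
        }
    }

    public static void main(String[] args) {
        Named n = new Named("widget");
        System.out.println(n.description); // name=null
        System.out.println(n.describe());  // name=widget
    }
}
```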

Strictly-initialized fields

The new ACC_STRICT_INIT (0x0800) flag may be set in the access_flags item of any field declaration. This indicates that the field is strictly-initialized. (Before Java SE 17, the ACC_STRICT flag, also 0x0800, was applied to methods to indicate a requirement for "strict" floating-point semantics. That capability was removed by JEP 306. The two flags are unrelated.)

Strictly-initialized fields must be initialized—that is, they must be assigned an initial value—during the larval phase of class initialization and the early larval phase of instance initialization. These fields have no observable default value, and may not be read until their initial assignment has occurred.

These constraints are enforced by enhancing the representation of the larval and early larval initialization states to track whether each field has been set. Every time a strictly-initialized field is set, the initialization state is updated to reflect the field's status. Then for static fields, new checks are performed dynamically during class initialization; for instance fields, similar checks are performed statically by the verifier.

The following rules apply during the class initialization process to ensure all strictly-initialized static fields are properly initialized:

If a strictly-initialized static field has a ConstantValue attribute, then at the point in the process where that value is assigned to the field, the larval state is updated to reflect that the field is set.
When executing a putstatic instruction, if the resolved field is strictly-initialized and declared by a class in a larval state in the current thread, the state is updated to reflect that the field is set. (This occurs even if the field is written from another method or class, and even if the field is referenced as a member of a subclass.)
When executing a getstatic instruction, if the resolved field is strictly-initialized and declared by a class in a larval state in the current thread, then if the state does not reflect that the field is set, an exception is thrown, indicating that the field cannot yet be read. (This occurs even if the field is read from another method or class, and even if the field is referenced as a member of a subclass.)
After the execution of a <clinit> method completes normally (or, if no <clinit> method is declared, at the point where it would have been invoked), the initialization process checks that each strictly-initialized static field of the class has been set. If so, the class can transition to the initialized state; if not, the class transitions to the erroneous state and an exception is thrown.
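The four rules above can be modeled with a small amount of bookkeeping. The following is a toy model only, not HotSpot's data structure: the class `LarvalClassModel` and its method names are invented, and `IllegalStateException` is a placeholder because the text does not name the exception that is thrown.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of a larval class state that tracks strictly-initialized static fields. */
class LarvalClassModel {

    // Names of strictly-initialized static fields that have not been assigned yet.
    private final Set<String> unset;

    LarvalClassModel(Set<String> strictStaticFields) {
        this.unset = new HashSet<>(strictStaticFields);
    }

    /** Models putstatic (or a ConstantValue assignment): the field becomes set. */
    void putstatic(String field) {
        unset.remove(field);
    }

    /** Models getstatic: reading a field that is still unset is rejected. */
    void getstatic(String field) {
        if (unset.contains(field)) {
            // Placeholder exception type; the real one is not specified here.
            throw new IllegalStateException("read before initialization: " + field);
        }
    }

    /** Models the check after <clinit>: true means initialized, false means erroneous. */
    boolean finishClinit() {
        return unset.isEmpty();
    }

    public static void main(String[] args) {
        LarvalClassModel app = new LarvalClassModel(Set.of("appID"));
        try {
            app.getstatic("appID"); // the read performed by Log's initializer
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        app.putstatic("appID");
        System.out.println(app.finishClinit()); // true
    }
}
```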

The following rules apply during verification of an <init> method to ensure all strictly-initialized instance fields are properly initialized:

A putfield on the current class instance in an early larval state updates the state to reflect that the named field has been set.
An invokespecial of an <init> method, applied to the current class instance in an early larval state, requires that if the invocation is of a superclass method, the state must reflect that all strictly-initialized instance fields have been set. (If the invocation is of another <init> method of the same class, there is no such requirement—the invoked method is responsible for setting the fields.)

There is no rule for getfield analogous to the getstatic rule for static fields, because it has never been permitted to use getfield on an instance in the early larval state.
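The verification rules for <init> methods can be sketched in the same spirit. This is an illustrative model with invented names (`EarlyLarvalVerifierModel`, `invokespecialInit`), not the verifier's implementation; a `false` result stands for a verification failure.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of the verifier's early larval state for one <init> method. */
class EarlyLarvalVerifierModel {

    // Strictly-initialized instance fields not yet assigned on this path.
    private final Set<String> unset;
    private boolean earlyLarval = true;

    EarlyLarvalVerifierModel(Set<String> strictInstanceFields) {
        this.unset = new HashSet<>(strictInstanceFields);
    }

    /** Models putfield on the current instance while it is early larval. */
    void putfield(String field) {
        if (earlyLarval) {
            unset.remove(field);
        }
    }

    /**
     * Models invokespecial of an <init> method on the current instance.
     * A superclass call requires every strict field to be set; a call to
     * another <init> of the same class does not.
     */
    boolean invokespecialInit(boolean targetIsSuperclass) {
        if (!earlyLarval) {
            return false; // the delegating call may only happen once
        }
        if (targetIsSuperclass && !unset.isEmpty()) {
            return false; // a strict field would reach the superclass unset
        }
        earlyLarval = false; // the state is unrestricted from here on
        unset.clear();
        return true;
    }

    public static void main(String[] args) {
        EarlyLarvalVerifierModel bad = new EarlyLarvalVerifierModel(Set.of("x", "y"));
        bad.putfield("x");
        System.out.println(bad.invokespecialInit(true));  // false: y is unset

        EarlyLarvalVerifierModel good = new EarlyLarvalVerifierModel(Set.of("x", "y"));
        good.putfield("x");
        good.putfield("y");
        System.out.println(good.invokespecialInit(true)); // true
    }
}
```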

Restrictions on final fields

A strictly-initialized final field must never be observed to mutate—all reads must have the same value. Assignments to such a field are only allowed during the larval phase of class initialization and the early larval phase of instance initialization.

In some complex cases, such as those involving exception handling, a final field may be written multiple times during initialization. This is allowed, but only the ultimate value of the field will be readable.

The following rules apply during class initialization:

When executing a getstatic instruction, if the resolved field is strictly-initialized, final, and declared by a class in a larval state in the current thread, then if the field is successfully read, the state is updated to reflect this fact. (This implies an additional piece of metadata to track in the larval state.)
When executing a putstatic instruction, if the resolved field is strictly-initialized, final, and declared by a class in a larval state in the current thread, then if the state reflects that the field has been read, an exception is thrown, indicating that the field can no longer be set.
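A rough model of the extra "has been read" metadata for a single strictly-initialized final static field follows. The class name and the use of `IllegalStateException` are assumptions made for illustration; the sketch only shows the ordering constraint: repeated writes are accepted until the first successful read, and rejected afterwards.

```java
/** Toy model of one strictly-initialized final static field during class initialization. */
class StrictFinalStaticModel {

    private Object value;
    private boolean set;   // has the field been assigned at least once?
    private boolean read;  // has the field been successfully read?

    /** Models putstatic: allowed any number of times, but only before the first read. */
    void putstatic(Object newValue) {
        if (read) {
            // Placeholder exception type; the real one is not specified here.
            throw new IllegalStateException("field can no longer be set");
        }
        value = newValue;
        set = true;
    }

    /** Models getstatic: requires a prior write, and records that a read happened. */
    Object getstatic() {
        if (!set) {
            throw new IllegalStateException("field cannot yet be read");
        }
        read = true;
        return value;
    }

    public static void main(String[] args) {
        StrictFinalStaticModel field = new StrictFinalStaticModel();
        field.putstatic("first");
        field.putstatic("second");             // a second write before any read is fine
        System.out.println(field.getstatic()); // second
        try {
            field.putstatic("third");          // rejected: the value was already observed
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```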

The following rule applies during verification of an <init> method:

A putfield instruction writing to a strictly-initialized final field of the current class is only allowed when the initialization state is early larval. (The status of the field is irrelevant, because no reads can occur in an early larval state.)

In contrast, putfield is allowed throughout the body of an <init> method for final fields that are not strictly-initialized.

Verification enhancements

Some additional changes to class file verification are necessary to account for early larval initialization states that keep track of field initialization.

One of the requirements of verification is that for every jump in a method, including every implicit jump to an exception handler, the type state of the jump target must be compatible with the incoming type state at the point of the jump. Each jump target declares its type state in the StackMapTable attribute.

Because initialization state is part of the type state, and because jumps may occur in the early larval code of <init> methods, it is necessary to enhance the StackMapTable attribute to be able to express early larval initialization states, including enumerating the fields that have been set in an early larval state.

(Alternatively, we could require all field-setting code to occur immediately before the delegating <init> method invocation, without any jumps. But this would be an inconvenient restriction.)

Entries in the StackMapTable are typically expressed as modifications of the previous frame. In an <init> method, the initialization state of the implicit first stack map frame is early larval, where all ACC_STRICT_INIT instance fields declared by the class are considered unset.

From that point:

A frame that has uninitializedThis as the type of one of its local variables is in an early larval state, with the same set of unset fields as the previous frame.
A frame that does not have uninitializedThis as the type of one of its local variables is in an unrestricted state (and all fields are considered set).
A frame may explicitly express an early larval state with a new kind of frame, early_larval_frame:
```
early_larval_frame {
    u1 frame_type = EARLY_LARVAL; /* 246 */
    u2 number_of_unset_fields;
    u2 unset_fields[number_of_unset_fields];
    base_stack_map_frame base_frame;
}
```
This frame wraps a base_frame with an explicit assertion of the initialization state, including a list of unset fields (given by NameAndType constants).
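To make the layout concrete, here is a sketch that serializes an early_larval_frame according to the structure shown above. The class name `EarlyLarvalFrameWriter` is made up, and the constant pool indexes 12 and 17 are arbitrary stand-ins for NameAndType entries. The base frame used in the example is a same_frame, which in the existing StackMapTable format is a single byte in the range 0-63 giving the offset delta.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Sketch of an encoder for the early_larval_frame layout. */
class EarlyLarvalFrameWriter {

    static final int EARLY_LARVAL = 246;

    /**
     * Writes: u1 frame_type, u2 number_of_unset_fields,
     * u2 unset_fields[], then the already-encoded base frame.
     */
    static byte[] encode(int[] unsetFieldIndexes, byte[] baseFrame) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes); // big-endian, like class files
        out.writeByte(EARLY_LARVAL);
        out.writeShort(unsetFieldIndexes.length);
        for (int index : unsetFieldIndexes) {
            out.writeShort(index); // constant pool index of a NameAndType entry
        }
        out.write(baseFrame);
        out.flush();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] sameFrame = { 5 }; // same_frame with offset_delta 5
        byte[] frame = encode(new int[] { 12, 17 }, sameFrame);
        System.out.println(frame.length);    // 8
        System.out.println(frame[0] & 0xFF); // 246
    }
}
```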

A jump target can act as a join point for multiple execution paths, and the incoming initialization state from these paths may differ. For example, a field may be assigned on one path, and not assigned on another.

The early larval state should be understood to track which fields are guaranteed to be set; a possibly-unset field is expressed just like a definitely-unset field. A jump can always transition from one early larval state to another as long as the transition only "unsets" some fields. Initialization state transitions to an early larval state with more fields set, or to the unrestricted state, can only be achieved via the putfield and invokespecial instructions, respectively.
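Read as a rule on sets of unset fields, the compatibility check for a jump is a subset test, and the state at a join point is a union. The sketch below is one way to express that reading; the class and method names are invented for illustration.

```java
import java.util.HashSet;
import java.util.Set;

/** Sketch of early larval state compatibility, modeled as sets of unset field names. */
class EarlyLarvalJoin {

    /**
     * A jump may only "unset" fields: everything unset on the incoming
     * path must also be listed as unset at the target.
     */
    static boolean jumpAllowed(Set<String> incomingUnset, Set<String> targetUnset) {
        return targetUnset.containsAll(incomingUnset);
    }

    /** The weakest state that both paths can jump to: a field unset on either path is unset. */
    static Set<String> join(Set<String> unsetOnPathA, Set<String> unsetOnPathB) {
        Set<String> result = new HashSet<>(unsetOnPathA);
        result.addAll(unsetOnPathB);
        return result;
    }

    public static void main(String[] args) {
        Set<String> afterAssignment = Set.of();        // path that executed putfield x
        Set<String> withoutAssignment = Set.of("x");   // path that skipped it

        Set<String> target = join(afterAssignment, withoutAssignment);
        System.out.println(target);                                 // [x]
        System.out.println(jumpAllowed(afterAssignment, target));   // true
        System.out.println(jumpAllowed(withoutAssignment, Set.of())); // false
    }
}
```

A target that declares no unset fields cannot be reached from a path where `x` is still unset, which is why the last check fails.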

It is not possible for a jump to join early larval and unrestricted code paths, or to handle the erroneous state that occurs when a delegating invokespecial throws an exception. This is a longstanding limitation (with a messy bug tail); it could be addressed in the future by supporting an explicit erroneous_frame in the StackMapTable.

Reflective initialization

Various libraries allow fields to be assigned and read without using bytecode instructions. These include java.lang.reflect.Field, java.lang.invoke.MethodHandle, and java.lang.invoke.VarHandle.

For strictly-initialized static fields, assignments and reads expressed through library code perform the same checks and have the same effects in the larval class initialization state as putstatic and getstatic, as described above:

Field assignments update the class initialization state, as needed, to indicate that a strictly-initialized field has been set. If the field is final, this operation throws an exception when, per the initialization state, the field no longer allows assignments.
Field reads throw an exception if, per the initialization state, a strictly-initialized field has not been set. If the field is final, this operation updates the class initialization state, as needed, to indicate that the field has been read.

For strictly-initialized instance fields, note that the verifier prevents libraries from interacting with objects in the early larval state. Until the object reaches the late larval phase, the initialization state of the object may only be manipulated with bytecode instructions in an <init> method. This means that a strictly-initialized field cannot be assigned its initial value by a library, and a strictly-initialized final field cannot be mutated by a library.

This restriction on instance fields is inconvenient for tools that perform their own object initialization for user-defined classes, but is necessary to support the invariants of strictly-initialized fields. These tools must, necessarily, cooperate with the class's <init> methods to initialize any strictly-initialized fields.

Some standard libraries require changes to ensure they cannot be used to circumvent the constraints on strictly-initialized instance fields:

Standard object deserialization is implemented with special permission to skip the usual execution of an <init> method in the class being instantiated. This capability bypasses the verification-based enforcement of constraints on strictly-initialized instance fields, and must not be used for classes that declare these fields.

Instead, ObjectOutputStream.writeObject and ObjectInputStream.readObject throw an InvalidClassException if a class being serialized or deserialized declares a strictly-initialized instance field (and the class is not a record class).

Users of serialization can implement the writeReplace and readResolve methods to avoid this exception. Doing so causes a replacement object to be serialized and deserialized instead of the object that declares strictly-initialized fields.

(A future enhancement to serialization is anticipated, allowing class authors to declare special constructors that ObjectInputStream.readObject can use to create new instances from the data in serialization streams.)

The Field.setAccessible method allows clients to bypass the final restriction on most instance fields, enabling mutation in the late larval and initialized states. A strictly-initialized final field cannot support this behavior, and should always be treated as non-modifiable by the setAccessible method.

(Relatedly, Prepare to Make Final Mean Final provides for warnings when reflection is used to mutate final fields that are not marked strictly-initialized.)
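The writeReplace and readResolve workaround mentioned for serialization is the familiar serialization proxy pattern, which already works today. In the sketch below the class names are made up, and `Point` merely stands in for a class whose fields a compiler would mark ACC_STRICT_INIT; nothing in this code sets the flag. The proxy is serialized in place of the `Point`, and the `Point` is rebuilt through its constructor, so its fields are only ever assigned by an <init> method.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializationProxyDemo {

    /** Stand-in for a class that declares strictly-initialized instance fields. */
    static final class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        // Serialize a proxy instead of this object.
        private Object writeReplace() {
            return new PointProxy(x, y);
        }
    }

    /** Plain carrier with ordinary fields that deserialization may fill in directly. */
    static final class PointProxy implements Serializable {
        private static final long serialVersionUID = 1L;
        private int x;
        private int y;

        PointProxy(int x, int y) {
            this.x = x;
            this.y = y;
        }

        // Rebuild the real object through its constructor.
        private Object readResolve() {
            return new Point(x, y);
        }
    }

    static Point roundTrip(Point p) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(p);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            return (Point) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Point copy = roundTrip(new Point(3, 4));
        System.out.println(copy.x + "," + copy.y); // 3,4
    }
}
```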

Run-time optimization of strictly-initialized fields

The invariants of fields marked with ACC_STRICT_INIT provide the JVM with opportunities to optimize uses of those fields at run time.

For example, in JDK NN, HotSpot's JIT compiler treats strictly-initialized final fields as trusted. A trusted final field is known to never change, so once a value has been read from it, subsequent reads can re-use that same value.

Thus, in the following loop, if this.size is strictly-initialized and final, the size value that gets read at the start of the loop can reliably be re-used in the bounds check after each iteration, without worrying that doSomething() may have had the side effect of mutating size.

for (int i = 0; i < this.size; i++) {
    doSomething();
}

The resulting JIT-compiled code has fewer interactions with memory and may execute faster.

Static field initialization diagnostics

The runtime checks performed for strictly-initialized static fields will often be useful even for fields that are not marked with ACC_STRICT_INIT. As a debugging tool, HotSpot can provide class initialization diagnostics via the command-line flag -XX:CheckAllStaticsStrictly=[warn|error|jfr] or -Xlog:strict+static=warning.

When these diagnostics are turned on, all static fields are tracked by the class initialization state. Whenever any non-strict static field is read before it has been initialized or, in the case of a final field, mutated after it has been read, a diagnostic is generated.

The command-line flag specifies whether the diagnostic takes the form of a fatal error or an event logged to the console and JFR.

Testing

The ACC_STRICT_INIT flag is not a language feature, but it is often convenient to write HotSpot tests in Java code.

For testing, this JEP introduces an OpenJDK test library to:

Define a @StrictInit annotation that can be placed on fields that should be treated as strictly-initialized; and
Generate class files from Java sources in a two-step process, first compiling with javac, and then rewriting the bytecode to apply the ACC_STRICT_INIT flag and adjust the initialization timing of <init> methods.
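As a rough idea of what the annotation half of such a library could look like, consider the sketch below. Everything here is an assumption: the text does not give the annotation's package, retention, or targets, and the bytecode rewriting step is not shown. Runtime retention is chosen only so that the marker can be inspected reflectively in this example; a rewriting tool working on class files would not need it.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

class StrictInitAnnotationSketch {

    /** Hypothetical marker; retention and target are assumptions, not the library's API. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface StrictInit {}

    /** Example test subject: a rewriter would set ACC_STRICT_INIT on "owner" only. */
    static class Account {
        @StrictInit
        final String owner;

        int balance;

        Account(String owner) {
            this.owner = owner;
        }
    }

    /** Returns true if the named field carries the marker annotation. */
    static boolean isMarked(Class<?> type, String fieldName) throws NoSuchFieldException {
        return type.getDeclaredField(fieldName).isAnnotationPresent(StrictInit.class);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isMarked(Account.class, "owner"));   // true
        System.out.println(isMarked(Account.class, "balance")); // false
    }
}
```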

Supporting changes

The Field.accessFlags and Field.getModifiers methods should reflect the presence of ACC_STRICT_INIT.

The java.lang.classfile API should support ACC_STRICT_INIT and early_larval_frame entries in StackMapTable. When a StackMapTable is automatically generated, it should properly encode the initialization state of strictly-initialized fields.

The javap tool should properly display the ACC_STRICT_INIT modifier and early_larval_frames; it should also do a better job of presenting the implicit initialization states in a StackMapTable.

The asmtools tools should similarly be updated to support ACC_STRICT_INIT and early_larval_frame.

Internal JVM optimizations may use the ACC_STRICT_INIT flag to reason about the timing of potential changes to a final field's value. Other tools and APIs may also depend on the flag for their own analyses.

Alternatives

In JDK 21, the javac compiler added warnings to discourage invocations of instance methods from superclass constructors (see JDK-8299995). Such warnings are helpful, but of course are no substitute for invariants enforced by the JVM.
We've considered approaches that enforce instance field invariants with dynamic checks. These would allow more flexibility in the timing of instance field initialization. Unfortunately, they require a run-time overhead that is not easily optimized away once the object has been fully initialized.

Dependencies

Value Classes and Objects builds on this JEP, marking all the fields of value classes ACC_STRICT_INIT, and encouraging programming patterns in Java that initialize fields in the early larval phase.

blocks

JDK-8367935 Rename ACC_STRICT in the JVM according to strict fields JEP

Open

1.

JVM implementation of strict field initialization

Open

Matias Saavedra Silva
