JDK-8350458

Strict Field Initialization in the JVM (Preview)


    • JEP
    • Resolution: Unresolved
    • P4
    • None
    • hotspot
    • None
    • Feature
    • Open
    • SE
    • valhalla dash dev at openjdk dot java dot net
    • M
    • L

      Summary

      Support a JVM flag on fields expressing that the field follows a stricter initialization discipline. Enforce this discipline during verification and at run time, and provide optional run-time diagnostics for fields that have not set the flag.

      Motivation

      Whenever a class is loaded by the JVM, it needs to be initialized. Each class can declare a special class initialization method, named <clinit>, for this purpose. The class initialization method is free to execute arbitrary code, and what constitutes an "initialized" class is up to the discretion of the class author. Usually class initialization includes setting all of the class's static fields to an appropriate initial value.

      Similarly, whenever a class instance is created with the new bytecode, that instance needs to be initialized. Each class can declare multiple special instance initialization methods, named <init>, for this purpose. These methods are also free to execute arbitrary code, and through a chain of <init> method invocations every class in an inheritance hierarchy can define what constitutes an "initialized" class instance, at the discretion of each class author. Usually instance initialization includes setting all of the instance's non-static fields to an appropriate initial value.

      Sometimes, during the initialization process, the fields of the class or instance are accessed. These fields may not yet have been set to an initial value, and can be thought of as being in a temporary larval state. The longstanding behavior of the JVM is to give larval fields a default value (null for references and 0 for primitives) to be returned if they are read before being properly initialized. An early access is often a bug, but execution simply proceeds with the given value.

      For example, given the following classes (styled as Java code for readability), if the class App is initialized first, no error occurs, but log messages will be prefixed with an unexpected identifier, App[0]: ....

      class App {
      
          public static final long appID;
      
          static void <clinit>() {
              appID = Log.currentPID();
          }
      }
      
      class Log {
      
          private static final String prefix;
      
          static void <clinit>() {
              prefix = "App[%d]: ".formatted(App.appID);
          }
      
          public static void log(String msg) {
              System.out.println(prefix + msg);
          }
      
          public static long currentPID() {
              return ProcessHandle.current().pid();
          }
      }

      Notice that the circular dependency between classes App and Log is not obvious or essential; if the utility method currentPID were declared in some other class, the dependency would be broken and everything would behave properly. In complex systems, these sorts of bugs can be very hard to diagnose.
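
      The scenario above can be reproduced in ordinary Java source. The following is a minimal sketch, assuming static initializer blocks in place of the <clinit> methods and a wrapper class, LarvalReadDemo, that is not part of the original example. Reading App.appID first starts the initialization of App, which in turn initializes Log while App is still larval.

```java
final class LarvalReadDemo {

    static class App {
        static final long appID;
        static {
            // Calling into Log triggers Log's initialization while App is still larval.
            appID = Log.currentPID();
        }
    }

    static class Log {
        static final String prefix;
        static {
            // App is larval in the current thread, so this read returns the default value 0.
            prefix = String.format("App[%d]: ", App.appID);
        }

        static long currentPID() {
            return ProcessHandle.current().pid();
        }
    }

    public static void main(String[] args) {
        long id = App.appID;              // first use of App: initializes App, then Log
        System.out.println(Log.prefix);   // prints "App[0]: "
        System.out.println(id != 0);      // later readers see the real process ID
    }
}
```

      If Log were initialized first instead, the prefix would contain the real process ID, which is part of what makes this kind of bug hard to diagnose.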

      In this example, it is especially bad that the appID field exposes its larval value, because it is a final field. If other classes later refer to appID, they'll see a different identifier than the one observed by Log. An apparently-immutable variable has been observed to mutate, with unpredictable results.

      In the Java language, there are limited compile-time restrictions that prevent some obvious bugs, like a forward reference to a field of the same class, or a failure to assign a value to a final field. There are also warnings that discourage virtual method calls in constructors. The JVM has its own guardrails, such as preventing concurrent access to a class while it is being initialized.

      But it would be impossible to prevent all bugs that arise from an unwanted interaction with a class or instance that has not yet completed its initialization process. Classes are at risk once their <clinit> code begins running (including due to indirect events like the loading or initialization of other classes). Instances are at risk once an <init> method shares the current instance with some other code (including the <init> method of a superclass). Yet until these methods return, the class or instance is not fully initialized.

      That said, we can do a better job of ensuring that the fields of the class or instance have been assigned a value before they are read, and—if final—not subsequently mutated. This JEP introduces a new JVM field modifier that allows bytecode generators to opt in to these stricter requirements.

      By opting in, fields get two important integrity guarantees:

      • First, the default value of the field will never be read.

        Future JVM enhancements anticipate types that have no appropriate default value. It is difficult to support such types for traditional JVM fields, but trivial for properly-initialized fields: it simply doesn't matter what data the fields store before they are initialized.

      • Second, if the field is final, it will never be observed to mutate. This unlocks many useful optimizations.

        In the future, the JVM will depend on this invariant to effectively optimize its encodings of value objects.

      Description

      Class and instance initialization

      Field initialization interacts closely with the class and instance initialization processes. This section provides some background on these processes in the JVM.

      Class initialization

      A class is initialized by a class initialization method, <clinit>. Class initialization methods typically set up the static fields of the class, and might also interact with other global state. Each class in a hierarchy may have its own <clinit> method, and every superclass must be initialized before executing the <clinit> method of a subclass.

      An initialization state is used to keep track of the initialization status of each class at run time. In today's JVM (see JVMS 5.5), a class's initialization state may be any of the following:

      • Uninitialized: the class is loaded but has not yet attempted initialization
      • Larval in a particular thread: the class is currently being initialized
      • Initialized: the class has successfully completed initialization
      • Erroneous: the class failed initialization and may not be used

      The <clinit> method executes while the class is in a larval state. The class is not yet initialized at this point, but its fields and methods can be freely accessed by code running in the current thread. If the <clinit> method completes successfully, the class transitions to the initialized state. If an exception occurs, the class transitions to the erroneous state and can never become initialized.

      The constraints on class initialization are enforced dynamically, at run time. For example, each getstatic instruction is responsible for checking the initialization state of the resolved field's class, and if the class is in the larval state in another thread, blocking until initialization completes.
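
      The transition to the erroneous state is observable from Java code today. As a small illustration (the class names are invented for this sketch), a class whose initializer throws reports ExceptionInInitializerError on the first use and NoClassDefFoundError on every later use:

```java
final class ErroneousStateDemo {

    static class Broken {
        // The initializer throws, so Broken moves from larval to erroneous.
        static int value = compute();

        static int compute() {
            throw new IllegalStateException("initialization failed");
        }
    }

    /** Attempts to read Broken.value and reports the kind of error observed. */
    static String attempt() {
        try {
            return "value=" + Broken.value;
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(attempt()); // ExceptionInInitializerError
        System.out.println(attempt()); // NoClassDefFoundError
    }
}
```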

      Instance initialization

      An object is initialized by an instance initialization method, <init>. Instance initialization methods typically set up the instance fields of the class, and might also interact with static fields and global state. Each class in a hierarchy has at least one <init> method, and that method must, somewhere in its body, delegate to another <init> method of the current class or its superclass. (This recursion bottoms out at Object.<init>.)

      Like classes, objects have an initialization state, although this is expressed only indirectly in the JVM Specification. Today, an object's initialization state may be any of the following:

      • Uninitialized: the object has been created by new, but has not yet attempted initialization

      • Early larval: the object is currently being initialized, and execution has not yet reached Object.<init>

      • Late larval: the object is currently being initialized, and the Object.<init> method has been reached

      • Initialized: the object has successfully completed initialization

      • Erroneous: the object failed initialization and may not be used

      An <init> method begins execution in the early larval state. Most operations, including method invocations, are not allowed on an object in the early larval state, and it may not be shared with other code. However, its fields may be assigned with putfield. At some point another <init> method is invoked and the initialization process continues recursively, eventually reaching Object.<init>. At that point, the instance transitions to the late larval state and, one by one, delegating <init> methods complete their execution and return. In the late larval state, the object is not yet fully initialized, but use of its fields and methods is unrestricted. The object is initialized once the outermost <init> method returns successfully. Alternatively, any <init> call in the stack might throw an exception; in that case, the object transitions to the erroneous state and can never become initialized.

      The constraints on instance initialization are enforced statically, by the verifier. Every instruction in an <init> method is associated with either the early larval or late larval state. (This is expressed through a combination of the uninitializedThis type and a flagThisUninit flag in the type state of the instruction.) Most operations on the current object are prohibited in the early larval state, and the late larval state can only be reached via a successful delegating <init> call. The return instruction is only allowed in the late larval state. Verification for non-<init> methods does not attempt to distinguish between the late larval and initialized states—they are interchangeable.
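
      The unrestricted late larval state is where default values of instance fields leak today. The following sketch (class names are illustrative only) shows a superclass constructor invoking an overridden method before the subclass has assigned its field:

```java
final class LateLarvalDemo {

    static class Base {
        final String seenByBase;

        Base() {
            // The object is late larval here: methods may be invoked,
            // but the subclass has not yet assigned its fields.
            seenByBase = describe();
        }

        String describe() {
            return "base";
        }
    }

    static class Derived extends Base {
        private final String name;

        Derived(String name) {
            super();
            this.name = name; // putfield executes after the super <init> call returns
        }

        @Override
        String describe() {
            return "name=" + name;
        }
    }

    public static void main(String[] args) {
        Derived d = new Derived("x");
        System.out.println(d.seenByBase);  // name=null
        System.out.println(d.describe());  // name=x
    }
}
```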

      Strictly-initialized fields

      Under this JEP, in a preview class file (version XX.65535), a new modifier, ACC_STRICT (0x0800), may be set in the access_flags item of a class file's field declaration. This indicates that the field is strictly-initialized. (Before Java SE 17, this flag was applied to methods to indicate a requirement for "strict" floating-point semantics. That capability was removed by JEP 306. The ACC_STRICT flag has never been applied to fields.)

      Strictly-initialized fields must be initialized—that is, they must be assigned an initial value—during the larval phase of class initialization and the early larval phase of instance initialization. These fields have no observable default value, and may not be read until their initial assignment has occurred.

      These constraints are enforced by enhancing the representation of the larval and early larval initialization states to track whether each field has been set. Then for static fields, new checks are performed dynamically during class initialization; for instance fields, new checks are performed statically by the verifier.

      The following rules apply during the class initialization process to ensure all strictly-initialized static fields are properly initialized:

      • If a strictly-initialized static field has a ConstantValue attribute, then at the point in the process where that value is assigned to the field, the larval state is updated to reflect that the field is set.

      • When executing a putstatic instruction, if the resolved field is strictly-initialized and declared by a class in a larval state, the state is updated to reflect that the field is set. (This occurs no matter where the putstatic instruction appears, and no matter what class is referenced.)

      • When executing a getstatic instruction, if the resolved field is strictly-initialized and declared by a class in a larval state, then if the state does not reflect that the field is set, an exception is thrown. (This occurs no matter where the getstatic instruction appears, and no matter what class is referenced.)

      • After the execution of a <clinit> method completes normally (or, if no <clinit> method is declared, at the point where it would have been invoked), the initialization process checks that each strictly-initialized static field of the class has been set. If so, the class can transition to the initialized state; if not, the class transitions to the erroneous state and an exception is thrown.
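
      As a rough illustration of the bookkeeping these rules imply, the sketch below models the larval state of a single class in plain Java. It is not HotSpot code: the class LarvalClassState, its method names, and the use of IllegalStateException are all assumptions made for this example, since the rules above do not name the exception that is thrown.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of one class's larval state; names are illustrative only. */
final class LarvalClassState {
    private final Set<String> strictStatics;          // strictly-initialized static fields
    private final Set<String> set = new HashSet<>();  // fields known to be set
    private boolean larval = true;

    LarvalClassState(Set<String> strictStatics) {
        this.strictStatics = Set.copyOf(strictStatics);
    }

    /** putstatic, or assignment of a ConstantValue: record that the field is set. */
    void onPutstatic(String field) {
        if (larval && strictStatics.contains(field)) {
            set.add(field);
        }
    }

    /** getstatic: reading an unset strict field of a larval class fails. */
    void onGetstatic(String field) {
        if (larval && strictStatics.contains(field) && !set.contains(field)) {
            // Placeholder exception type; the actual exception is not specified here.
            throw new IllegalStateException("strict static field read before set: " + field);
        }
    }

    /** After <clinit> completes normally: true if the class may become initialized. */
    boolean onClinitComplete() {
        larval = false;
        return set.containsAll(strictStatics);
    }

    public static void main(String[] args) {
        LarvalClassState app = new LarvalClassState(Set.of("appID"));
        app.onPutstatic("appID");
        app.onGetstatic("appID");                   // allowed: the field has been set
        System.out.println(app.onClinitComplete()); // true
    }
}
```

      Under this model, the App and Log example from the Motivation would fail at the getstatic of appID inside Log's initializer rather than silently producing 0.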

      The following rules apply during verification of an <init> method to ensure all strictly-initialized instance fields are properly initialized:

      • A putfield on the current class instance in an early larval state updates the state to reflect that the named field has been set.

      • An invokespecial of an <init> method, applied to the current class instance in an early larval state, requires that, if the invoked method is declared by the superclass, the state reflect that all strictly-initialized instance fields have been set.

      (It is not allowed to perform getfield on the current class instance in an early larval state.)
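
      A simplified sketch of how a verifier might apply these rules to straight-line code is shown below. It is only a model under stated assumptions: real verification works on bytecode with full type states and branches, and the types Op and Insn and the method check are invented for this example.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical straight-line model of early larval verification in an <init> method. */
final class EarlyLarvalCheck {

    enum Op { PUTFIELD_THIS, GETFIELD_THIS, INVOKESPECIAL_SUPER_INIT, INVOKESPECIAL_THIS_INIT }

    static final class Insn {
        final Op op;
        final String field; // field name for field instructions, otherwise null

        Insn(Op op, String field) {
            this.op = op;
            this.field = field;
        }
    }

    /** Returns null if the early larval prefix is accepted, otherwise a reason for rejection. */
    static String check(Set<String> strictFields, List<Insn> code) {
        Set<String> unset = new TreeSet<>(strictFields);
        for (Insn insn : code) {
            switch (insn.op) {
                case PUTFIELD_THIS:
                    unset.remove(insn.field); // the state now records the field as set
                    break;
                case GETFIELD_THIS:
                    return "getfield on early larval this";
                case INVOKESPECIAL_SUPER_INIT:
                    // A superclass <init> call requires every strict field to be set.
                    return unset.isEmpty() ? null : "unset strict fields: " + unset;
                case INVOKESPECIAL_THIS_INIT:
                    // Delegating to another <init> of the same class carries no such requirement.
                    return null;
            }
        }
        return "no delegating <init> call";
    }

    public static void main(String[] args) {
        List<Insn> code = List.of(new Insn(Op.INVOKESPECIAL_SUPER_INIT, null));
        System.out.println(check(Set.of("x"), code)); // unset strict fields: [x]
    }
}
```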

      Stable final fields

      A strictly-initialized final field must not be observed to mutate—all reads must have the same value. Assignments to the field are only allowed during the larval phase of class initialization and the early larval phase of instance initialization.

      In some complex cases, such as those arising from exception handling, a final field may be written multiple times during initialization. This is allowed, but any intermediate values of the field cannot be read.

      During class initialization:

      • When executing a getstatic instruction, if the resolved field is strictly-initialized, final, and declared by a class in a larval state, then the state is updated to reflect that the field has been read. (This implies an additional piece of metadata to track in the larval state.)

      • When executing a putstatic instruction, if the resolved field is strictly-initialized, final, and declared by a class in a larval state, then if the state reflects that the field has been read, an exception is thrown.

      During verification of an <init> method:

      • A putfield instruction writing to a strictly-initialized final field of the current class is only allowed when the initialization state is early larval. (The status of the field is irrelevant, because no reads can occur in an early larval state.)
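
      The two class-initialization rules above amount to tracking a "has been read" bit per field. The following sketch models that bit for a class that is assumed to be in the larval state throughout; the class StableFinalState and the choice of IllegalStateException are assumptions for illustration, not part of the proposal.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of read tracking for strict final static fields of a larval class. */
final class StableFinalState {
    private final Set<String> strictFinalStatics;
    private final Set<String> read = new HashSet<>();

    StableFinalState(Set<String> strictFinalStatics) {
        this.strictFinalStatics = Set.copyOf(strictFinalStatics);
    }

    /** getstatic: remember that the field's value has been observed. */
    void onGetstatic(String field) {
        if (strictFinalStatics.contains(field)) {
            read.add(field);
        }
    }

    /** putstatic: repeated writes are fine until the first read, then they fail. */
    void onPutstatic(String field) {
        if (read.contains(field)) {
            // Placeholder exception type; the actual exception is not specified here.
            throw new IllegalStateException("strict final static written after read: " + field);
        }
    }

    public static void main(String[] args) {
        StableFinalState state = new StableFinalState(Set.of("appID"));
        state.onPutstatic("appID"); // first write
        state.onPutstatic("appID"); // a second write before any read is still allowed
        state.onGetstatic("appID");
        try {
            state.onPutstatic("appID");
        } catch (IllegalStateException e) {
            System.out.println("rejected");
        }
    }
}
```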

      Verification enhancements

      The verification changes that augment the instance initialization state introduce some new requirements and opportunities, discussed in this section.

      Initialization states in StackMapTable

      The verifier requires that for every jump in a method, including every implicit jump to an exception handler, the type state of the jump target must be compatible with the incoming type state at the point of the jump. Each jump target declares its type state in the StackMapTable attribute.

      Because initialization state is part of the type state, and because jumps are anticipated to occur in the early larval code of <init> methods, it is necessary to enhance the StackMapTable attribute to be able to express various initialization states, including enumerating the fields that have been set in an early larval state.

      (Alternatively, we could require all field-setting code to occur immediately before the delegating <init> method invocation, without any jumps. But this would be an inconvenient restriction.)

      In today's JVM, the initialization state of a stack frame in the StackMapTable is implicit: if the stack frame's local variables mention the type uninitializedThis, the initialization state is early larval; otherwise, the initialization state is late larval.

      This rule can continue to be applied with this JEP, where the implicit early larval state is the state in which no fields have been set.

      To express other early larval states, we need something explicit in the StackMapTable. This could be a new kind of frame that asserts which fields should be considered unset in subsequent early larval frames.

      assert_early_unset_fields {
          u1 frame_type;
          u2 number_of_unset_fields;
          u2 unset_fields[number_of_unset_fields];
      }

      This could also be expressed as a frame that wraps and modifies a "base frame".

      (We could enumerate the set fields or the unset fields. In either case, each entry in the array is a constant pool reference to a CONSTANT_NameAndType identifying a strictly-initialized instance field of the current class.)
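
      To make the proposed layout concrete, the sketch below serializes an assert_early_unset_fields frame into class-file (big-endian) byte order. The frame_type value is a placeholder chosen from the range that is unused in today's StackMapTable encoding; no value is assigned above, and the class and method names are invented for this example.

```java
import java.nio.ByteBuffer;

/** Hypothetical encoder for the assert_early_unset_fields frame sketched above. */
final class UnsetFieldsFrame {

    /** Placeholder tag only; the proposal above does not assign a frame_type value. */
    static final int HYPOTHETICAL_FRAME_TYPE = 246;

    /** Encodes the frame; each argument is a constant pool index of a CONSTANT_NameAndType. */
    static byte[] encode(int... unsetFieldIndexes) {
        // ByteBuffer is big-endian by default, matching the class file format.
        ByteBuffer buffer = ByteBuffer.allocate(1 + 2 + 2 * unsetFieldIndexes.length);
        buffer.put((byte) HYPOTHETICAL_FRAME_TYPE);      // u1 frame_type
        buffer.putShort((short) unsetFieldIndexes.length); // u2 number_of_unset_fields
        for (int index : unsetFieldIndexes) {
            buffer.putShort((short) index);                // u2 unset_fields[i]
        }
        return buffer.array();
    }

    public static void main(String[] args) {
        System.out.println(encode(5, 9).length); // 7
    }
}
```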

      In the future (perhaps not in this JEP), it may also be useful to allow the StackMapTable to declare a stack frame in the erroneous state.

      erroneous_frame {
          u1 frame_type;
          stack_map_frame base_frame;
      }

      The late larval and initialized states never need to be expressed explicitly.

      Initialization state transitions

      A jump target can act as a join point for multiple execution paths, and the incoming initialization states from these paths may differ. For example, a field may be assigned on one path, and not assigned on another.

      The early larval state should be understood to track which fields are guaranteed to be set; a possibly-set field is expressed just like an unset field. A jump can always transition from one early larval state to another as long as the transition only "unsets" some fields. A jump can also transition to the erroneous state. Initialization state transitions to an early larval state with more fields set, and to the late larval state, can only be achieved via the putfield and invokespecial instructions, respectively.
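
      Treating an early larval state as the set of fields guaranteed to be set, the jump rule becomes a subset check and a join point becomes a set intersection. A minimal sketch, with invented class and method names:

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of transitions between early larval states at jumps. */
final class EarlyLarvalJoin {

    /** A jump may only "unset" fields: the target must not assume more than the source guarantees. */
    static boolean jumpAllowed(Set<String> setAtSource, Set<String> setAtTarget) {
        return setAtSource.containsAll(setAtTarget);
    }

    /** The strongest state valid on every incoming path: fields set on both paths. */
    static Set<String> join(Set<String> pathA, Set<String> pathB) {
        Set<String> result = new HashSet<>(pathA);
        result.retainAll(pathB);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(jumpAllowed(Set.of("x", "y"), Set.of("x"))); // true
        System.out.println(jumpAllowed(Set.of("x"), Set.of("x", "y"))); // false
        System.out.println(join(Set.of("x", "y"), Set.of("y", "z")));   // [y]
    }
}
```

      In this model a field assigned on only one of two joining paths is treated as unset at the join point, so it must be assigned again before the superclass <init> call.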

      As a legacy issue, the design of verification anticipated that, in an <init> method, certain exception handlers might want to catch exceptions thrown by an <init> invocation and execute in the erroneous state. But for subtle reasons it is impossible to write such code today (leading to a long, unsatisfying bug tail). This issue could be addressed by supporting an explicit erroneous state in the StackMapTable.

      More generally, transitioning to an erroneous state is useful when there is a need to join early larval and late larval code paths. Since it is impossible to transition out of the erroneous state, code in that state is obligated to throw an exception or loop forever.

      Reflective initialization

      Various libraries allow fields to be assigned and read without using bytecode instructions. These include java.lang.reflect.Field, java.lang.invoke.MethodHandle, and java.lang.invoke.VarHandle.

      For static fields, assignments and reads expressed through library code perform the same checks and have the same effects on the class initialization state as putstatic and getstatic, as described above:

      • Field assignments update the class initialization state, if necessary, to indicate that a strictly-initialized field has been set. This operation throws an exception if the field is final and, per the initialization state, no longer allows assignments.

      • Field reads update the class initialization state, if necessary, to indicate that a strictly-initialized field has been read. The operation throws an exception if, per the initialization state, the field has not been set.

      For instance fields, note that the verifier prevents libraries from interacting with objects in the early larval state. Until the object reaches the late larval phase, the initialization state of the object may only be manipulated with bytecode instructions in an <init> method. This means that a strictly-initialized field cannot be assigned its initial value by a library, and a strictly-initialized final field cannot be mutated by a library.

      This restriction on instance fields is inconvenient for tools that perform their own object initialization for user-defined classes, but is necessary to support the invariants of strictly-initialized fields. These tools must, necessarily, cooperate with the class's <init> methods to initialize any strictly-initialized fields.

      Some standard libraries could be used to circumvent the constraints on strictly-initialized instance fields, and so require changes:

      • Standard object deserialization is implemented with special permission to skip the usual execution of an <init> method in the class being instantiated. This capability bypasses the verification-based enforcement of constraints on strictly-initialized instance fields, and must not be used for classes that declare these fields.

        Instead, ObjectOutputStream.writeObject and ObjectInputStream.readObject throw an InvalidClassException if a class being serialized or deserialized declares a strictly-initialized instance field (and the class is not a record class).

      • The Field.setAccessible method allows clients to bypass the final restriction on most instance fields, enabling mutation in the late larval and initialized states. A strictly-initialized final field cannot support this behavior, and should always be treated as non-modifiable by the setAccessible method.
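
      For reference, the existing setAccessible behavior that strictly-initialized final fields would no longer support looks like the following. This sketch assumes a JDK and launcher configuration in which reflective mutation of final instance fields is still permitted; the Point class is illustrative only.

```java
import java.lang.reflect.Field;

final class FinalMutationDemo {

    static class Point {
        final int x; // an ordinary (non-strict) final instance field

        Point(int x) {
            this.x = x;
        }
    }

    /** Mutates the final field of a fully initialized object and returns the value read back. */
    static int mutateAndRead(int initial, int replacement) {
        try {
            Point p = new Point(initial);
            Field f = Point.class.getDeclaredField("x");
            f.setAccessible(true);    // lifts the final restriction for this Field object
            f.setInt(p, replacement); // the "immutable" field is observed to change
            return f.getInt(p);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(mutateAndRead(1, 2)); // 2
    }
}
```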

      Static field initialization diagnostics

      To enable adoption of strictly-initialized fields, it will be helpful for developers to be able to diagnose the initialization of fields in their existing code.

      For instance fields, such diagnostics are best implemented at compile time in the source language, and are outside the scope of this JEP.

      For static fields, developers can activate initialization diagnostics in HotSpot with the flag -XX.... When activated, all static fields are tracked in the class initialization state. Whenever any non-strict static field is read before it has been initialized or (in the case of a final field) mutated after it has been read, a diagnostic is generated.

      The command-line flag specifies whether the diagnostic takes the form of a fatal error or an event logged to the console and JFR.

      Testing

      The ACC_STRICT flag is not a language feature, but it is often convenient to write HotSpot tests in Java code.

      For testing, this JEP will introduce a test library to:

      • Provide a @Strict annotation to be placed on strictly-initialized fields; and

      • Generate class files from Java sources in a two-step process, first compiling with javac, and then rewriting the bytecode to apply the ACC_STRICT flag and adjust initialization timing.

      Other changes

      The Field.accessFlags and Field.getModifiers methods should reflect the presence of ACC_STRICT. (AccessFlag.STRICT and Modifier.STRICT already exist, so there is no need to define new constants.)
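
      A client could then test for the flag with the existing constant. The sketch below uses only Field.getModifiers and Modifier.STRICT; the helper names are invented, and the ordinary field it inspects is expected to report no strict bit.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

final class StrictFlagQuery {

    static int plain; // an ordinary field, compiled without ACC_STRICT

    /** True if the field's modifiers include the 0x0800 bit that ACC_STRICT occupies. */
    static boolean hasStrictBit(Field f) {
        return (f.getModifiers() & Modifier.STRICT) != 0;
    }

    /** Convenience wrapper for the demo field above. */
    static boolean plainFieldIsStrict() {
        try {
            return hasStrictBit(StrictFlagQuery.class.getDeclaredField("plain"));
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Modifier.STRICT == 0x0800); // true
        System.out.println(plainFieldIsStrict());      // false
    }
}
```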

      The java.lang.classfile API should support ACC_STRICT and the StackMapTable changes. When a StackMapTable is automatically generated, it should properly encode the initialization state of strictly-initialized fields.

      The javap tool should properly display the ACC_STRICT modifier and StackMapTables.

      The asmtools tools should similarly be updated to support ACC_STRICT and StackMapTable.

      Internal JVM optimizations may use the ACC_STRICT flag to reason about the timing of potential changes to a final field's value. Other tools and APIs may also depend on it for their own analyses.

      Alternatives

      • In JDK 21, the javac compiler added warnings to discourage invocations of instance methods from superclass constructors (see JDK-8299995). Such warnings are helpful, but of course are no substitute for invariants enforced by the JVM.

      • We've considered approaches that enforce instance field invariants with dynamic checks. These would allow more flexibility in the timing of instance field initialization. Unfortunately, they require a run-time overhead that is not easily optimized away once the object has been fully initialized.

      Dependencies

      Value Classes and Objects builds on this JEP, marking all the fields of value classes ACC_STRICT.

            dlsmith Dan Smith