  2. JDK-8344702

Flexible Constructor Bodies


    • Type: JEP
    • Resolution: Unresolved
    • Priority: P4
    • None
    • specification
    • None
    • Archie Cobbs & Gavin Bierman
    • Feature
    • Open
    • SE
    • amber dash dev at openjdk dot org

      Summary

      In the body of a constructor, allow statements to appear before an explicit constructor invocation, i.e., super(...) or this(...). These statements are restricted in that they cannot reference the instance under construction, but they can initialize its fields and perform other safe computations. This allows many constructors to be expressed more naturally, and it makes programs safer because fields can be initialized before they become visible to other code in the class (such as methods called from a superclass constructor).

      History

      Flexible Constructor Bodies were proposed as a preview feature by JEP 447 in JDK 22 under a different title. They were proposed again, with small enhancements, by JEP 482 in JDK 23. They were proposed for a third time, with no changes, by JEP 492 in JDK 24. We here propose to finalize the feature in JDK 25, with no changes from JDK 24.

      Goals

      • Remove unnecessary restrictions on code in constructors, so that developers can easily validate arguments before calling superclass constructors.

      • Provide new guarantees that the state of a new object is fully initialized before any code can use it.

      • Reimagine the process of how constructors interact with each other to achieve a fully initialized object.

      Motivation

      The constructors of a class are responsible for creating valid instances of the class. Typically, a constructor validates and transforms its arguments, then initializes the fields declared in its class to legitimate domain values. Furthermore, in the presence of subclassing, constructors of superclasses and subclasses have a shared responsibility for creating valid instances.

      For example, consider a Person class, with a subclass, Employee. Every Employee constructor will invoke, either implicitly or explicitly, a Person constructor, and the two constructors should work together to construct a valid instance. The Employee constructor is responsible for the fields declared in the Employee class, while the Person constructor is responsible for the fields declared in the Person class. Since code in the Employee constructor can refer to fields declared in the Person class, it is only safe for the Employee constructor to access these fields after the Person constructor has finished assigning values to them.

      The Java language ensures construction of valid instances by running constructors from the top down: A constructor in a superclass runs before a constructor in a subclass. To achieve this, the Java language currently requires that the very first statement in a constructor is a constructor invocation -- super(...) or this(...) -- and if no such statement exists, then the Java compiler inserts one (super()). As a result of the superclass constructor running first, fields declared in the superclass are initialized before fields declared in the subclass. In the previous example, the Person constructor runs in its entirety before the Employee constructor begins to validate its arguments, which means the Employee constructor can assume that the Person constructor has properly initialized the fields of Person.

      Constructors are too restrictive

      The top-down rule for constructors provides predictability in the creation of valid instances but it outlaws some familiar and reasonable programming patterns. Developers are often frustrated by the inability to write code in constructors that is perfectly safe. For example, suppose our Person class has an age field, but that employees are required to be between the ages of 18 and 67 years old. In the Employee constructor, we would like to validate an age argument before passing it to the Person constructor. But the constructor invocation must come first. We can validate the argument afterwards but that means potentially doing unnecessary work:

      class Person {
          int age;
          ...
          Person(..., int age) {
              ...
              this.age = age;
              if (age < 0) throw new IllegalArgumentException(...);
          }
      }
      class Employee extends Person {
          Employee(..., int age) {
              super(..., age);    // Potentially unnecessary work
              if (age < 18 || age > 67) throw new IllegalArgumentException(...);
          }
      }

      It would be better to declare an Employee constructor that fails fast, by validating its argument before invoking the Person constructor. This is clearly safe, but because the constructor invocation must come first, the only way to fail fast is to call an auxiliary method inline, as part of the constructor invocation:

      class Employee extends Person {
          Employee(..., int age) {
              super(..., verifyAge(age));
          }
          private static int verifyAge(int value) {
              if (value < 18 || value > 67) throw new IllegalArgumentException(...);
              return value;
          }
      }

      Other common scenarios run into the same requirement that the superclass constructor invocation come first. For example, we might need to perform some non-trivial computation to prepare the arguments for a superclass constructor invocation. Or we might need to prepare a complex value to be shared amongst several arguments of a superclass constructor invocation. The requirement that constructor bodies begin with a constructor invocation often limits the expressiveness of constructors.
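
      As a rough sketch of the shared-value scenario under the current rules: the only way to pass one freshly computed object to several superclass constructor parameters is to route it through an auxiliary constructor. The classes Pair and Twin below are hypothetical and exist only for illustration.

```java
// Hypothetical classes for illustration only; they do not appear in the proposal.
class Pair {
    final Object first;
    final Object second;

    Pair(Object first, Object second) {
        this.first = first;
        this.second = second;
    }
}

class Twin extends Pair {
    // Entry point: builds the shared value, then delegates.
    Twin(String label) {
        this(new StringBuilder(label));
    }

    // Auxiliary constructor whose only purpose is to give the shared value
    // a name so that it can be passed twice to the superclass constructor.
    private Twin(StringBuilder shared) {
        super(shared, shared);
    }
}
```

      The extra private constructor works, but it hides a two-line intent (compute a value, pass it twice) behind an additional declaration.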

      Superclass constructors can violate the integrity of subclasses

      Each class has an idea of valid state for its own fields, and would like to achieve such state regardless of the actions of its superclasses, subclasses, and all other classes in the program. That is, the state of each class has integrity.

      The top-down rule ensures that a superclass constructor is always run before the subclass constructor, ensuring that fields of the superclass are initialized properly. Unfortunately, the rule is not sufficient to ensure the integrity of the new instance as a whole. The superclass constructor can, indirectly, access fields in the subclass before the subclass constructor has had a chance to initialize them. For example, suppose the Employee class has an officeID field, and the constructor in Person calls a method which is overridden in Employee:

      class Person {
          int age;
      
          Person(..., int age) {
              ...
              if (age < 0) throw new IllegalArgumentException("negative age");
              this.age = age;
              logPersonalData();
          }
      
          void logPersonalData() {
              System.out.println("Age: " + this.age);
          }
      }
      
      class Employee extends Person {
          String officeID;
      
          Employee(..., int age, String officeID) {
              super(..., age);    // Potentially unnecessary work
              if (age < 18 || age > 67) throw new IllegalArgumentException("invalid working age");
              this.officeID = officeID;
          }
      
          @Override
          void logPersonalData() {
              System.out.println("Age: " + this.age);
              System.out.println("Office: " + this.officeID);
          }
      }

      What does new Employee(..., 42, "CAM-FORA") log? It might be expected to print Age: 42, and perhaps additionally Office: CAM-FORA, but it actually prints Age: 42 and Office: null! This is because the Person constructor is invoked before the officeID field can be initialized by the Employee constructor. The Person constructor then calls logPersonalData, causing the overriding method in Employee to run, all before the Employee constructor has had a chance to initialize the officeID field to "CAM-FORA". As a result, the logPersonalData method prints the default value of the field, which is null.

      This behavior violates the integrity of the Employee class, in that its fields can be accessed before they can be initialized to a valid domain-specific value by the constructor. Even final fields in Employee can be accessed before they are initialized to their final values, so code can readily observe the mutation of final fields due to the behavior of superclass constructors.
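
      The observation about final fields can be reproduced on current Java releases. The following minimal sketch uses hypothetical classes Base and Badge, and records what an overridable method reports during and after construction; the static list is an assumption made so the result can be inspected.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical classes for illustration only.
class Base {
    // Records what the overridable hook reports each time it is called.
    static final List<String> observed = new ArrayList<>();

    Base() {
        observed.add(describe());   // runs before the subclass constructor body
    }

    String describe() {
        return "base";
    }
}

class Badge extends Base {
    final String code;

    Badge(String code) {
        super();                    // Base() calls describe() while code is still null
        this.code = code;
        observed.add(describe());   // now reports the final value
    }

    @Override
    String describe() {
        return "code=" + code;
    }
}
```

      After new Badge("CAM-FORA"), the list holds "code=null" followed by "code=CAM-FORA": the same final field has been observed with two different values.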

      In this particular example, the problem arises because constructors can invoke overridable methods. Whilst this is considered bad practice -- Item 19 of Effective Java advises that "Constructors must not invoke overridable methods" -- it is not uncommon, and it is a source of many subtle real-world bugs. But this is just one example of such behavior; there is nothing to stop, for example, a superclass constructor passing the current instance to another method that would be able to access subclass fields before they have been assigned values by the subclass constructor. There is currently very little that a class can do to defend itself against this kind of behavior by its own superclasses and other code.

      Towards More Expressiveness and Safety

      In summary, if the Java language had more flexible rules for constructor bodies then not only would constructors be easier to write and maintain but classes could protect their integrity in the sense that their fields could be initialized to domain-specific values before they can be accessed by other code in the program.

      Description

      We propose to remove the simplistic syntactic requirement, enforced since Java 1.0, that every constructor body begins with a constructor invocation (super(..) or this(..)). This allows us to write more readable constructors that validate their arguments before the superclass constructor invocation. For example, our Employee constructor from above could be written directly and more clearly to fail fast, as follows:

      class Employee extends Person {
          String officeID;
      
          Employee(..., int age, String officeID) {
              if (age < 18 || age > 67) throw new IllegalArgumentException("invalid working age"); // Now fails fast!
              super(..., age);
              this.officeID = officeID;
          }
          ...
      }

      Moreover, we can ensure that subclass constructors protect their integrity from, for example, superclass constructors, by initializing their fields before a superclass constructor invocation. For example, we could further rewrite the Employee constructor to initialize the officeID field before invoking the superclass constructor:

      class Employee extends Person {
          String officeID;
      
          Employee(..., int age, String officeID) {
              if (age < 18 || age > 67) throw new IllegalArgumentException("invalid working age"); // Now fails fast!
              this.officeID = officeID;   // Initialize before calling superclass constructor!
              super(..., age);
          }
          ...
      }

      Now, new Employee(..., 42, "CAM-FORA") will print Age: 42 and Office: CAM-FORA, as expected; the integrity of the Employee class has been preserved.
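
      For readers who want to try this, here is a self-contained sketch of the Person and Employee classes with the elided parameters dropped. It assumes a compiler that implements this proposal (JDK 25 as targeted here). The static LOG list is a hypothetical stand-in for System.out, used only so the logged lines can be inspected.

```java
import java.util.ArrayList;
import java.util.List;

class Person {
    // Hypothetical in-memory log, used instead of System.out for inspection.
    static final List<String> LOG = new ArrayList<>();

    int age;

    Person(int age) {
        if (age < 0) throw new IllegalArgumentException("negative age");
        this.age = age;
        logPersonalData();   // may dispatch to an override in a subclass
    }

    void logPersonalData() {
        LOG.add("Age: " + this.age);
    }
}

class Employee extends Person {
    String officeID;

    Employee(int age, String officeID) {
        // Prologue: validate, then initialize this class's own field.
        if (age < 18 || age > 67) throw new IllegalArgumentException("invalid working age");
        this.officeID = officeID;
        super(age);
        // Epilogue: empty in this sketch.
    }

    @Override
    void logPersonalData() {
        LOG.add("Age: " + this.age);
        LOG.add("Office: " + this.officeID);
    }
}
```

      With these declarations, new Employee(42, "CAM-FORA") logs "Age: 42" and "Office: CAM-FORA", while new Employee(10, "CAM-FORA") throws before the Person constructor runs at all.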

      A new model for constructor bodies

      This simple change in the requirements for constructors actually represents a completely new model for constructor bodies. In this new model, a constructor body has two distinct phases: The prologue is the code before the constructor invocation, and the epilogue is the code after the constructor invocation.

      To illustrate, consider this class hierarchy:

      class Object {
          Object() {
              // Object constructor body
          }
      }
      
      class A extends Object {
          A() {
              super();
              // A constructor body
          }
      }
      
      class B extends A {
          B() {
              super();
              // B constructor body
          }
      }
      
      class C extends B {
          C() {
              super();
              // C constructor body
          }
      }
      
      class D extends C {
          D() {
              super();
              // D constructor body
          }
      }

      Currently, when creating a new instance of class D, via new D(), the execution of the constructor bodies can be visualized as:

      D
      --> C
          --> B
              --> A
                  --> Object constructor body
              --> A constructor body
          --> B constructor body
      --> C constructor body
      D constructor body

      This is why the Java language's current approach to safe object initialization is characterized as being top-down: The constructor bodies are run starting at the top of the hierarchy, with the class Object, moving down one-by-one through the subclasses.

      When constructor bodies have both a prologue and an epilogue, we can generalize the class declarations:

      class Object {
          Object() {
              // Object constructor body
          }
      }
      
      class A extends Object {
          A() {
              // A prologue
              super();
              // A epilogue
          }
      }
      
      class B extends A {
          B() {
              // B prologue
              super();
              // B epilogue
          }
      }
      
      class C extends B {
          C() {
              // C prologue
              super();
              // C epilogue
          }
      }
      
      class D extends C {
          D() {
              // D prologue
              super();
              // D epilogue
          }
      }

      The corresponding execution of the constructor bodies when evaluating new D() can be visualized as:

      D prologue
      --> C prologue
          --> B prologue
              --> A prologue
                  --> Object constructor body
              --> A epilogue
          --> B epilogue
      --> C epilogue
      D epilogue

      Rather than running the constructor bodies top-down, this new approach first runs the prologues bottom-up and then runs the epilogues top-down. This lets us reimagine how valid instances are safely created: the prologues validate and assign the state of each subclass from the bottom up, and the epilogues then execute from the top down, safe in the knowledge that the state is valid, so they can freely use the instance being created.
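
      The ordering can be observed directly. The sketch below shortens the hierarchy to three classes and records each phase in a hypothetical Trace helper; it assumes a compiler that implements this proposal (JDK 25 as targeted here).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that records the order in which constructor code runs.
class Trace {
    static final List<String> EVENTS = new ArrayList<>();

    static void log(String event) {
        EVENTS.add(event);
    }
}

class A {
    A() {
        Trace.log("A prologue");
        super();                 // runs the Object constructor body
        Trace.log("A epilogue");
    }
}

class B extends A {
    B() {
        Trace.log("B prologue");
        super();
        Trace.log("B epilogue");
    }
}

class C extends B {
    C() {
        Trace.log("C prologue");
        super();
        Trace.log("C epilogue");
    }
}
```

      Evaluating new C() records C prologue, B prologue, A prologue, A epilogue, B epilogue, C epilogue, in that order. Calling the static method Trace.log in a prologue is fine because it does not use the instance under construction.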

      Syntax

      We revise the current grammar of a constructor body to allow statements before an explicit constructor invocation, that is, from:

      ConstructorBody:
          { [ExplicitConstructorInvocation] [BlockStatements] }

      to:

      ConstructorBody:
          { [BlockStatements] ExplicitConstructorInvocation [BlockStatements] }
          { [BlockStatements] }

      Eliding some details, an explicit constructor invocation is either super(..) or this(..).

      The statements that appear before an explicit constructor invocation constitute the prologue of the constructor body.

      The statements that appear after an explicit constructor invocation constitute the epilogue of the constructor body.

      An explicit constructor invocation in a constructor body may be omitted. In this case the prologue is empty, the invocation super() (an invocation of the constructor of the direct superclass that takes no arguments) will be considered to implicitly appear at the beginning of the constructor body, and all the statements in the constructor body will be taken to constitute the epilogue.

      A return statement is permitted in the epilogue of a constructor body if it does not include an expression. That is, return; is allowed but return e; is not. It is a compile-time error for a return statement to appear in the prologue of a constructor body.

      Throwing an exception in the prologue or epilogue of a constructor body is permitted. Throwing an exception in the prologue will be typical in fail-fast scenarios.

      Early construction contexts

      Currently, in the Java language, code that appears in the argument list of an explicit constructor invocation is said to appear in a static context. This means that the arguments to the explicit constructor invocation are treated as if they were code in a static method; in other words, as if no instance is available. The technical restrictions of a static context are stronger than necessary, however, and they prevent code that is useful and safe from appearing as constructor arguments.

      Rather than revise the concept of a static context, we introduce the concept of an early construction context that covers both the argument list of an explicit constructor invocation and any statements that appear before it in the constructor body, i.e., in the prologue. Code in an early construction context must not use the instance under construction, except to initialize fields that do not have their own initializers.

      This means that any explicit or implicit use of this to refer to the current instance, or to access fields or invoke methods of the current instance, is disallowed in an early construction context:

      class X {
      
          int i;
      
          X() {
      
              System.out.print(this);  // Error - refers to the current instance
      
              var x = this.i;          // Error - explicitly refers to field of the current instance
              this.hashCode();         // Error - explicitly refers to method of the current instance
      
              var y = i;               // Error - implicitly refers to field of the current instance
              hashCode();              // Error - implicitly refers to method of the current instance
      
              super();
      
          }
      
      }

      Similarly, any field access, method invocation, or method reference qualified by super is disallowed in an early construction context:

      class Y {
          int i;
          void m() { ... }
      }
      
      class Z extends Y {
      
          Z() {
              var x = super.i;         // Error
              super.m();               // Error
              super();
          }
      
      }

      Using enclosing instances in early construction contexts

      When class declarations are nested, the code of an inner class can refer to the instance of an enclosing class. This is because the instance of the enclosing class is created before the instance of the inner class. The code of the inner class, including constructor bodies, can access fields and invoke methods of the enclosing instance, using either simple names or qualified this expressions. Accordingly, operations on an enclosing instance are permitted in an early construction context.

      In the code below, the declaration of Inner is nested in the declaration of Outer, so every instance of Inner has an enclosing instance of Outer. In the constructor of Inner, code in the early construction context can refer to the enclosing instance and its members, either via simple names or via Outer.this.

      class Outer {
      
          int i;
      
          void hello() { System.out.println("Hello"); }
      
          class Inner {
      
              int j;
      
              Inner() {
                  var x = i;             // OK - implicitly refers to field of enclosing instance
                  var y = Outer.this.i;  // OK - explicitly refers to field of enclosing instance
                  hello();               // OK - implicitly refers to method of enclosing instance
                  Outer.this.hello();    // OK - explicitly refers to method of enclosing instance
                  super();
              }
      
          }
      
      }

      By contrast, in the constructor of Outer shown below, code in the early construction context cannot instantiate the Inner class with new Inner(). This expression is really this.new Inner(), meaning that it uses the current instance of Outer as the enclosing instance for the Inner object. Per the earlier rule, any explicit or implicit use of this to refer to the current instance is disallowed in an early construction context.

      class Outer {
      
          class Inner {}
      
          Outer() {
              var x = new Inner();       // Error - implicitly refers to the current instance of Outer
              var y = this.new Inner();  // Error - explicitly refers to the current instance of Outer
              super();
          }
      
      }

      Early assignment to fields

      Accessing fields of the current instance is disallowed in an early construction context, but we have seen in the introduction that allowing assignment to fields of the current instance in an early construction context allows a class to defend itself against its uninitialized fields being visible to other code.

      In a constructor body, a simple assignment to a field declared in the same class is allowed in an early construction context, provided the field declaration lacks an initializer. This means that a constructor body can initialize the class's own fields in an early construction context, but not the fields of a superclass.

      A constructor body cannot read any of the fields of the current instance — whether declared in the same class as the constructor, or in a superclass — until after the explicit constructor invocation, i.e., in the epilogue.
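
      These rules can be summarized in one small sketch. The classes Base and Sample are hypothetical; the commented-out lines mark statements that the rules above reject, and the rest assumes a compiler that implements this proposal (JDK 25 as targeted here).

```java
// Hypothetical classes for illustration only.
class Base {
    int inherited;
}

class Sample extends Base {
    int plain;         // no initializer: may be assigned in the prologue
    int preset = 1;    // has an initializer: may not be assigned in the prologue

    Sample(int value) {
        this.plain = value + 1;    // OK - simple assignment to a field of this class
        // this.preset = value;    // Error - the field declaration has an initializer
        // this.inherited = value; // Error - the field is declared in a superclass
        // int copy = this.plain;  // Error - reads a field of the current instance
        super();
        // Epilogue: the instance can now be used freely.
        this.preset = value;
        this.inherited = this.plain;
    }
}
```

      After new Sample(5), plain is 6, preset is 5, and inherited is 6. The value assigned to plain in the prologue survives the superclass constructor invocation because plain has no initializer of its own to overwrite it.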

      Records

      Constructors of record classes are already subject to more restrictions than constructors of normal classes. In particular,

      • Canonical record constructors must not contain any explicit constructor invocation, and

      • Non-canonical record constructors must contain an alternate constructor invocation (this(..)) and not a superclass constructor invocation (super(..)).

      These restrictions remain. Otherwise, record constructors will benefit from the changes described above, primarily because non-canonical record constructors will be able to contain statements before the alternate constructor invocation.
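
      As an illustration, a non-canonical record constructor might parse and validate its argument before delegating. The record Range and its "lo:hi" text format are hypothetical; the sketch assumes a compiler that implements this proposal (JDK 25 as targeted here).

```java
// Hypothetical record for illustration only.
record Range(int lo, int hi) {

    // Canonical (compact) constructor: still may not contain an explicit
    // constructor invocation.
    Range {
        if (lo > hi) throw new IllegalArgumentException("lo must not exceed hi");
    }

    // Non-canonical constructor: statements may now precede this(...).
    Range(String text) {
        String[] parts = text.split(":");
        if (parts.length != 2) throw new IllegalArgumentException("expected lo:hi");
        this(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]));
    }
}
```

      Here new Range("3:9") yields a record with lo 3 and hi 9, and malformed text is rejected before the canonical constructor is invoked.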

      Enums

      Constructors of enum classes can contain alternate constructor invocations but not superclass constructor invocations. Enum classes will benefit from the changes described above, primarily because their constructors will be able to contain statements before the alternate constructor invocation.
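
      A comparable sketch for enums follows. The enum Level and its "label:weight" spec format are hypothetical, and the code assumes a compiler that implements this proposal (JDK 25 as targeted here).

```java
// Hypothetical enum for illustration only.
enum Level {
    LOW("low:1"),
    HIGH("high:9");

    final String label;
    final int weight;

    // Statements may now precede the alternate constructor invocation.
    Level(String spec) {
        int colon = spec.indexOf(':');
        if (colon < 0) throw new IllegalArgumentException("expected label:weight");
        this(spec.substring(0, colon), Integer.parseInt(spec.substring(colon + 1)));
    }

    Level(String label, int weight) {
        this.label = label;
        this.weight = weight;
    }
}
```

      The parsing lives in the prologue of the one-argument constructor, so no static helper method is needed to prepare the arguments for this(...).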

      Testing

      • We will test the compiler changes with existing unit tests, unchanged except for those tests that verify changed behavior, plus new positive and negative test cases as appropriate.

      • We will compile all JDK classes using the previous and new versions of the compiler and verify that the resulting bytecode is identical.

      • No platform-specific testing should be required.

      Risks and Assumptions

      The changes we propose above are source- and behavior-compatible. They strictly expand the set of legal Java programs while preserving the meaning of all existing Java programs.

      These changes, though modest in themselves, represent a significant change in how constructors participate in safe object initialization. They relax the long-standing requirement that a constructor invocation, if present, must always appear as the first statement in a constructor body. This requirement is deeply embedded in code analyzers, style checkers, syntax highlighters, development environments, and other tools in the Java ecosystem. As with any language change, there may be a period of pain as tools are updated.

      Dependencies

      Flexible constructor bodies in the Java language depend on the ability of the JVM to verify and execute arbitrary code that appears before constructor invocations in constructors, so long as that code does not reference the instance under construction. Fortunately, the JVM already supports a more flexible treatment of constructor bodies:

      • Multiple constructor invocations may appear in a constructor body provided on any code path there is exactly one invocation;

      • Arbitrary code may appear before constructor invocations so long as that code does not reference the instance under construction except to assign fields; and

      • Explicit constructor invocations may not appear within a try block, i.e., within a bytecode exception range.

      The JVM's rules still ensure safe object initialization:

      • Superclass initialization always happens exactly once, either directly via a superclass constructor invocation or indirectly via an alternate constructor invocation; and

      • Uninitialized instances are off-limits except for field assignments, which do not affect outcomes, until superclass initialization is complete.

      As a result, this proposal does not include any changes to the Java Virtual Machine Specification, only to the Java Language Specification.

      The existing mismatch between the JVM, which allows flexible constructor bodies, and the Java language, which is more restrictive, is an historical artifact. Originally the JVM was more restrictive, but this led to issues with the initialization of compiler-generated fields for new language features such as inner classes and captured free variables. To accommodate compiler-generated code, the JVM Specification was relaxed many years ago, but the Java Language Specification was never revised to leverage this new flexibility.

      JEP 401 proposes Value Classes for the Java Platform and builds upon this JEP; in fact, it proposes a different treatment for constructors of value classes without an explicit constructor invocation. In this case, an implicit constructor invocation will be inserted at the end instead of the start of the constructor body. This means that the statements in a constructor body with no explicit constructor invocation are considered to form the prologue of the constructor, and the epilogue is considered to be empty.

            Alex Buckley, Brian Goetz