
JEP 401: Value Classes and Objects (Preview)


    • Icon: JEP JEP
    • Resolution: Unresolved
    • Icon: P3 P3
    • None
    • specification
    • None
    • Feature
    • Open
    • SE
    • valhalla dash dev at openjdk dot java dot net
    • XL
    • XL
    • 401

      Summary

      Enhance the Java Platform with value objects, class instances that have only final fields and lack object identity. This is a preview language and VM feature.

      Goals

      • Allow developers to opt in to a programming model for domain values in which objects are distinguished solely by the values of their fields, much as the int value 3 is distinguished from the int value 4.

      • Migrate popular classes that represent domain values in the standard library, such as Integer and LocalDate, to this programming model. Support compatible migration of user-defined classes.

      • Maximize the freedom of the JVM to store domain values in ways that improve memory footprint, locality, and garbage collection efficiency.

      Non-Goals

      • It is not a goal to automatically treat existing classes as value classes, even if they meet the requirements for how value classes are declared and used. The behavioral changes require an explicit opt-in.

      • It is not a goal to "fix" the == operator so that programmers can use it in place of equals. This JEP redefines == only as much as necessary to cope with a new kind of identity-free object. The usual advice to compare objects in most contexts using the equals method still applies.

      • It is not a goal to introduce a struct feature in the Java language. Java programmers are not asked to understand new semantics for memory management or variable storage. Java continues to operate on just two kinds of data: primitives and object references.

      • It is not a goal to change the treatment of primitive types. Primitive types behave like value classes in many ways, but are a distinct concept. A separate JEP will provide enhancements to make primitive types more class-like and compatible with generics.

      • It is not a goal to guarantee any particular optimization strategy or memory layout. This JEP enables many potential optimizations; only some will be implemented initially. Some potential optimizations, such as layouts that exclude null, will only be possible after future language and JVM enhancements.

      Motivation

      Java developers often need to represent domain values: the date of an event, the color of a pixel, the shipping address of an order, and so on. Developers usually model these values with immutable classes that contain just enough business logic to construct, validate, and transform instances. Notably, the toString, equals, and hashCode methods are defined so that equivalent instances can be used interchangeably.

      As an example, event dates can be represented with instances of the LocalDate JDK class:

      jshell> LocalDate d1 = LocalDate.of(1996, 1, 23)
      d1 ==> 1996-01-23
      
      jshell> LocalDate d2 = d1.plusYears(30)
      d2 ==> 2026-01-23
      
      jshell> LocalDate d3 = d2.minusYears(30)
      d3 ==> 1996-01-23
      
      jshell> d1.equals(d3)
      $4 ==> true

      Developers will regard the "essence" of a LocalDate object as its year, month, and day values. But to Java, the essence of any object is its identity. Each time a method in LocalDate invokes new LocalDate(...), an object with a unique identity is allocated, distinguishable from every other object in the system.

      The easiest way to observe the identity of an object is with the == operator:

      jshell> d1 == d3
      $6 ==> false

      Even though d1 and d3 represent the same year-month-day triple, they are two objects with distinct identities.

      Object identity is problematic for domain values

      For mutable objects, identity is important: it lets us distinguish two objects that have the same state now but will have different state in the future. For example, suppose a class Customer has a field lastOrderedDate that is mutated when the customer makes a new order. Two Customer objects might have the same lastOrderedDate, but it would be a coincidence; when a customer makes a new order, the application will mutate the lastOrderedDate of one object but not the other, relying on identity to pick the right one.

      In other words, when objects are mutable, they are not interchangeable. But most domain values are not mutable and are interchangeable. There is no practical difference between two LocalDate objects representing 1996-01-23, because their state is fixed and unchanging. They represent the same domain value, both now and in the future. There is no need to distinguish the two objects via their identities.

      In fact, object identity is often harmful if the objects in question have immutable state and are meant to be interchangeable. This is because it can cause significant confusion when developers accidentally stumble on it, discovering that d1 == d3 is false.

      For JDK classes that model primitive values, such as Integer, the JDK uses a cache to avoid creating objects with unique identities. However, this cache, somewhat arbitrarily, does not extend to four-digit Integer values like 1996:

      jshell> Integer i = 96, j = 96;
      i ==> 96
      j ==> 96
      
      jshell> i == j
      $3 ==> true
      
      jshell> Integer x = 1996, y = 1996;
      x ==> 1996
      y ==> 1996
      
      jshell> x == y
      $6 ==> false

      For domain values like Integer, the fact that each object has unique identity is unwanted complexity that leads to surprising behavior and exposes incidental implementation choices. This extra complexity could be avoided if the language did not insist that separately-created but interchangeable objects have distinct identities.
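      In current Java, without preview features, the practical consequence is to compare boxed values with equals rather than ==. The following is a minimal sketch of that habit; the class name BoxedCompare and the helper sameValue are hypothetical. Only the range -128 to 127 is guaranteed by the language specification to be cached, so the result of == for larger values is deliberately left unasserted.

```java
import java.util.Objects;

class BoxedCompare {
    // Compares two boxed integers by value; the result does not depend on caching.
    static boolean sameValue(Integer a, Integer b) {
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        Integer i = 96, j = 96;       // inside the guaranteed cache range
        Integer x = 1996, y = 1996;   // outside the guaranteed cache range

        System.out.println(i == j);           // true: boxing reuses the cached object
        System.out.println(sameValue(x, y));  // true: the int values are compared
        // x == y depends on the cache configuration, so it is not relied on here.
    }
}
```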

      Object identity is expensive at run time

      Java's requirement that every object have identity, even if some domain values don't want it, is a performance impediment. Typically, the JVM has to allocate memory for each newly created object, distinguishing it from every object already in the system, and then reference that memory location whenever the object is used or stored.

      For example, a program might need to create arrays of int values or LocalDate references.

      jshell> int[] ints = { 1996, 2006, 1996, 1, 23 }
      ints ==> int[5] { 1996, 2006, 1996, 1, 23 }
      
      jshell> LocalDate[] dates = { d1, d1, d2, null, d3 }
      dates ==> LocalDate[5] { 1996-01-23, 1996-01-23, 2026-01-23,
                               null, 1996-01-23 }

      An array of int values can be represented by the JVM as a simple block of memory:

      +----------+
      | int[5]   |
      +----------+
      | 1996     |
      | 2006     |
      | 1996     |
      | 1        |
      | 23       |
      +----------+

      Meanwhile, an array of LocalDate references must be represented as a sequence of pointers, each referencing a memory location where an object has been allocated.

      
      +--------------+
      | LocalDate[5] |
      +--------------+
      | 87fa1a09     | -----------------------> +-----------+
      | 87fa1a09     | -----------------------> | LocalDate |
      | 87fb4ad2     | ------> +-----------+    +-----------+
      | 00000000     |         | LocalDate |    | y=1996    |
      | 87fb5366     | ---     +-----------+    | m=1       |
      +--------------+   |     | y=2026    |    | d=23      |
                         v     | m=1       |    +-----------+
              +-----------+    | d=23      |
              | LocalDate |    +-----------+
              +-----------+
              | y=1996    |
              | m=1       |
              | d=23      |
              +-----------+

      Even though the data modeled by an array of LocalDate values is not significantly more complex than an array of int values (a year-month-day triple is, effectively, 48 bits of primitive data), the memory footprint is far greater.

      Worse, when a program iterates over the LocalDate array, each pointer may need to be dereferenced. CPUs use caches to enable fast access to chunks of memory; if the array exhibits poor memory locality (a distinct possibility if the LocalDate objects were allocated at different times or out of order), every dereference may require caching a different chunk of memory, frustrating performance.

      In some application domains, developers routinely program for speed by creating as few objects as possible, thus de-stressing the garbage collector and improving locality. For example, they might encode event dates with an int representing an epoch day. Unfortunately, this approach gives up the functionality of classes that makes Java code so maintainable: meaningful names, private state, data validation by constructors, convenience methods, etc. A developer operating on dates represented as int values might accidentally interpret the value in terms of a starting date in 1601 or 1980 rather than the intended 1970 start date.
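      The int-based workaround, and the hazard it introduces, can be sketched in plain Java as follows. The class EpochDayEncoding and its methods are hypothetical names used only for illustration; LocalDate.toEpochDay and LocalDate.ofEpochDay are the standard java.time conversions, which count days from 1970-01-01.

```java
import java.time.LocalDate;

class EpochDayEncoding {
    // Encodes a date as the number of days since 1970-01-01.
    static int encode(LocalDate date) {
        return Math.toIntExact(date.toEpochDay());
    }

    // Decodes correctly only because it agrees with encode on the 1970 epoch.
    static LocalDate decode(int epochDay) {
        return LocalDate.ofEpochDay(epochDay);
    }

    // A mistaken decoder that assumes a 1980 epoch; the int type cannot prevent this.
    static LocalDate decodeAssuming1980(int days) {
        return LocalDate.of(1980, 1, 1).plusDays(days);
    }

    public static void main(String[] args) {
        int[] dates = {
            encode(LocalDate.of(1996, 1, 23)),
            encode(LocalDate.of(2026, 1, 23))
        };
        System.out.println(decode(dates[0]));             // 1996-01-23
        System.out.println(decodeAssuming1980(dates[0])); // a different, wrong date
    }
}
```

      The array is as compact as any int[], but nothing in its type records which epoch the numbers refer to.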

      Programming without identity

      Trillions of Java objects are created every day, each one bearing a unique identity. We believe the time has come to let Java developers choose which objects in the program need identity, and which do not. A class like LocalDate that represents domain values could opt out of identity, so that it would be impossible to distinguish between two LocalDate objects representing the date 1996-01-23, just as it is impossible to distinguish between two int values representing the number 4.

      By opting out of identity, developers are opting in to a programming model that provides the best of both worlds: the abstraction of classes with the simplicity and performance benefits of primitives.

      Description

      Java programs manipulate objects through references. A reference to an object is stored in a variable and lets us find the object's fields, which are merely variables that store primitive values or references to other objects. Traditionally, a reference also encodes the unique identity of an object: each execution of new allocates a fresh object and returns a unique reference that can be stored in one variable and copied to other variables (aliasing). Famously, the == operator compares objects by comparing references, so references to two objects are not == even if the objects have identical field values.

      JDK NN introduces value objects to model immutable domain values. A reference to a value object is stored in a variable and lets us find the object's fields, but it does not serve as the unique identity of the object. Executing new might not allocate a fresh object and might instead return a reference to an existing object, or even a "reference" that embodies the object directly. The == operator compares value objects by comparing their field values, so references to two objects are == if the objects have identical field values.

      A value object is an instance of a value class, declared with the value modifier. Classes without the value modifier are called identity classes, and their instances are identity objects.

      Developers can save memory and improve performance by using value objects for immutable data. Because programs cannot tell the difference between two value objects with identical field values (not even with ==), the Java Virtual Machine is able to avoid allocating multiple objects for the same data. Furthermore, the JVM can change how a value object is laid out in memory without affecting the program; for example, its fields could be stored on the stack rather than the heap.

      The following sections explore how value objects differ from identity objects and illustrate how to declare value classes. This is followed by an in-depth treatment of the special behaviors of value objects, considerations for value class declarations, and the JVM's handling of value classes and objects.

      Enabling preview features

      Value classes and objects are a preview language feature, disabled by default.

      To try the examples below in JDK NN you must enable preview features:

      • Compile the program with javac --release NN --enable-preview Main.java and run it with java --enable-preview Main; or,

      • When using the source code launcher, run the program with java --enable-preview Main.java; or,

      • When using jshell, start it with jshell --enable-preview.

      Programming with value objects

      In Java NN, with preview features enabled, 29 classes in the JDK are declared as value classes. These include:

      • In java.lang: Integer, Long, Float, Double, Byte, Short, Boolean, and Character

      • In java.util: Optional, OptionalInt, OptionalLong, and OptionalDouble

      • In java.time: LocalDate, LocalTime, Instant, Duration, LocalDateTime, OffsetDateTime, and ZonedDateTime

      All instances of these classes are value objects. This includes the boxed primitives that are instances of Integer, Long, etc. The == operator compares value objects by their field values, so, e.g., Integer objects are == if they box the same primitive values:

      % -> jshell --enable-preview
      |  Welcome to JShell -- Version 25-internal
      |  For an introduction type: /help intro
      
      jshell> Integer x = 1996, y = 1996;
      x ==> 1996
      y ==> 1996
      
      jshell> x == y
      $3 ==> true

      Similarly, two LocalDate objects are == if they have the same year, month, and day values:

      jshell> LocalDate d1 = LocalDate.of(1996, 1, 23)
      d1 ==> 1996-01-23
      
      jshell> LocalDate d2 = d1.plusYears(30)
      d2 ==> 2026-01-23
      
      jshell> LocalDate d3 = d2.minusYears(30)
      d3 ==> 1996-01-23
      
      jshell> d1 == d3
      $7 ==> true

      The String class has not been made a value class. Instances of String are always identity objects. We can use the Objects.hasIdentity method to observe whether an object is an identity object.

      jshell> String s = "abcd"
      s ==> "abcd"
      
      jshell> Objects.hasIdentity(s)
      $9 ==> true
      
      jshell> Objects.hasIdentity(x)
      $10 ==> false
      
      jshell> String t = "aabcd".substring(1)
      t ==> "abcd"
      
      jshell> s == t
      $13 ==> false

      In most respects, value objects work the way that objects have always worked in Java. However, a few identity-sensitive operations, such as synchronization, are not supported by value objects.

      jshell> synchronized (d1) { d1.notify(); }
      |  Error:
      |  unexpected type
      |    required: a type with identity
      |    found:    java.time.LocalDate
      |  synchronized (d1) { d1.notify(); }
      |  ^--------------------------------^
      
      jshell> Object o = d1
      o ==> 1996-01-23
      
      jshell> synchronized (o) { o.notify(); }
      |  Exception java.lang.IdentityException: Cannot synchronize on
         an instance of value class java.time.LocalDate
      |        at (#19:1)

      The JVM has a lot of freedom to encode references to value objects at run time in ways that optimize memory footprint, locality, and garbage collection efficiency. For example, we saw the following array earlier, implemented with pointers to heap objects:

      jshell> LocalDate[] dates = { d1, d1, d2, null, d3 }
      dates ==> LocalDate[5] { 1996-01-23, 1996-01-23, 2026-01-23,
                               null, 1996-01-23 }

      Now that LocalDate objects lack identity, the JVM could implement the array using "references" that encode the fields of each LocalDate directly. Each array component can be represented as a 64-bit word that indicates whether the reference is null, and if not, directly stores the year, month, and day field values of the value object:

      +--------------+
      | LocalDate[5] |
      +--------------+
      | 1|1996|01|23 |
      | 1|1996|01|23 |
      | 1|2026|01|23 |
      | 0|0000|00|00 |
      | 1|1996|01|23 |
      +--------------+

      The performance characteristics of this LocalDate array are similar to an ordinary int array:

      +----------+
      | int[5]   |
      +----------+
      | 1996     |
      | 2006     |
      | 1996     |
      | 1        |
      | 23       |
      +----------+

      Some value classes, like LocalDateTime, are too large to take advantage of this particular technique. But the lack of identity enables the JVM to optimize references to those objects in other ways.
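      To make the 64-bit encoding concrete, here is a hand-written sketch of one possible packing, done in plain Java with a long[]. The bit layout (a null flag, a 32-bit year, an 8-bit month, and an 8-bit day) and the names PackedDates, pack, and unpack are assumptions made for illustration; the actual layout chosen by a JVM is an implementation detail that this JEP does not specify.

```java
import java.time.LocalDate;

class PackedDates {
    static final long NULL_WORD = 0L; // flag bit clear means "null reference"

    // Layout: flag in bit 48, year in bits 16..47, month in bits 8..15, day in bits 0..7.
    static long pack(LocalDate d) {
        if (d == null) return NULL_WORD;
        long year = d.getYear() & 0xFFFF_FFFFL;
        return (1L << 48) | (year << 16) | ((long) d.getMonthValue() << 8) | d.getDayOfMonth();
    }

    static LocalDate unpack(long word) {
        if ((word >>> 48 & 1L) == 0) return null;
        int year = (int) (word >>> 16);         // low 32 bits restore the signed year
        int month = (int) (word >>> 8 & 0xFF);
        int day = (int) (word & 0xFF);
        return LocalDate.of(year, month, day);
    }

    public static void main(String[] args) {
        LocalDate d1 = LocalDate.of(1996, 1, 23);
        LocalDate d2 = d1.plusYears(30);
        LocalDate d3 = d2.minusYears(30);
        long[] dates = { pack(d1), pack(d1), pack(d2), pack(null), pack(d3) };
        System.out.println(dates[0] == dates[4]); // true: equal fields give equal words
        System.out.println(unpack(dates[2]));     // 2026-01-23
        System.out.println(unpack(dates[3]));     // null
    }
}
```

      Because two dates with the same fields pack to the same word, comparing words gives the same answer as comparing field values, which is why such an encoding is only sound for objects without identity.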

      Declaring value classes

      Developers can declare their own value classes by applying the value modifier to any class whose instances are:

      • Immutable: all instance fields of the class can be final, and the value represented by an instance does not change over time

      • Interchangeable: it's not necessary to distinguish between two instances that represent the same value

      There is no restriction on the type of a value class's fields—they may store references to other value objects, or to identity objects like strings.

      Record classes are transparent data carriers whose fields are always final, and so they are often great candidates to be value classes.

      jshell> value record Point(int x, int y) {}
      |  created record Point
      
      jshell> Point p = new Point(17, 3)
      p ==> Point[x=17, y=3]
      
      jshell> Objects.hasIdentity(p)
      $7 ==> false
      
      jshell> new Point(17, 3) == p
      $8 ==> true

      Many other classes represent immutable and interchangeable values, but may not be suitable to be record classes—for example, because their private internal state differs from their public external state. When the value modifier is applied to these classes, their fields are implicitly made final. The class itself cannot be extended and is implicitly made final as well.

      The LazySubstring value class, below, represents a substring of a string lazily, without allocating a char[] in memory. Its internal state consists of three fields: a source string and two character indices. Its external state is the string that is lazily computed from those fields. The class overrides the toString, equals, and hashCode methods to model this external state.

      value class LazySubstring {
          private String str;
          private int start, end;
      
          public LazySubstring(String s, int i, int j) {
              str = s; start = i; end = j;
          }
      
          public String toString() {
              return str.substring(start, end);
          }
      
          public boolean equals(Object o) {
              return o instanceof LazySubstring &&
                  toString().equals(o.toString());
          }
      
          public int hashCode() {
              return Objects.hash(LazySubstring.class, toString());
          }
      }

      For final classes with final fields, applying or removing the value keyword is a binary-compatible change. It is also source-compatible in most respects, except where a program attempts to apply a synchronized statement to an expression of the class's type.

      Substitutability

      The == operator tests whether two objects are substitutable. This means that identical operations performed on the two objects will always produce identical results—it is impossible for a program to distinguish between the two.

      For an identity object, this can only be true for the object itself: o1 == o2 only if o1 and o2 have the same unique identity.

      For a value object, this is true whenever the two objects are instances of the same class and have substitutable field values. Primitive field values are considered substitutable if they have the same bit patterns; reference field values are compared recursively with ==.

      The == operator does not necessarily align with intuitions about whether two objects represent the same value. Usually, the right way to compare objects—whether they have identity or not—is with equals.

      For example, two LazySubstring instances may have identical external state—that is, they represent the same lazily-computed string—but differ in their internal state. The programmer knows that these two objects represent equivalent values, as determined by equals; but the two are not substitutable.

      jshell> LazySubstring sub1 = new LazySubstring("ringing", 1, 4);
      sub1 ==> ing
      
      jshell> LazySubstring sub2 = new LazySubstring("ringing", 4, 7);
      sub2 ==> ing
      
      jshell> sub1.equals(sub2)
      $3 ==> true
      
      jshell> sub1 == sub2
      $4 ==> false

      As another example, value object fields that reference distinct identity objects are not substitutable, even if those identity objects are the same according to equals.

      jshell> value record Country(String code) {}
      |  created record Country
      
      jshell> Country c1 = new Country("SWE")
      c1 ==> Country[code=SWE]
      
      jshell> Country c2 = new Country("SWEDEN".substring(0,3))
      c2 ==> Country[code=SWE]
      
      jshell> c1.equals(c2)
      $8 ==> true
      
      jshell> c1 == c2
      $9 ==> false

      Even floating-point primitive fields may store distinct primitive NaN values that are treated the same by equals (and by most floating-point operations), but are not substitutable. This is because the Float.floatToRawIntBits operation makes it possible to tell the two apart, and if the fields were considered substitutable, the result of that operation would be unpredictable.

      jshell> value record Length(float val) {}
      |  created record Length
      
      jshell> Length l1 = new Length(Float.intBitsToFloat(0x7ff80000))
      l1 ==> Length[val=NaN]
      
      jshell> Length l2 = new Length(Float.intBitsToFloat(0x7ff80001))
      l2 ==> Length[val=NaN]
      
      jshell> l1.equals(l2)
      $13 ==> true
      
      jshell> l1 == l2
      $14 ==> false
      
      jshell> Float.floatToRawIntBits(l1.val())
      $15 ==> 2146959360
      
      jshell> Float.floatToRawIntBits(l2.val())
      $16 ==> 2146959361
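      The "same bit pattern" rule for primitive fields can be approximated in plain Java, without preview features. In this sketch Length is an ordinary identity record, and the helper substitutable is a hypothetical stand-in for what == would do on a value record with a single float field.

```java
class FloatSubstitutability {
    record Length(float val) {}

    // Bitwise comparison: distinguishes NaN values that carry different payloads.
    static boolean substitutable(Length a, Length b) {
        return Float.floatToRawIntBits(a.val()) == Float.floatToRawIntBits(b.val());
    }

    public static void main(String[] args) {
        Length l1 = new Length(Float.intBitsToFloat(0x7ff80000));
        Length l2 = new Length(Float.intBitsToFloat(0x7ff80001));
        System.out.println(l1.equals(l2));          // true: record equals treats all NaNs alike
        System.out.println(substitutable(l1, l2));  // false: the raw bit patterns differ
    }
}
```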

      One other notable feature of substitutability is that it performs a "deep" comparison of nested references to other value objects. The number of comparisons is unbounded: in the following example, two deep nests of Box wrappers require a full traversal to determine whether the objects are substitutable.

      jshell> value record Box(Object val) {}
      |  created record Box
      
      jshell> var b1 = new Box(new Box(new Box(new Box(sub1))))
      b1 ==> Box[val=Box[val=Box[val=Box[val=ing]]]]
      
      jshell> var b2 = new Box(new Box(new Box(new Box(sub2))))
      b2 ==> Box[val=Box[val=Box[val=Box[val=ing]]]]
      
      jshell> b1.equals(b2)
      $20 ==> true
      
      jshell> b1 == b2
      $21 ==> false

      Constructors of value classes are constrained (see below) so that the recursive application of == to value objects will never cause an infinite loop.
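      The recursive nature of the test can be sketched with reflection in plain Java. This is an emulation under a loud assumption: every record is treated as if it were a value class, and every other object (including boxed primitives) is treated as an identity object. The class DeepSubstitutability and its methods are hypothetical and are not part of any JDK API.

```java
import java.lang.reflect.RecordComponent;

class DeepSubstitutability {
    record Box(Object val) {}
    record Point(int x, int y) {}

    // Emulates the recursive test, treating records as stand-ins for value classes.
    static boolean substitutable(Object a, Object b) {
        if (a == b) return true;                        // same identity object, or both null
        if (a == null || b == null) return false;
        if (a.getClass() != b.getClass()) return false;
        if (!a.getClass().isRecord()) return false;     // two distinct identity objects
        try {
            for (RecordComponent rc : a.getClass().getRecordComponents()) {
                Object x = rc.getAccessor().invoke(a);
                Object y = rc.getAccessor().invoke(b);
                boolean same = rc.getType().isPrimitive()
                        ? sameBits(x, y)
                        : substitutable(x, y);          // recurse into nested references
                if (!same) return false;
            }
            return true;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Primitive components arrive boxed; floating-point values are compared by raw bits.
    static boolean sameBits(Object x, Object y) {
        if (x instanceof Float f) {
            return Float.floatToRawIntBits(f) == Float.floatToRawIntBits((Float) y);
        }
        if (x instanceof Double d) {
            return Double.doubleToRawLongBits(d) == Double.doubleToRawLongBits((Double) y);
        }
        return x.equals(y);
    }

    public static void main(String[] args) {
        var b1 = new Box(new Box(new Box(new Point(17, 3))));
        var b2 = new Box(new Box(new Box(new Point(17, 3))));
        System.out.println(b1 == b2);               // false: ordinary records have identity
        System.out.println(substitutable(b1, b2));  // true: every nested field matches
    }
}
```

      Each level of nesting costs one more recursive call, which mirrors the unbounded number of comparisons described above.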

      Identity-sensitive operations

      In most respects, a value object is indistinguishable from an identity object. But the following operations on objects behave differently when the object does not have identity:

      • The == and != operators, as described above, test whether two objects are substitutable, which for value objects means comparing their internal state.

        A user of a value object should never assume unique ownership of that object, because a substitutable instance might be created by someone else.

      • A new Objects.hasIdentity method returns false for value objects. Objects.requireIdentity is also available, throwing an IdentityException when given a value object.

      • The System.identityHashCode method hashes together a value object's class and its field values, ensuring that the same hash code is returned for equivalent value objects.

      • Value objects cannot be used for synchronization. Attempts to synchronize on a value class type are rejected at compile time; attempts to synchronize on a value object typed as a supertype (like Object) will fail at run time with an IdentityException.

      • Similarly, the usual understandings of object lifespan and garbage collection do not apply to value objects, because a substitutable instance may be recreated at any point. So javac produces identity warnings about uses of the java.lang.ref API at compile time, and the library throws an IdentityException at run time.

      In anticipation of these new behaviors, the value classes in the standard library have long been marked as value-based, warning that users should not depend on the unique identities of instances. Programs that have followed this advice, avoiding identity-sensitive operations on these objects, can expect consistent behavior between releases.

      Since Java 16, JEP 390 (Warnings for Value-Based Classes) has discouraged the use of synchronization (and, more recently, java.lang.ref) with these value-based classes.

      Value classes and java.lang.Object

      Every value class is a subclass of java.lang.Object, just like every identity class. There is no Value superclass of all value classes.

      When applied to a value object, the default behavior of the equals, hashCode, and toString methods inherited from Object is based on the object's internal state. The Object class is unaware of the value object's external state. Specifically:

      • The default behavior of Object.equals is to perform a == test, comparing the internal state of value objects. This is often the right equals behavior for a value class. But it will sometimes be necessary for the class author to override equals in order to properly compare instances' external state, as illustrated in the previous section.

      • The default behavior of Object.hashCode is to delegate to System.identityHashCode, producing a hash for the value object's internal state. As usual, the hashCode method should be overridden by a value class whenever it overrides equals.

      • The default behavior of Object.toString is to create a string of the usual form, "ClassName@hashCode", but note that the default hashCode of a value object is derived from its internal state, not object identity. Most value class authors will want to override toString to more legibly convey the domain value represented by the object.

      In a value record, as for all records, the implicit equals, hashCode, and toString recursively apply the same operations to the record components.

      A few other methods of Object interact with value objects in interesting ways:

      • For a Cloneable value class, the Object.clone method produces a value object that is indistinguishable from the original—the usual expectation that x.clone() != x is not meaningful for value objects. Value classes that store references to identity objects may wish to override clone and perform a "deep copy" of these identity objects.

      • The wait and notify methods require that the object be locked in the current thread; since it is impossible to synchronize on a value object, attempts to call these methods will always fail with an IllegalMonitorStateException.

      • The finalize method of a value object will never be invoked by the garbage collector.

      It has always been possible to create plain instances of java.lang.Object with new Object(). This behavior is still supported, and direct instances of the Object class are always identity objects.
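      One consequence is that a plain Object remains a safe choice of monitor. The sketch below shows code that guards a LocalDate field with a dedicated lock object instead of synchronizing on the date itself; it compiles and runs with or without preview features. The class LastOrderTracker and its members are hypothetical.

```java
import java.time.LocalDate;

class LastOrderTracker {
    // A direct instance of Object always has identity, so it can serve as a monitor.
    private final Object lock = new Object();
    private LocalDate lastOrdered;   // guarded by lock

    void update(LocalDate date) {
        synchronized (lock) {        // never synchronized (date) or synchronized (lastOrdered)
            if (lastOrdered == null || date.isAfter(lastOrdered)) {
                lastOrdered = date;
            }
        }
    }

    LocalDate lastOrdered() {
        synchronized (lock) {
            return lastOrdered;
        }
    }

    public static void main(String[] args) {
        LastOrderTracker tracker = new LastOrderTracker();
        tracker.update(LocalDate.of(2026, 1, 23));
        tracker.update(LocalDate.of(1996, 1, 23));  // older date is ignored
        System.out.println(tracker.lastOrdered());  // 2026-01-23
    }
}
```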

      Value classes and subclassing

      A value class can implement any interface. Variables with interface types can store both value objects and identity objects.

      jshell> Comparable<?> comp = 123
      comp ==> 123
      
      jshell> Objects.hasIdentity(comp)
      $2 ==> false
      
      jshell> comp = "abc"
      comp ==> "abc"
      
      jshell> Objects.hasIdentity(comp)
      $4 ==> true

      A value class cannot extend an identity class (with the exception of Object).

      A value class may be declared abstract. The value modifier applied to an abstract class indicates that the class has no need for identity, but does not restrict its subclasses. All of the usual rules for value classes apply to abstract value classes—for example, any instance fields of an abstract value class are implicitly final. (Of course, a value class that is declared abstract is not implicitly final.)

      Many abstract classes are good value class candidates. The class Number, for example, has no fields, nor any code that depends on identity-sensitive features.

      abstract value class Number implements Serializable {
          public abstract int intValue();
          public abstract long longValue();
          public byte byteValue() { return (byte) intValue(); }
          ...
      }

      Value classes and identity classes may both extend abstract value classes. For example, both Integer and BigInteger extend Number.

      jshell> Number num = 123
      num ==> 123
      
      jshell> Objects.hasIdentity(num)
      $6 ==> false
      
      jshell> num = BigInteger.valueOf(123)
      num ==> 123
      
      jshell> Objects.hasIdentity(num)
      $8 ==> true

      Abstract value classes (and interfaces) can be sealed when only a limited set of value subclasses is permitted.

      Safe construction

      Constructors initialize newly-created objects, including setting the values of the objects' fields. Because value objects do not have identity, their initialization requires special care.

      An object being constructed is "larval"—it has been created, but it is not yet fully-formed. Larval objects must be handled carefully, because the expected properties and invariants of the object may not yet hold—for example, the fields of a larval object may not be set. If a larval object is shared with outside code, that code may even observe the mutation of a final field!

      Traditionally, a constructor begins the initialization process by invoking a superclass constructor, super(...). After the superclass returns, the subclass then proceeds to set its declared instance fields and perform other initialization tasks. This pattern exposes a completely uninitialized subclass to any larval object leakage occurring in a superclass constructor.

      The Flexible Constructor Bodies feature enables an alternative approach to initialization, in which fields can be set and other code executed before the super(...) invocation. There is a two-phase initialization process: early construction before the super(...) invocation, and late construction afterwards.

      During the early construction phase, larval object leakage is impossible: the constructor may set the fields of the larval object, but may not invoke instance methods or otherwise make use of this. Fields that are initialized in the early phase are set before they can ever be read, even if a superclass leaks the larval object. Final fields, in particular, can never be observed to mutate.

      In a value class, all constructor and field initializer code normally runs in the early construction phase. This means that attempts to invoke instance methods or otherwise use this will fail:

      value class Name {
          String name;
          int length;
      
          Name(String n) {
              name = n;
              length = computeLength(); // error!
          }
      
          private int computeLength() {
              return name.length();
          }
      }

      Fields that are declared with initializers get set at the start of the constructor (as usual), but any implicit super() call gets placed at the end of the constructor body.

      When a constructor includes code that needs to work with this, an explicit super(...) or this(...) call can be used to mark the transition to the late phase. But all fields must be initialized before the super(...) call, without reference to this:

      value class Name {
          String name;
          int length;
      
          Name(String n) {
              name = n;
              length = computeLength(name); // ok
              super(); // all fields must be set at this point
              System.out.println("Name: " + this);
          }
      
          // refactored to be static:
          private static int computeLength(String n) {
              return n.length();
          }
      }

      For convenience, the early construction rules are relaxed by this JEP to allow the class's fields to be read as well as written—both references to the field name in the above constructor are legal. It continues to be illegal to refer to inherited fields, invoke instance methods, or share this with other code until the late construction phase.

      Instance initializer blocks (a rarely-used feature) continue to run in the late phase, and so may not assign to value class instance fields.

      This scheme is also appropriate for identity records, so this JEP modifies the language rules for records such that their constructors always run in the early construction phase. This is not a source-compatible language change, but is not expected to be disruptive.

      In the rare case that a record constructor needs to access this, an explicit super() can be inserted, but the record's fields must be set beforehand. The following record declaration will fail to compile when preview features are enabled, because it now makes reference to this in the early construction phase.

      record Node(String label, List<Node> edges) {
          public Node {
              validateNonNull(this, label); // error!
              validateNonNull(this, edges); // error!
          }
      
          static void validateNonNull(Object o, Object val) {
              if (val == null) {
                  throw new IllegalArgumentException(
                      "null arg for " + o);
              }
          }
      }

      (Note that this attempt to provide useful diagnostics by sharing this is misguided anyway: in a record's compact constructor, the fields are not set until the end of the constructor body; before they are set, the toString result will always be Node[label=null, edges=null].)
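
      One possible way to keep a useful diagnostic without referring to this is to pass the component name to a static helper. The record GraphNode and the helper requireNonNull below are hypothetical names chosen for this sketch. The compact constructor touches only its parameters, so it should also satisfy the early construction rules described above.

```java
import java.util.List;

// Hypothetical variant of the Node record that validates its components
// without sharing 'this'.
record GraphNode(String label, List<GraphNode> edges) {
    GraphNode {
        requireNonNull("label", label);
        requireNonNull("edges", edges);
    }

    // Static helper: depends only on its arguments, never on the record instance.
    private static void requireNonNull(String name, Object val) {
        if (val == null) {
            throw new IllegalArgumentException("null arg for " + name);
        }
    }
}
```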

      Finally, in normal identity classes, we think developers should write constructors and initializers that avoid the risk of larval object leakage by generally adopting the early construction constraints: read and write the declared fields of the class, but otherwise avoid any dependency on this, and where a dependency is necessary, mark it as deliberate by putting it after an explicit super(...) or this(...) call. To encourage this style, javac provides lint warnings indicating this dependencies in normal identity class constructors. (In the future, we anticipate that normal identity classes will have a way to adopt the constructor timing of value classes and records. A class that compiles without warning will likely be able to cleanly make that transition.)

      When to declare a value class

      As we've illustrated, value classes and value records are a useful tool for modeling immutable domain values that are interchangeable when they have matching internal state.

      As a general rule, if a class doesn't need identity, it should probably be a value class. This is especially true for abstract classes, which often have no need for identity—they may have no state at all—and shouldn't impose an identity requirement on their subclasses.

      But before applying the value keyword, class authors should take some additional considerations into account:

      Compatibility. Developers who maintain published identity classes should decide whether any users are likely to depend on identity. For example, these classes should have overridden equals and hashCode so that these methods' behavior is not identity-sensitive.

      Classes with public constructors are particularly at risk, because in the past users could count on the new operation producing a unique reference that the user "owned". Subsequent uses of == or synchronization may depend on that assumption of unique ownership. (Most of the value classes in the JDK have avoided this obligation by allowing object creation only through factory methods.)
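
      As an illustration, a published identity class that is a reasonable candidate for later migration might look like the following sketch. GridPoint is a hypothetical class, not a JDK type. It hides its constructor behind a factory method and defines equals and hashCode in terms of its fields, so callers have little reason to depend on identity.

```java
import java.util.Objects;

// Hypothetical identity class written so that users do not rely on identity.
final class GridPoint {
    private final int x;
    private final int y;

    // Private constructor: callers never see a 'new' expression they could
    // treat as producing a uniquely owned reference.
    private GridPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    static GridPoint of(int x, int y) {
        return new GridPoint(x, y);
    }

    // Equality and hashing depend only on field values.
    @Override
    public boolean equals(Object o) {
        return o instanceof GridPoint
            && ((GridPoint) o).x == x
            && ((GridPoint) o).y == y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}
```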

      Serialization. Some classes that model domain values are likely to implement Serializable. Traditional object deserialization in the JDK does not safely initialize the class's fields, and so is incompatible with value classes. Attempts to serialize or deserialize a value object will generally fail unless it is a value record or a JDK class instance.

      Value class authors can work around this limitation with a serialization proxy, using the writeReplace and readResolve methods. (A value record may be a good candidate for a proxy class.) In the future, enhancements to the serialization mechanism are anticipated that will allow value classes to be serialized and deserialized directly.
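
      The serialization proxy pattern can be sketched as follows. Celsius and its nested Proxy record are hypothetical names, and Celsius is written as a final identity class so that the sketch compiles without preview features. Under this JEP it would presumably be declared as a value class. The roundTrip helper exists only for demonstration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical domain class that is serialized through a proxy.
final class Celsius implements Serializable {
    private final double degrees;

    Celsius(double degrees) {
        this.degrees = degrees;
    }

    double degrees() {
        return degrees;
    }

    // Write the proxy to the stream instead of this object.
    private Object writeReplace() {
        return new Proxy(degrees);
    }

    // Reject streams that claim to contain a Celsius directly.
    private void readObject(ObjectInputStream in) throws InvalidObjectException {
        throw new InvalidObjectException("proxy required");
    }

    // Record proxy: deserialized through its canonical constructor,
    // then swapped for a Celsius built by the normal constructor.
    private record Proxy(double degrees) implements Serializable {
        private Object readResolve() {
            return new Celsius(degrees);
        }
    }

    // Demonstration helper: serializes and deserializes an object in memory.
    static Object roundTrip(Object o) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(o);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

      Because the only object written to the stream is the record, the Celsius instance is always created by its own constructor and never by field-by-field deserialization.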

      Subtyping. A value class must either be abstract or final. Concrete classes that have their own value object instances, while also being extended by other classes, are not supported. If a type hierarchy is needed, the supertypes in the hierarchy must be abstract value classes or interfaces.
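
      A hierarchy shaped to fit this rule might look like the sketch below. The class names are hypothetical, and the classes are declared as ordinary identity classes so that the code compiles without preview features. The comments mark where the value modifier would presumably be added.

```java
// Hypothetical abstract supertype: has no instances of its own.
// (Would be declared 'abstract value class' under this JEP.)
abstract class ShapeSketch {
    abstract double area();
}

// Concrete leaf classes are final and are never extended.
// (Each would be declared 'value class' under this JEP.)
final class CircleSketch extends ShapeSketch {
    private final double radius;

    CircleSketch(double radius) {
        this.radius = radius;
    }

    @Override
    double area() {
        return Math.PI * radius * radius;
    }
}

final class SquareSketch extends ShapeSketch {
    private final double side;

    SquareSketch(double side) {
        this.side = side;
    }

    @Override
    double area() {
        return side * side;
    }
}
```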

      Sensitive state. Because the == operator and identityHashCode depend on a value object's internal state, a malicious user could use those operations to try to infer the internal state of an object. Value classes are not designed to protect sensitive data against such attacks.

      Complexity. Most domain values are simple data aggregates. Value classes are not designed for managing large recursive data structures or unusually large numbers of fields. Users who apply the value keyword to such complex classes may experience slow == operations or other performance regressions when compared to identity classes.

      Run-time optimizations for value objects

      At run time, the JVM will typically optimize the use of value object references by avoiding traditional heap object allocation as much as possible, preferring reference encodings that refer to data stored on the stack, or that embed the data in the reference itself.

      As we saw earlier, an array of LocalDate references might be flattened so that the array stores the objects' data directly. (The details of flattened encodings will vary, of course, at the discretion of the JVM implementation.)

      +--------------+
      | LocalDate[5] |
      +--------------+
      | 1|1996|01|23 |
      | 1|1996|01|23 |
      | 1|2026|01|23 |
      | 0|0000|00|00 |
      | 1|1996|01|23 |
      +--------------+

      An array of boxed Integer objects can be similarly flattened, in this case by simply concatenating a null flag to each int value.

      +--------------+
      | Integer[5]   |
      +--------------+
      | 1|1996       |
      | 1|2006       |
      | 1|1996       |
      | 1|1          |
      | 1|23         |
      +--------------+

      The layout of this array is not significantly different from that of a plain int array, except that it requires some extra bits for each null flag (in practice, this probably means that each reference takes up 64 bits).

      Flattening can be applied to fields as well—a LocalDateTime on the heap could store flattened LocalDate and LocalTime references directly in its object layout.

      +----------------------+
      | LocalDateTime        |
      +----------------------+
      | date=1|2026|01|23    |
      | time=1|09|00|00|0000 |
      +----------------------+

      Heap flattening must maintain the integrity of object data. For example, the flattened reference must be read and written atomically, or it could become corrupted. On common platforms, this limits the size of most flattened references to no more than 64 bits. So while it would theoretically be possible to flatten LocalDateTime references too, in practice they would probably be too big. In the future, 128-bit flattened encodings may be used on platforms that support atomic reads and writes of that size. And the Null-Restricted Value Types JEP will enable heap flattening for even larger value classes if the programmer is willing to opt out of atomicity guarantees.
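
      To make the size constraint concrete, the following sketch packs a null flag and the three components of a LocalDate into a single 64-bit long. This is purely illustrative: the class FlatDateSketch and its bit layout are assumptions made for this example, not a description of how HotSpot or any other JVM encodes flattened references.

```java
import java.time.LocalDate;

// Hypothetical 64-bit encoding: 1 flag bit, 32-bit year, 8-bit month, 8-bit day.
final class FlatDateSketch {
    // All-zero bits stand for null, matching a cleared flag.
    static final long NULL_ENCODING = 0L;

    static long encode(LocalDate d) {
        if (d == null) {
            return NULL_ENCODING;
        }
        long year = d.getYear() & 0xFFFFFFFFL;  // bits 16..47
        long month = d.getMonthValue() & 0xFFL; // bits 8..15
        long day = d.getDayOfMonth() & 0xFFL;   // bits 0..7
        return (1L << 48) | (year << 16) | (month << 8) | day;
    }

    static LocalDate decode(long bits) {
        if (((bits >>> 48) & 1L) == 0) {
            return null; // flag cleared: the encoded reference is null
        }
        int year = (int) (bits >>> 16);          // cast keeps the low 32 bits, restoring the sign
        int month = (int) ((bits >>> 8) & 0xFF);
        int day = (int) (bits & 0xFF);
        return LocalDate.of(year, month, day);
    }
}
```

      The whole encoding uses 49 bits, so it can be read or written with a single 64-bit memory operation. A LocalDateTime would also need the LocalTime components, which is why it would probably not fit.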

      When these flattened references are read from heap storage, they need to be re-encoded in a form that the JVM can readily work with. One strategy is to store each field of the flattened reference in a separate local variable. This set of local variables constitutes a scalarized encoding of the value object reference.

      In HotSpot, scalarization is a JIT compilation technique, affecting the representation of references to value objects in the bodies and signatures of JIT-compiled methods.

      The following code reads a LocalDate from an array and invokes the plusYears method. A simplified version of the plusYears method is included for reference.

      LocalDate d = arr[0];
      arr[0] = d.plusYears(30);
      
      public LocalDate plusYears(long yearsToAdd) {
          // avoid overflow:
          int newYear = YEAR.checkValidIntValue(this.year + yearsToAdd);
          // (simplification, skipping leap year adjustment)
          return new LocalDate(newYear, this.month, this.day);
      }

      In pseudo-code, the JIT-compiled code might look like the following, where the notation { ... } refers to a vector of multiple values. (Importantly, this is purely notational: there is no wrapper at run time.)

      { d_null, d_year, d_month, d_day } = $decode(arr[0]);
      arr[0] = $encode($plusYears(d_null, d_year, d_month, d_day, 30));
      
      static { boolean, int, byte, byte }
          $plusYears(boolean this_null, int this_year,
                     byte this_month, byte this_day,
                     long yearsToAdd) {
          if (this_null) throw new NullPointerException();
          int newYear = YEAR.checkValidIntValue(this_year + yearsToAdd);
          return { false, newYear, this_month, this_day };
      }

      Notice that this code never interacts with a pointer to a heap-allocated LocalDate—a flattened reference is converted to a scalarized reference, a new scalarized reference is created, and then that reference is converted to another flattened reference.

      Unlike heap flattening, scalarization is not constrained by the size of the data—local variables being operated on in the stack are not at risk of data races. A scalarized encoding of a LocalDateTime reference might consist of a null flag, four components for the LocalDate reference, and five components for the LocalTime reference.

      JVMs have used similar techniques to scalarize identity objects in local code when the JVM is able to prove that an object's identity is never used. But scalarization of value objects is more predictable and far-reaching, even across non-inlinable method invocation boundaries.

      One limitation of both heap flattening and scalarization is that they are not typically applied to a variable with a type that is a supertype of a value class type. Notably, this includes method parameters of generic code whose erased type is Object. Instead, when an assignment to a supertype occurs, a scalarized value object reference may be converted to an ordinary heap object reference. But this allocation occurs only when necessary, and as late as possible.

      Scope of changes

      Value classes and objects have a broad and deep impact on the Java Platform. This JEP includes preview language, VM, and library features, summarized as follows.

      In the Java language:

      • The value class modifier, with associated compilation rules, opts in to the semantics of value classes.

      • Safe construction rules are enforced for record classes.

      In the JVM:

      • The ACC_IDENTITY flag indicates that a class is an identity class. It is left unset for value classes and interfaces. This flag replaces the ACC_SUPER flag, which has been unused by the JVM since Java 8.

      • The LoadableDescriptors attribute lists the names of value classes appearing in the field or method descriptors of the current class. This attribute authorizes the JVM to load the named value classes early enough that it can optimize the layouts of references to instances from the current class.

      • Heap storage and JIT-compiled code are engineered to optimize the handling of value object references.

      (Additionally, compiled value classes use the features of Strict Field Initialization in the JVM (Preview) to guarantee that the class's fields are properly initialized.)

      In the Java platform API:

      • The full list of classes in the JDK that are treated as value classes when preview features are enabled is as follows:

        In java.lang: Integer, Long, Float, Double, Byte, Short, Boolean, Character, Number, and Record

        In java.util: Optional, OptionalInt, OptionalLong, and OptionalDouble

        In java.time: LocalDate, Period, Year, YearMonth, MonthDay, LocalTime, Instant, Duration, LocalDateTime, OffsetTime, OffsetDateTime, ZonedDateTime

        In java.time.chrono: HijrahDate, JapaneseDate, MinguoDate, ThaiBuddhistDate, and ChronoLocalDateImpl

      • The methods Objects.hasIdentity and Objects.requireIdentity, and the IdentityException class, are (reflective?) preview APIs.

      • The java.lang.Object, java.lang.ref, and serialization APIs are modified to give special handling to value objects.

      Future Work

      Null-Restricted Value Class Types (Preview) will build on this JEP, allowing programmers to manage the storage of nulls and enable more dense heap flattening in fields and arrays.

      Enhanced Primitive Boxing (Preview) will enhance the language's use of primitive types, taking advantage of the lighter-weight characteristics of boxing to value objects.

      JVM class and method specialization (JEP 218, with revisions) will allow generic classes and methods to specialize field, array, and local variable layouts when parameterized by value class types.

      Alternatives

      As discussed, JVMs have long performed escape analysis to identify objects that never rely on identity throughout their lifespan and can be scalarized. These optimizations are somewhat unpredictable, and do not help with objects that escape the scope of the optimization, including storage in fields and arrays.

      Hand-coded optimizations via primitive values are possible to improve performance, but as noted in the "Motivation" section, these techniques require giving up valuable abstractions.

      The C language and its relatives support flattened storage for structs and similar class-like abstractions. For example, the C# language has value types. Unlike value objects, instances of these abstractions have identity, meaning they support operations such as field mutation. As a result, the semantics of copying on assignment, invocation, etc., must be carefully specified, leading to a more complex user model and less flexibility for runtime implementations. We prefer an approach that leaves these low-level details to the discretion of JVM implementations.

      Risks and Assumptions

      The feature makes significant changes to the Java object model. Developers may be surprised by, or encounter bugs due to, changes in the behavior of operations such as == and synchronized. We expect such disruptions to be rare and tractable.

      Some changes could potentially affect the performance of identity objects. The if_acmpeq test, for example, typically only costs one instruction cycle, but will now need an additional check to detect value objects. But the identity class case can be optimized as a fast path, and we believe we have minimized any performance regressions.

      There is a security risk that == and hashCode can indirectly expose private field values. Further, two large trees of value objects can take unbounded time to compute ==. Developers need to understand these risks.
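
      The cost concern can be illustrated with a hand-written, state-based comparison over a recursive structure. TreeSketch is a hypothetical identity class standing in for a recursive value class, and sameState only emulates a field-by-field comparison. It does not claim to reproduce the JVM's actual algorithm. The visited counter exists only to show how the work grows with the size of the trees.

```java
// Hypothetical stand-in for a recursive value class.
final class TreeSketch {
    final int value;
    final TreeSketch left;
    final TreeSketch right;

    // Number of node pairs compared so far (for demonstration only).
    static long visited = 0;

    TreeSketch(int value, TreeSketch left, TreeSketch right) {
        this.value = value;
        this.left = left;
        this.right = right;
    }

    // Builds a complete binary tree with the given number of levels.
    static TreeSketch complete(int levels) {
        if (levels == 0) {
            return null;
        }
        return new TreeSketch(levels, complete(levels - 1), complete(levels - 1));
    }

    // Emulated state-based comparison: same reference, or all fields match recursively.
    static boolean sameState(TreeSketch a, TreeSketch b) {
        if (a == b) {
            return true;  // fast path, also covers two nulls
        }
        if (a == null || b == null) {
            return false;
        }
        visited++;
        return a.value == b.value
            && sameState(a.left, b.left)
            && sameState(a.right, b.right);
    }
}
```

      Comparing two separately built trees of 10 levels visits all 1023 node pairs before reporting that they match, so the cost is proportional to the size of the structure rather than constant.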

      Dependencies

      Strict Field Initialization in the JVM (Preview) provides the JVM mechanism necessary to require, through verification, that value class instance fields are initialized during early construction.

            dlsmith Dan Smith
            Alex Buckley, Brian Goetz