JEP 401: Value Classes and Objects (Preview)

      Summary

      Enhance the Java Platform with value objects: class instances that have only final fields and lack object identity. This is a preview language and VM feature.

      Goals

      • Allow developers to opt in to a programming model for domain values in which objects are distinguished solely by the values of their fields, much as the int value 3 is distinguished from the int value 4.

      • Support compatible migration of existing classes that represent domain values to this programming model. Migrate suitable classes in the JDK, such as Integer and LocalDate, to be value classes.

      • Maximize the freedom of the JVM to store domain values in ways that improve memory footprint, locality, and garbage collection efficiency.

      Non-Goals

      • It is not a goal to automatically treat existing classes as value classes, even if they share some characteristics of value classes. Value objects do not uniformly work the same way as other objects, so class authors must explicitly choose to declare value classes.

      • It is not a goal to "fix" the == operator so that programmers can use it in place of equals. This JEP redefines == only as much as necessary to cope with a new kind of identity-free object. The usual advice to compare objects in most contexts using the equals method still applies.

      • It is not a goal to introduce a struct feature in the Java language. Java programmers are not asked to understand new semantics for memory management or variable storage. Java continues to operate on just two kinds of data: primitives and object references.

      • It is not a goal to change the treatment of primitive types. Primitive types behave like value classes in many ways, but are a distinct concept. A separate JEP will provide enhancements to make primitive types more class-like and compatible with generics.

      • It is not a goal to guarantee any particular optimization strategy or memory layout. This JEP enables many potential optimizations; only some will be implemented initially. Some optimizations, such as layouts that exclude null, will only be possible after future language and JVM enhancements.

      Motivation

      Java developers often need to represent domain values: the date of an event, the color of a pixel, the shipping address of an order, and so on. Developers usually model these values with immutable classes that contain just enough business logic to construct, validate, and transform instances. The toString, equals, and hashCode methods in these classes are defined so that equivalent instances can be used interchangeably.

      As an example, event dates can be represented with the JDK's LocalDate class:

      jshell> LocalDate d1 = LocalDate.of(1996, 1, 23)
      d1 ==> 1996-01-23
      
      jshell> LocalDate d2 = d1.plusYears(30)
      d2 ==> 2026-01-23
      
      jshell> LocalDate d3 = d2.minusYears(30)
      d3 ==> 1996-01-23
      
      jshell> d1.equals(d3)
      $4 ==> true

      Developers will regard the "essence" of a LocalDate object as its year, month, and day values. But to Java, the essence of any object is its identity. Each time the of method in LocalDate invokes new LocalDate(...), an object with a unique identity is allocated, distinguishable from every other object in the system.

      The easiest way to observe the identity of an object is with the == operator:

      jshell> d1 == d3
      $6 ==> false

      Even though d1 and d3 represent the same year-month-day triple (d1.equals(d3) is true), they are two objects with distinct identities.

      Domain values don't need identity

      For mutable objects, identity is important: it lets us distinguish two objects that have the same state now but will have different state in the future. For example, suppose a class Customer has a field lastOrderedDate that is mutated when the customer makes a new order. Two Customer objects might have the same lastOrderedDate, but it would be a coincidence; when one of the customers makes a new order, the application will mutate the lastOrderedDate of one Customer object but not the other, relying on identity to pick the right one.

      In other words, when objects are mutable, they are not interchangeable. But most domain values are not mutable and are interchangeable. There is no practical difference between two LocalDate objects representing 1996-01-23, because their state is fixed and unchanging. They represent the same domain value, both now and in the future. There is no need to distinguish the two objects via their identities.

      In fact, object identity is actively confusing when objects are immutable and are meant to be interchangeable. Most developers will recall the experience of unwittingly using == to compare objects, as in d1 == d3 above, and being mystified by a false result even though the objects' state and behavior seem identical.

      The JDK tries to reduce confusion for the immutable classes that model primitive values, such as Integer. In particular, the autoboxing of small int values to Integer uses a cache to avoid creating Integer objects with unique identities. However, this cache, somewhat arbitrarily, does not extend to four-digit int values like 1996:

      jshell> Integer i = 96, j = 96;
      i ==> 96
      j ==> 96
      
      jshell> i == j
      $3 ==> true
      
      jshell> Integer x = 1996, y = 1996;
      x ==> 1996
      y ==> 1996
      
      jshell> x == y
      $6 ==> false

      For domain values like Integer, the fact that each object has unique identity is unwanted complexity that leads to surprising behavior and exposes incidental implementation choices. This extra complexity could be avoided if objects whose state and behavior make them interchangeable could be freed from the legacy requirement to have distinct identities.

      Object identity is expensive at run time

      Java's requirement that every object have identity, even if some domain values don't want it, is a performance impediment. It means the JVM has to allocate memory for each newly created object, distinguishing it from every object already in the system, and reference the location in memory whenever the object is used or stored.

      For example, suppose a program creates arrays of int values and LocalDate references:

      jshell> int[] ints = { 1996, 2006, 1996, 1, 23 }
      ints ==> int[5] { 1996, 2006, 1996, 1, 23 }
      
      jshell> LocalDate[] dates = { d1, d1, d2, null, d3 }
      dates ==> LocalDate[5] { 1996-01-23, 1996-01-23, 2026-01-23,
                               null, 1996-01-23 }

      The int array can be allocated by the JVM as a simple block of memory:

      +----------+
      | int[5]   |
      +----------+
      | 1996     |
      | 2006     |
      | 1996     |
      | 1        |
      | 23       |
      +----------+

      In contrast, the LocalDate array must be represented as a sequence of pointers, each referencing a location in memory where an object has been allocated:

      +--------------+
      | LocalDate[5] |
      +--------------+
      | 87fa1a09     | -----------------------> +-----------+
      | 87fa1a09     | -----------------------> | LocalDate |
      | 87fb4ad2     | ------> +-----------+    +-----------+
      | 00000000     |         | LocalDate |    | y=1996    |
      | 87fb5366     | ---     +-----------+    | m=1       |
      +--------------+   |     | y=2026    |    | d=23      |
                         v     | m=1       |    +-----------+
              +-----------+    | d=23      |
              | LocalDate |    +-----------+
              +-----------+
              | y=1996    |
              | m=1       |
              | d=23      |
              +-----------+

      Even though the data modeled by the LocalDate array is not significantly more complex than the int array—a year-month-day triple is effectively 48 bits of primitive data—the memory footprint is far greater because of the pointers and allocated objects. dates[4] has to point to a different object than dates[0] and dates[1], even though all three elements represent the same year-month-day triple.

      Worse, when a program iterates over the LocalDate array, each pointer may need to be dereferenced. CPUs use caches to enable fast access to chunks of memory; if the array exhibits poor memory locality (a distinct possibility if the LocalDate objects were allocated at different times or out of order), every dereference may require caching a different chunk of memory, frustrating performance.

      In some application domains, developers program for speed by creating as few objects as possible, thus de-stressing the garbage collector and improving locality. For example, they might encode event dates with an int representing an epoch day. Unfortunately, this approach gives up the functionality of classes that makes Java code so maintainable: meaningful names, private state, data validation by constructors, convenience methods, etc. A developer operating on dates represented as int values might accidentally interpret the value relative to a start date in 1601 or 1980 rather than the intended 1970 start date.
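
      The epoch-day hazard can be made concrete with a small sketch. The class and method names below are hypothetical; only LocalDate and its methods come from the JDK. The same int decodes to different dates depending on which start date the reader assumes, and nothing in the type system catches the mistake.

```java
import java.time.LocalDate;

// Hypothetical helper class for illustration; not part of the JDK or this JEP.
final class EpochDayDemo {
    // Encode a date as a day count relative to the 1970-01-01 epoch.
    static int encode(LocalDate date) {
        return Math.toIntExact(date.toEpochDay());
    }

    // Intended decoding: interpret the count relative to 1970-01-01.
    static LocalDate decode(int epochDay) {
        return LocalDate.ofEpochDay(epochDay);
    }

    // Mistaken decoding: interpret the same count relative to 1980-01-01.
    static LocalDate decodeWithWrongEpoch(int epochDay) {
        return LocalDate.of(1980, 1, 1).plusDays(epochDay);
    }

    public static void main(String[] args) {
        int raw = encode(LocalDate.of(1996, 1, 23));
        System.out.println(decode(raw));               // round-trips to the original date
        System.out.println(decodeWithWrongEpoch(raw)); // a different date, with no error raised
    }
}
```

      A LocalDate carries its interpretation with it, which is exactly the functionality that the int encoding gives up.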

      Programming without identity

      Trillions of Java objects are created every day, each one bearing a unique identity. We believe the time has come to let Java developers choose which objects in the program need identity, and which do not. An immutable class like LocalDate that represents domain values could opt out of identity, so that it would be impossible to distinguish between two LocalDate objects representing the date 1996-01-23, just as it is impossible to distinguish between two int values representing the number 4.

      By opting out of identity, developers are opting in to a programming model that provides the best of both worlds: the abstraction of classes with the simplicity and performance benefits of primitives.

      In the future, this programming model will support new Java Platform APIs, such as classes that encode different kinds of integers and floating-point values, and new Java language features, such as user-defined conversions and mathematical operators for domain values.

      Description

      Java NN introduces value objects to model immutable domain values. A value object is an instance of a value class, declared with the value modifier. Classes without the value modifier are called identity classes, and their instances are identity objects.

      Java programs manipulate objects through references. A reference to an object is stored in a variable and lets us find the object's fields. Traditionally, a reference also encodes the unique identity of an object: each execution of new allocates a fresh object and returns a unique reference, which can then be stored in multiple variables (aliasing). And, traditionally, the == operator compares objects by comparing references, so references to two objects are not == even if the objects have identical field values.

      In Java NN, value objects are different. A reference to a value object is stored in a variable and lets us find the object's fields, but it does not serve as the unique identity of the object. For a value class, executing new might not allocate a fresh object and might instead return a reference to an existing object, or even a "reference" that embodies the object directly. The == operator compares value objects by comparing their field values, so references to two objects are == if the objects have identical field values.

      Developers can save memory and improve performance by using value objects for immutable data. Because programs cannot tell the difference between two value objects with identical field values (not even with ==), the Java Virtual Machine is able to change how a value object is laid out in memory without affecting the program; for example, its fields could be stored on the stack rather than the heap.

      The following sections explore how value objects differ from identity objects and illustrate how to declare value classes. This is followed by an in-depth treatment of the special behaviors of value objects, considerations for value class declarations, and the JVM's handling of value classes and objects.

      Enabling preview features

      Value classes and objects are a preview language feature, disabled by default.

      To try the examples below in JDK NN you must enable preview features:

      • Compile the program with javac --release NN --enable-preview Main.java and run it with java --enable-preview Main; or,

      • When using the source code launcher, run the program with java --enable-preview Main.java; or,

      • When using jshell, start it with jshell --enable-preview.

      Some classes in the Java Platform API become value classes only if preview features are enabled; otherwise, they behave just as they did in JDK NN-1.

      Programming with value objects

      30 classes in java.* are declared as value classes. They include:

      • In java.lang: Integer, Long, Float, Double, Byte, Short, Character, Boolean
      • In java.util: Optional, OptionalInt, OptionalLong, OptionalDouble
      • In java.time: LocalDate, LocalTime, LocalDateTime, ZonedDateTime, Duration

      All instances of these classes are value objects. This includes the boxed primitives that are instances of Integer, Long, etc. The == operator compares value objects by their field values, so, e.g., Integer objects are == if they box the same primitive values:

      % -> jshell --enable-preview
      |  Welcome to JShell -- Version 25-internal
      |  For an introduction type: /help intro
      
      jshell> Integer x = 1996, y = 1996;
      x ==> 1996
      y ==> 1996
      
      jshell> x == y
      $3 ==> true

      Similarly, two LocalDate objects are == if they have the same year, month, and day values:

      jshell> LocalDate d1 = LocalDate.of(1996, 1, 23)
      d1 ==> 1996-01-23
      
      jshell> LocalDate d2 = d1.plusYears(30)
      d2 ==> 2026-01-23
      
      jshell> LocalDate d3 = d2.minusYears(30)
      d3 ==> 1996-01-23
      
      jshell> d1 == d3
      $7 ==> true

      The String class has not been made a value class. Instances of String are always identity objects. We can use the Objects.hasIdentity method, new in JDK NN, to observe whether an object is an identity object.

      jshell> String s = "abcd"
      s ==> "abcd"
      
      jshell> Objects.hasIdentity(s)
      $9 ==> true
      
      jshell> Objects.hasIdentity(d1)
      $10 ==> false
      
      jshell> String t = "aabcd".substring(1)
      t ==> "abcd"
      
      jshell> s == t
      $13 ==> false

      In most respects, value objects work the way that objects have always worked in Java. However, a few identity-sensitive operations, such as synchronization, are not supported by value objects.

      jshell> synchronized (d1) { d1.notify(); }
      |  Error:
      |  unexpected type
      |    required: a type with identity
      |    found:    java.time.LocalDate
      |  synchronized (d1) { d1.notify(); }
      |  ^--------------------------------^
      
      jshell> Object o = d1
      o ==> 1996-01-23
      
      jshell> synchronized (o) { o.notify(); }
      |  Exception java.lang.IdentityException: Cannot synchronize on
         an instance of value class java.time.LocalDate
      |        at (#19:1)

      The JVM has a lot of freedom to encode references to value objects at run time in ways that optimize memory footprint, locality, and garbage collection efficiency. For example, we saw the following array earlier, implemented with pointers to heap objects:

      jshell> LocalDate[] dates = { d1, d1, d2, null, d3 }
      dates ==> LocalDate[5] { 1996-01-23, 1996-01-23, 2026-01-23,
                               null, 1996-01-23 }

      Now that LocalDate objects lack identity, the JVM could implement the array using "references" that encode the fields of each LocalDate directly. Each array element can be represented as a 64-bit word that indicates whether the reference is null, and if not, directly stores the year, month, and day field values of the value object:

      +--------------+
      | LocalDate[5] |
      +--------------+
      | 1|1996|01|23 |
      | 1|1996|01|23 |
      | 1|2026|01|23 |
      | 0|0000|00|00 |
      | 1|1996|01|23 |
      +--------------+

      The performance characteristics of this LocalDate array may be similar to those of an ordinary int array:

      +----------+
      | int[5]   |
      +----------+
      | 1996     |
      | 2006     |
      | 1996     |
      | 1        |
      | 23       |
      +----------+

      This optimization is just one example; some value classes, like LocalDateTime, are too large to take advantage of this particular technique. Still, the lack of identity enables the JVM to optimize references to value objects in many ways.

      Declaring value classes

      Developers can declare their own value classes by applying the value modifier to any class whose instances should be immutable and interchangeable:

      • Immutable: All instance fields of the class should be final, and the domain value represented by an instance will not change over time; and

      • Interchangeable: It's not necessary to distinguish between two separately-created instances that represent the same domain value.

      When the value modifier is applied to a class, its fields are implicitly final. The class is also implicitly final, so it cannot be extended. Because the class is final, its methods cannot be overridden.

      There is no restriction on the types of fields in a value class. The fields may store references to other value objects, or to identity objects, e.g., strings.

      Record classes are final and all their fields are final, so they are often good candidates to be value classes.

      jshell> value record Point(int x, int y) {}
      |  created record Point
      
      jshell> Point p = new Point(17, 3)
      p ==> Point[x=17, y=3]
      
      jshell> Objects.hasIdentity(p)
      $7 ==> false
      
      jshell> new Point(17, 3) == p
      $8 ==> true

      Many classes represent immutable and interchangeable domain values but cannot be record classes because they are not transparent. A record is transparent because the fields it uses to represent a domain value are the same as the constructor arguments used to create the domain value. Most classes, however, use private fields to represent a domain value internally in a more efficient way than is exposed externally through public methods. For example, a class might represent a quantity of euros and cents with a single int field to save memory; it cannot be a record class, but it can still be a value class.

      value class EURCurrency {
          private int cs;  // implicitly final
          private EURCurrency(int cs) { this.cs = cs; }
      
          public EURCurrency(int euros, int cents) {
              this(euros * 100 + (euros < 0 ? -cents : cents));
          }
      
          public int euros() { return cs/100; }
          public int cents() { return Math.abs(cs%100); }
          public String toString() {
              return "€%d,%02d".formatted(euros(), cents());
          }
      }
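
      The single-field encoding can be tried on a JDK without preview features by dropping the value modifier. The sketch below does so; the class name EURCurrencySketch is hypothetical, and toString is left out to focus on the arithmetic. It shows how the public constructor folds the sign of the euro amount into the cents, so that negative amounts decode correctly.

```java
// An ordinary identity class standing in for the value class above.
// Assumption: the value modifier is omitted only so that this compiles today.
final class EURCurrencySketch {
    private final int cs;  // total amount in cents

    private EURCurrencySketch(int cs) { this.cs = cs; }

    public EURCurrencySketch(int euros, int cents) {
        // For a negative euro amount the cents must be subtracted, not added.
        this(euros * 100 + (euros < 0 ? -cents : cents));
    }

    public int euros() { return cs / 100; }
    public int cents() { return Math.abs(cs % 100); }

    public static void main(String[] args) {
        EURCurrencySketch debt = new EURCurrencySketch(-3, 50);
        System.out.println(debt.euros() + " euros, " + debt.cents() + " cents");
    }
}
```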

      Comparing value objects

      The purpose of the == operator in Java is to test whether two referenced objects are indistinguishable. If two references are ==, the JVM can freely replace one object with the other, and no code will be able to tell the difference.

      For identity objects, the == operator works the same in JDK NN as in 1.0: it checks whether two references are to the same object, at the same location in memory.

      For value objects, the == operator checks for statewise equivalence. This means the two references are to objects with the same field values. Two value objects are statewise equivalent if:

      • They are instances of the same value class;

      • Their primitive-typed fields store the same bit patterns; and

      • Their reference-typed fields are ==: either two null references, or two references to the same identity object, or two references to statewise-equivalent value objects.
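
      The three rules above can be approximated in today's Java with a hand-written helper. This is a sketch under stated assumptions, not the JVM's implementation: Length and Box are ordinary records standing in for value classes, and statewiseEquivalent is a hypothetical method that knows only about those two classes.

```java
// Illustration only: ordinary records play the role of value classes here.
final class StatewiseSketch {
    record Length(float val) {}
    record Box(Object val) {}

    static boolean statewiseEquivalent(Object a, Object b) {
        if (a == b) return true;                         // both null, or the same identity object
        if (a == null || b == null) return false;
        if (a.getClass() != b.getClass()) return false;  // must be instances of the same class
        if (a instanceof Length la && b instanceof Length lb) {
            // Primitive-typed field: compare bit patterns, not numeric values.
            return Float.floatToRawIntBits(la.val()) == Float.floatToRawIntBits(lb.val());
        }
        if (a instanceof Box ba && b instanceof Box bb) {
            // Reference-typed field: apply the same test recursively.
            return statewiseEquivalent(ba.val(), bb.val());
        }
        return false;  // two distinct identity objects are never equivalent
    }

    public static void main(String[] args) {
        String shared = "ring";
        System.out.println(statewiseEquivalent(new Box(new Box(shared)), new Box(new Box(shared))));
        System.out.println(statewiseEquivalent(new Length(0.0f), new Length(-0.0f)));
    }
}
```

      The second call returns false even though 0.0f == -0.0f is true for primitives, because the two zeros have different bit patterns.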

      == and equals will often produce the same results for value objects. However, for some value classes, instances may be interchangeable (so equals) even if their field values are different (so not ==). Developers who want to test whether two value objects represent the same domain value should use the equals method, and class authors should define equals in a way that always returns true for interchangeable domain values.

      An example where == and equals may differ for value objects involves the LazySubstring value class below. It represents a substring of a string lazily, without allocating a new char[] in memory. The internal state of a LazySubstring instance is a source string and two coordinates, while the domain value represented by the instance is a character sequence produced by toString. Accordingly, two instances may model the same character sequence (so are equals) even though their internal state is different (so not ==).

      value class LazySubstring {
          private String str;
          private int start, end;
      
          public LazySubstring(String s, int i, int j) {
              str = s; start = i; end = j;
          }
      
          public String toString() {
              return str.substring(start, end);
          }
      
          public boolean equals(Object o) {
              return o instanceof LazySubstring &&
                  toString().equals(o.toString());
          }
      
          public int hashCode() {
              return Objects.hash(LazySubstring.class, toString());
          }
      }
      
      jshell> LazySubstring sub1 = new LazySubstring("ringing", 1, 4);
      sub1 ==> ing
      
      jshell> LazySubstring sub2 = new LazySubstring("ringing", 4, 7);
      sub2 ==> ing
      
      jshell> sub1.equals(sub2)
      $3 ==> true
      
      jshell> sub1 == sub2
      $4 ==> false

      The results of == and equals may also be different if two value objects' fields refer to two identity objects that are interchangeable according to equals, but that have different identities.

      jshell> String r = "bringing".substring(1);
      r ==> ringing
      
      jshell> r == "ringing"
      $6 ==> false
      
      jshell> LazySubstring sub3 = new LazySubstring(r, 1, 4);
      sub3 ==> ing
      
      jshell> sub1.equals(sub3)
      $8 ==> true
      
      jshell> sub1 == sub3  // tests sub1.str == sub3.str
      $9 ==> false

      Another situation where == and equals may differ is where value objects have float or double fields. The primitive floating-point types support multiple encodings of NaN using different bit patterns. These NaN values are treated as interchangeable by most floating-point operations, but because each bit pattern is distinct, value objects that wrap different encodings of NaN are not statewise equivalent according to ==. The value class author must decide whether the distinction is meaningful for the equals method. For example, the default behavior of equals in a value record class does not consider NaN encodings to be a meaningful distinction.

      jshell> value record Length(float val) {}
      |  created record Length
      
      jshell> Length l1 = new Length(Float.intBitsToFloat(0x7ff80000))
      l1 ==> Length[val=NaN]
      
      jshell> Length l2 = new Length(Float.intBitsToFloat(0x7ff80001))
      l2 ==> Length[val=NaN]
      
      jshell> l1.equals(l2)
      $13 ==> true
      
      jshell> l1 == l2
      $14 ==> false
      
      jshell> Float.floatToRawIntBits(l1.val())
      $15 ==> 2146959360
      
      jshell> Float.floatToRawIntBits(l2.val())
      $16 ==> 2146959361
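
      An author who writes equals by hand for a class with a float field faces the same choice. The sketch below contrasts two possible comparisons; the class and method names are hypothetical. Float.compare treats every NaN encoding as equal, while Float.floatToRawIntBits preserves the distinction that == makes for value objects.

```java
// Hypothetical helpers showing two ways an equals method might compare a float field.
final class FloatFieldEquality {
    // Treats all NaN encodings as the same domain value.
    // Note that Float.compare also distinguishes 0.0f from -0.0f.
    static boolean sameByCompare(float a, float b) {
        return Float.compare(a, b) == 0;
    }

    // Compares raw bit patterns, so different NaN encodings are different.
    static boolean sameByRawBits(float a, float b) {
        return Float.floatToRawIntBits(a) == Float.floatToRawIntBits(b);
    }

    public static void main(String[] args) {
        float n1 = Float.intBitsToFloat(0x7ff80000);
        float n2 = Float.intBitsToFloat(0x7ff80001);
        System.out.println(sameByCompare(n1, n2));
        System.out.println(sameByRawBits(n1, n2));
    }
}
```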

      Note that == performs a "deep" comparison of nested references to other value objects. The number of comparisons is unbounded. In the following example, two deep nests of Box objects require a full traversal to determine whether the objects are statewise equivalent.

      jshell> value record Box(Object val) {}
      |  created record Box
      
      jshell> var b1 = new Box(new Box(new Box(new Box(sub1))))
      b1 ==> Box[val=Box[val=Box[val=Box[val=ing]]]]
      
      jshell> var b2 = new Box(new Box(new Box(new Box(sub2))))
      b2 ==> Box[val=Box[val=Box[val=Box[val=ing]]]]
      
      jshell> b1.equals(b2)
      $20 ==> true
      
      jshell> b1 == b2
      $21 ==> false

      Constructors of value classes are constrained (discussed later) so that the recursive application of == to value objects will never cause an infinite loop.

      Value classes and subclassing

      Every value class belongs to a class hierarchy with java.lang.Object at its root, just like every identity class. There is no java.lang.Value superclass of all value classes.

      All value classes are subclasses of java.lang.Object and can implement interfaces. This means variables declared with Object, or with interfaces, can store references to both value objects and identity objects.

      jshell> Object o = LocalDate.of(1996, 1, 23)
      o ==> 1996-01-23
      
      jshell> Objects.hasIdentity(o)
      $2 ==> false
      
      jshell> Comparable<?> comp = 123
      comp ==> 123
      
      jshell> Objects.hasIdentity(comp)
      $4 ==> false
      
      jshell> comp = "abc"
      comp ==> "abc"
      
      jshell> Objects.hasIdentity(comp)
      $6 ==> true

      By default, a value class is implicitly final and cannot be extended. However, a value class may be declared abstract, allowing it to be extended by other classes and have its methods overridden. Methods in an abstract value class may be marked abstract, as in an abstract identity class.

      The subclasses of an abstract value class may be value classes or identity classes. Thus, a value class can extend either java.lang.Object or an abstract value class.

      The fields of an abstract value class are implicitly final, as in a concrete value class.

      Many existing abstract classes are good candidates to be abstract value classes. Applying the value modifier to an abstract class indicates that the class has no need for identity but does not restrict subclasses from having identity. For example, the abstract class Number has no fields, nor any code that depends on identity-sensitive features, so it can be safely migrated to an abstract value class.

      abstract value class Number implements Serializable {
          public abstract int intValue();
          public abstract long longValue();
          public byte byteValue() { return (byte) intValue(); }
          ...
      }

      Integer (a value class) and java.math.BigInteger (an identity class) both extend Number.

      jshell> Number num = 123
      num ==> 123
      
      jshell> Objects.hasIdentity(num)
      $6 ==> false
      
      jshell> num = BigInteger.valueOf(123)
      num ==> 123
      
      jshell> Objects.hasIdentity(num)
      $8 ==> true

      An abstract value class can be sealed to limit who can extend the class.

      sealed abstract value class UserID
              permits EmailID, PhoneID, UsernameID {
          ...
      }
      
      value record EmailID(String name, String domain) { ... }
      value record PhoneID(String digits) { ... }
      value record UsernameID(String name) { ... }

      Safe construction for value classes

      Constructors initialize newly-created objects by setting the values of their fields. Because value objects do not have identity, their initialization requires special care.

      An object being constructed is "larval"—it has been created but is not yet fully-formed. Larval objects must be handled carefully: if a larval object is shared with code outside the constructor, then domain-specific properties of the object may not yet hold, and the code may even observe the mutation of final fields.

      Traditionally, a constructor begins the initialization process by invoking a superclass constructor, super(...). If this is not done explicitly, then the Java compiler inserts a super() call at the beginning of the constructor body. After the superclass returns, the subclass proceeds to set its declared instance fields and perform other initialization tasks. This pattern exposes a completely uninitialized subclass to any larval object leakage that occurs in a superclass constructor.
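
      The leakage risk is easy to reproduce with ordinary identity classes. In the sketch below, whose class names are hypothetical, the superclass constructor calls an overridable method, and the subclass override reads a final field before the subclass constructor has assigned it.

```java
// Hypothetical classes for illustration; both are ordinary identity classes.
class Shape {
    Shape() {
        describe();  // hands the larval object to overridable code
    }

    void describe() {}
}

class Circle extends Shape {
    final int radius;
    String observed;  // what describe() saw while the object was still larval

    Circle(int radius) {
        super();               // runs before radius is assigned
        this.radius = radius;
    }

    @Override
    void describe() {
        observed = "radius=" + radius;  // reads the default value 0
    }

    public static void main(String[] args) {
        Circle c = new Circle(5);
        System.out.println(c.observed + " then radius=" + c.radius);
    }
}
```

      The final field radius is observed first as 0 and later as 5, which is the mutation of a final field described above.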

      The Flexible Constructor Bodies feature in Java 25 enables safer initialization, whereby fields can be set and other code executed before the super(...) invocation. There is a two-phase initialization process: early construction before the super(...) invocation, and late construction afterwards.

      During the early construction phase, larval object leakage is impossible: the constructor may set the fields of the larval object, but may not invoke instance methods or otherwise make use of this. Fields that are initialized in the early construction phase are therefore set before they can ever be read, even if a superclass leaks the larval object. Final fields, in particular, can never be observed to mutate.

      In a value class, by default, all constructor code occurs in the early construction phase. The Java compiler inserts a super() call at the end of the constructor body, not the beginning. Attempts to invoke instance methods or otherwise use this will fail:

      value class Name {
          String name;
          int length;
      
          Name(String n) {
              name = n;
              length = strLength();  // Error, invokes this.strLength()
          }
      
          private int strLength() {
              return name.length();
          }
      }

      Instance fields that are declared with initializer expressions are set at the start of the constructor, in the early construction phase.

      Instance initializer blocks (a rarely-used feature) are run in the late construction phase, so they cannot set instance fields in value classes.

      When a constructor has code that needs to work with this, an explicit super(...) or this(...) call can be used to mark the transition from the early to the late construction phase. All fields must be initialized before the call, and without referring to this:

      value class Name {
          String name;
          int length;
      
          Name(String n) {
              name = n;
              length = strLength(name);  // OK, strLength is now static
              super();  // All fields must be set at this point
              System.out.println("Name: " + this);
          }
      
          private static int strLength(String n) {
              return n.length();
          }
      }

      In Java 25, the fields in an identity class may only be set in the early construction phase, not read. For convenience, in Java NN, the fields in an identity class or a value class may be read in the early construction phase after they have been set. As a result, both references to name in the constructor above are legal. It continues to be illegal in Java NN to refer to inherited fields, invoke instance methods, or share this with other code until the late construction phase.

      Safe construction for identity classes

      In identity classes, we believe developers should write constructors and field initializers that avoid the risk of larval object leakage by adopting early construction constraints: read and write the declared fields of the class, but otherwise avoid any dependency on this, and where a dependency is necessary, mark it as deliberate by putting it after an explicit super(...) or this(...) call.

      To encourage this style, javac in JDK NN generates lint warnings that indicate this dependencies in constructors of identity classes. In the future, we anticipate that identity classes will have a way to adopt the constructor timing of value classes. A class that compiles without the lint warnings will likely be able to make the transition cleanly.

      Further, in Java NN, identity record classes behave the same as value record classes: their constructors always run in the early construction phase. This change is not source compatible, but based on a survey of existing record class declarations, it is not expected to be disruptive.

      As an example, the following record class will fail to compile because its canonical constructor refers to this in the early construction phase:

      record Node(String label, List<Node> edges) {
          public Node {
              nullCheck(label, this);  // OK in Java 25, error in Java NN
              nullCheck(edges, this);  // OK in Java 25, error in Java NN
          }
      
          static void nullCheck(Object arg, Object owner) {
              if (arg == null) {
                  String msg = "null arg for " + owner.toString();
                  throw new IllegalArgumentException(msg);
              }
          }
      }

      In cases where a record constructor needs to access this, an explicit super() can be inserted, but the record's fields must be set explicitly beforehand.
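      Where the dependency on this is incidental, as in the Node example, another option is to remove it. The following is a minimal sketch, not part of the proposal: it passes a descriptive string to nullCheck instead of the object under construction, so the compact constructor never mentions this. The changed nullCheck signature and the "Node.label" and "Node.edges" strings are assumptions made for illustration.

```java
import java.util.List;

// Variant of the Node record whose compact constructor never refers to `this`.
record Node(String label, List<Node> edges) {
    Node {
        // Describe the argument with a plain string rather than passing `this`.
        nullCheck(label, "Node.label");
        nullCheck(edges, "Node.edges");
    }

    // Hypothetical helper: takes a description instead of an owner object.
    static void nullCheck(Object arg, String what) {
        if (arg == null) {
            throw new IllegalArgumentException("null arg for " + what);
        }
    }
}
```

      Because nothing in the constructor depends on this, the declaration should compile the same way whether or not record constructors run in the early construction phase.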

      Inherited methods of java.lang.Object

      Like any class, a value class inherits methods like equals, hashCode, and toString from java.lang.Object, unless the class author chooses to override them. These methods traditionally depend on identity, but when operating on a value object, they use the values of the object's fields instead. Specifically:

      • The inherited implementation of Object.equals uses == to compare objects. For value objects, this tests for statewise equivalence. This might be the right equals behavior for a value class, but if it isn't then the class author should override equals.

      • The inherited implementation of Object.hashCode computes a hash from the object's field values. (This value can also be computed via System.identityHashCode.) As usual, the hashCode method should be overridden by a value class whenever it overrides equals.

      • The inherited implementation of Object.toString returns a string of the form "ClassName@hashCode". Since value classes represent immutable domain values, most value class authors will want to override toString to more legibly convey the domain value represented by the object.

      In a value record, as for all records, the default equals, hashCode, and toString behavior is to recursively apply the same operations to the record components.
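      The recursive behavior of record methods can be observed with ordinary records on current JDKs. The sketch below uses hypothetical Point and Line records and does not use the value modifier, so it compiles without preview features.

```java
// Ordinary records; equals, hashCode, and toString are derived from the components.
class RecordMethodsSketch {
    record Point(int x, int y) {}
    record Line(Point start, Point end) {}

    // Builds a fresh Line (and fresh Points) on every call.
    static Line sample() {
        return new Line(new Point(0, 0), new Point(1, 1));
    }
}
```

      Two separate calls to sample() produce Line objects that are equal according to equals and have equal hash codes, because Line.equals delegates to Point.equals for each component. According to the text above, declaring both records as value records would leave this behavior unchanged.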

      A few other methods of Object interact with value objects:

      • For a Cloneable value class, the Object.clone method produces a value object that is indistinguishable from the original—the usual expectation that x.clone() != x is not meaningful for value objects. Value classes that store references to identity objects may wish to override clone and perform a "deep copy" of these identity objects.

      • The wait and notify methods require that the object be locked in the current thread; since it is impossible to synchronize on a value object, attempts to call these methods will always fail with an IllegalMonitorStateException.

      • The finalize method of a value object will never be invoked by the garbage collector.
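      The IllegalMonitorStateException mentioned above is the same exception that any object produces today when notify is called by a thread that does not hold the object's monitor. The sketch below shows that existing behavior with a plain identity object; the class and method names are illustrative only. For a value object, the failing case is the only one available, since the monitor can never be acquired.

```java
class MonitorStateSketch {
    // Returns true if notify() fails when the current thread does not own the monitor.
    static boolean notifyWithoutLockFails() {
        Object lock = new Object();
        try {
            lock.notify();  // not inside synchronized (lock)
            return false;
        } catch (IllegalMonitorStateException e) {
            return true;
        }
    }

    // Returns true if notify() completes normally while the monitor is held.
    static boolean notifyWithLockSucceeds() {
        Object lock = new Object();
        synchronized (lock) {
            lock.notify();
        }
        return true;
    }
}
```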

      Migrating to value classes

      Value classes, and especially value records, are useful tools for modeling immutable domain values that are interchangeable when two instances represent the same value.

      As a general rule, if a class with immutable state doesn't need identity, it should be made a value class. This includes abstract classes, which often have no state at all and shouldn't impose an identity requirement on their subclasses.

      For final and abstract classes with only final fields, applying or removing the value keyword is a binary-compatible change.

      However, migrating from an identity class to a value class carries some risks of source and behavioral incompatibility that class authors should consider:

      • If the class has public constructors, users may have relied on them to create objects that are known to be distinguishable from every other object via ==. Changing the class to be a value class will invalidate that logic, possibly leading to run-time bugs.

        If this incompatibility is a serious concern, it may be appropriate to deprecate the public constructors and encourage use of factory methods instead. As an example, in Java 25, the constructors of Integer, Float, etc., are deprecated, and factory methods such as Integer.valueOf are recommended instead.

      • If users are synchronizing on instances of the class, then after migration their code will fail, either with a compile-time error or an IdentityException at run time. This incompatibility is more likely to be a risk for classes with public constructors, because users will generally want to be sure they "own" the object being used for locking.

      • If the equals and hashCode methods have not already been overridden, they will behave differently after migration. A good migration candidate will want to override these methods beforehand so that their behavior does not depend on identity.

      • If the class encapsulates sensitive state, class authors should be cautious about the risk of exposing that state through == or System.identityHashCode: a malicious user could use those operations to try to infer the internal state of an instance. Value classes are not designed to protect sensitive data against such attacks.
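      As an illustration of the first and third points, the following sketch shows a hypothetical identity class, Celsius, prepared ahead of a possible migration. It exposes a factory method instead of a public constructor and overrides equals, hashCode, and toString so that none of them depends on identity. The class name and its API are assumptions, not taken from the JDK.

```java
// A hypothetical identity class written so that a later `value` migration changes little.
final class Celsius {
    private final double degrees;

    // Private constructor: callers cannot rely on `new` producing a distinct identity.
    private Celsius(double degrees) {
        this.degrees = degrees;
    }

    static Celsius of(double degrees) {
        return new Celsius(degrees);
    }

    double degrees() {
        return degrees;
    }

    // State-based equality, independent of identity.
    @Override
    public boolean equals(Object o) {
        return o instanceof Celsius c && Double.compare(degrees, c.degrees) == 0;
    }

    @Override
    public int hashCode() {
        return Double.hashCode(degrees);
    }

    @Override
    public String toString() {
        return "Celsius[" + degrees + "]";
    }
}
```

      Code that compares Celsius objects with equals sees the same results before and after migration. Code that compares them with == or synchronizes on them would still be affected, as described above.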

      Run-time optimizations for value objects

      At run time, the JVM can optimize value objects by encoding them in more compact forms than identity objects. Instead of allocating space in the heap for a value object, the JVM can flatten and scalarize the object.

      • Heap flattening: When a field of one object, or an element of an array, stores a reference to another object, the JVM can encode the other object's field values into the reference directly. When this happens, the reference is not a pointer to the other object in memory. The other object is said to be flattened.

      • Scalarization: When a method parameter or local variable stores a reference to an object, the JVM can encode the object's field values into additional local variables. When this happens, again, the reference is not a pointer to an object in memory. The object is said to be scalarized.

      When an object is flattened or scalarized, it has no independent presence in the heap. This means it has no impact on garbage collection, and its data is always co-located in memory with the referencing object or call stack.

      Heap flattening

      As an example, the JVM could flatten an array of Integer references so that each array element holds a reference that encodes the underlying integer value directly, rather than pointing to the memory location of some Integer object. Each reference also flags whether the original Integer reference was null by prepending 0 (null) or 1 (non-null) to the integer value.

      +--------------+
      | Integer[5]   |
      +--------------+
      | 1|1996       |
      | 1|2006       |
      | 1|1996       |
      | 0|0          |
      | 0|0          |
      +--------------+

      Each int value takes up 32 bits, and each null flag requires at least one additional bit. Due to hardware constraints, the JVM will probably encode each flattened Integer reference as a 64-bit unit. An Integer array thus has a larger memory footprint than a plain int array, but a significantly smaller total footprint than an array of pointers to objects (the pointer itself is a 32- or 64-bit value, and each referenced object requires at least 64 bits just for its header). Even more significantly, all of the Integer data is stored directly inside the array, and can be processed without any extra memory loads.
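      The comparison can be made concrete with a rough lower-bound estimate. The sketch below is only arithmetic under stated assumptions: 64 bits per flattened element, a 64-bit object header plus a 32-bit payload per Integer object, no alignment padding, and no array header. Real JVM layouts will differ.

```java
// Back-of-the-envelope footprint estimates, in bytes. All sizes are assumptions.
class IntegerArrayFootprint {
    // A plain int[]: 32 bits per element.
    static long plainIntBytes(int length) {
        return 4L * length;
    }

    // A flattened Integer[]: assumed 64 bits per element (value plus null flag).
    static long flattenedBytes(int length) {
        return 8L * length;
    }

    // An Integer[] of pointers: one pointer per element, plus an assumed
    // 8-byte header and 4-byte payload for every referenced object.
    static long pointerBytes(int length, int referencedObjects, int pointerSize) {
        return (long) pointerSize * length + (long) referencedObjects * (8 + 4);
    }
}
```

      For the five-element array shown above, with three non-null elements each pointing to its own object, these assumptions give 20 bytes for a plain int array, 40 bytes for the flattened array, and 56 or 76 bytes for the pointer-based array with 32-bit or 64-bit pointers respectively.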

      As shown earlier, an array of LocalDate references can be flattened by prepending a null flag to the year-month-day triple of a LocalDate object (an int and two bytes). Like flattened Integer references, these flattened LocalDate references can fit in 64 bits.

      +--------------+
      | LocalDate[5] |
      +--------------+
      | 1|1996|01|23 |
      | 1|1996|01|23 |
      | 1|2026|01|23 |
      | 0|0000|00|00 |
      | 1|1996|01|23 |
      +--------------+
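      To see how such a reference can fit in 64 bits, the following sketch packs a null flag, a 32-bit year, an 8-bit month, and an 8-bit day into a single long. The bit positions are chosen for illustration only; the JEP does not specify an encoding, and an actual JVM is free to use a different one.

```java
import java.time.LocalDate;

// Illustrative packing of a nullable LocalDate into 49 bits of a long.
// Assumed layout: bit 48 = non-null flag, bits 16..47 = year,
// bits 8..15 = month, bits 0..7 = day.
class FlattenedDateSketch {
    static long encode(LocalDate d) {
        if (d == null) {
            return 0L;  // flag bit clear: null
        }
        long year = d.getYear() & 0xFFFFFFFFL;
        long month = d.getMonthValue() & 0xFFL;
        long day = d.getDayOfMonth() & 0xFFL;
        return (1L << 48) | (year << 16) | (month << 8) | day;
    }

    static LocalDate decode(long bits) {
        if (((bits >>> 48) & 1L) == 0) {
            return null;
        }
        int year = (int) (bits >>> 16);         // low 32 bits after the shift
        int month = (int) ((bits >>> 8) & 0xFF);
        int day = (int) (bits & 0xFF);
        return LocalDate.of(year, month, day);
    }
}
```

      Because the whole encoding fits in one long, it can be read or written with a single 64-bit memory access, which is the property the atomicity discussion below depends on.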

      Fields may also store flattened references. For example, a LocalDateTime object has two fields (a LocalDate and a LocalTime) and both can store a flattened reference.

      +----------------------+
      | LocalDateTime        |
      +----------------------+
      | date=1|2026|01|23    |
      | time=1|09|00|00|0000 |
      +----------------------+

      Heap flattening must maintain the integrity of data. A flattened reference must always be read and written atomically, or it could become corrupted. On common platforms, this limits the size of most flattened references to no more than 64 bits. For example, a flattened reference to a LocalDateTime object would embed fields from the underlying LocalDate and LocalTime, plus a null flag for each, plus a null flag for the LocalDateTime itself. The flattened reference is likely too big to read and write atomically, so it cannot be stored in a field of type LocalDateTime, e.g., the timestamp of an Event:

      +------------------------------------------+
      | Event                                    |
      +------------------------------------------+
      | timestamp=1|1|2026|01|23|1|09|00|00|0000 |  // Not possible
      | ...                                      |
      +------------------------------------------+

      Instead, the JVM stores a pointer to a LocalDateTime object, whose own fields may store flattened references as shown earlier:

      +--------------------+
      | Event              |
      +--------------------+
      | timestamp=87fa50a0 |------> +----------------------+
      | ...                |        | LocalDateTime        |
      +--------------------+        +----------------------+
                                    | date=1|2026|01|23    |
                                    | time=1|09|00|00|0000 |
                                    +----------------------+

      In the future, 128-bit flattened references may be possible on platforms that support atomic reads and writes of that size, or in special cases like final fields.
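      The size limit discussed above can be checked with a rough bit count. The sketch below assumes the field widths suggested by the text and diagrams (an int and two bytes for LocalDate; three bytes and an int for LocalTime) and one bit per null flag. These widths are assumptions for illustration, and padding is ignored.

```java
// Rough bit counts for flattened references. Field widths are assumptions.
class FlattenedSizeSketch {
    static final int NULL_FLAG_BITS = 1;
    static final int LOCAL_DATE_FIELD_BITS = 32 + 8 + 8;      // year, month, day
    static final int LOCAL_TIME_FIELD_BITS = 8 + 8 + 8 + 32;  // hour, minute, second, nano

    static int localDateRefBits() {
        return NULL_FLAG_BITS + LOCAL_DATE_FIELD_BITS;
    }

    static int localTimeRefBits() {
        return NULL_FLAG_BITS + LOCAL_TIME_FIELD_BITS;
    }

    // A LocalDateTime reference embeds both nested references plus its own null flag.
    static int localDateTimeRefBits() {
        return NULL_FLAG_BITS + localDateRefBits() + localTimeRefBits();
    }

    static boolean fitsIn64(int bits) {
        return bits <= 64;
    }
}
```

      Under these assumptions a LocalDate reference needs 49 bits and a LocalTime reference 57 bits, so each fits in 64 bits. A LocalDateTime reference needs 107 bits, which exceeds 64 but would fit within the 128-bit limit mentioned above.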

      Scalarization

      When the JVM sees a flattened reference in the field of an object in the heap, it needs to re-encode the reference in a form that it can readily work with. For code compiled by the JVM's just-in-time (JIT) compiler, this encoding can be a scalarized reference.

      For example, consider the following code which reads a LocalDate from an array and invokes plusYears. A simplified version of plusYears is shown for reference.

      LocalDate d = dates[0];
      dates[0] = d.plusYears(30);
      ...
      public LocalDate plusYears(long yearsToAdd) {
          int newYear = YEAR.checkValidIntValue(this.year + yearsToAdd);
          return new LocalDate(newYear, this.month, this.day);
      }

      In pseudo-code, the result of JIT compilation might look like the following, using the notation { ... } to indicate that multiple values are returned from a JIT-compiled method. (This is purely notational; there is no wrapper at run time.)

      { d_null, d_year, d_month, d_day } = $decode(dates[0]);
      dates[0] = $encode($plusYears(d_null, d_year, d_month, d_day, 30));
      
      static { boolean, int, byte, byte }
          $plusYears(boolean this_null, int this_year,
                     byte this_month, byte this_day,
                     long yearsToAdd) {
          if (this_null) throw new NullPointerException();
          int newYear = YEAR.checkValidIntValue(this_year + yearsToAdd);
          return { false, newYear, this_month, this_day };
      }

      Thanks to the JVM's optimizations, this code never touches a pointer to a heap-allocated LocalDate:

      • A flattened reference in dates[0] is converted to a scalarized reference by $decode(...)

      • A new scalarized reference is returned from plusYears

      • That reference is converted to another flattened reference by $encode(...)

      Unlike heap flattening, scalarization is not constrained by the size of the data. Local variables that are pushed and popped on the stack are not at risk of data races. Therefore, it is possible to have a scalarized encoding of a LocalDateTime reference: three values and a null flag for the underlying LocalDate, four values and a null flag for the underlying LocalTime, and a null flag for the LocalDateTime itself.

      JVMs have used similar techniques to scalarize identity objects in methods when the JVM is able to prove that an object's identity is never used. Scalarization of value objects is more predictable and far-reaching, even across method boundaries.
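      The pseudo-code shown earlier in this section can be approximated in ordinary Java to make the data flow easier to follow. In this sketch, the Scalars record is a stand-in for the { ... } notation and exists only so that the code compiles; a JIT compiler would allocate no such object. The decode and encode methods convert to and from a regular LocalDate rather than a flattened reference, and, like the simplified plusYears above, the sketch ignores day-of-month adjustment for leap years. All names are illustrative.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoField;

class ScalarizedPlusYearsSketch {
    // Stand-in for multiple return values; not something a JIT would allocate.
    record Scalars(boolean isNull, int year, byte month, byte day) {}

    // Plays the role of $decode: split a reference into scalar values.
    static Scalars decode(LocalDate d) {
        if (d == null) {
            return new Scalars(true, 0, (byte) 0, (byte) 0);
        }
        return new Scalars(false, d.getYear(),
                           (byte) d.getMonthValue(), (byte) d.getDayOfMonth());
    }

    // Plays the role of $encode: rebuild a reference from scalar values.
    static LocalDate encode(Scalars s) {
        return s.isNull() ? null : LocalDate.of(s.year(), s.month(), s.day());
    }

    // Mirrors $plusYears: the receiver arrives as separate scalar arguments.
    static Scalars plusYears(boolean thisNull, int thisYear,
                             byte thisMonth, byte thisDay, long yearsToAdd) {
        if (thisNull) throw new NullPointerException();
        int newYear = ChronoField.YEAR.checkValidIntValue(thisYear + yearsToAdd);
        return new Scalars(false, newYear, thisMonth, thisDay);
    }

    // Equivalent of: dates[0] = dates[0].plusYears(years);
    static void advanceFirst(LocalDate[] dates, long years) {
        Scalars d = decode(dates[0]);
        dates[0] = encode(plusYears(d.isNull(), d.year(), d.month(), d.day(), years));
    }
}
```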

      When flattening and scalarization can occur

      Heap flattening and scalarization are optimizations, not language features. Programmers cannot directly control them. Like all optimizations, they occur at the discretion of the JVM. However, there are things programmers can do to make it more likely that the JVM can apply these optimizations.

      First, heap flattening and scalarization rely on the JVM's knowledge that a variable only stores a specific value class: the date of a LocalDateTime is always a LocalDate reference. Flattening and scalarization cannot typically be applied to a variable declared with a supertype of a value class, such as Object.

      For example, the following two arrays store the same Integer values when they are created, but because the second needs to be able to store arbitrary Object references in the future, it has to encode its elements as pointers to regular objects on the heap.

      Integer[] ints = { 1996,2006,1996,null,null };  // flattenable
      Object[] objs = { 1996,2006,1996,null,null };  // not flattenable

      Future value objects written to the objs array will need to be converted to a regular heap object encoding.

      Integer i = -1;
      ints[3] = i;  // write a flattened reference
      objs[3] = i;  // write a heap pointer

      A field with a generic type T usually has erased type Object, and so will behave at runtime just like an Object-typed field.

      record Box<T>(T field) {}  // field is not flattenable
      var b = new Box<Integer>(i);  // field stores a heap pointer

      These conversions between encodings do not have any semantic impact—the Integer objects referenced by objs and field are still value objects, and do not have identity. The JVM is simply encoding the same value object in different ways.
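      The static type information that the JVM has available can be inspected through reflection on current JDKs. The sketch below, with illustrative class and method names, shows that the component of a generic Box<T> record has erased type Object, and that an array's element type is fixed by the array's own class rather than by the values stored in it.

```java
class ErasureSketch {
    record Box<T>(T field) {}

    // The erased type of Box's single component, as seen at run time.
    static Class<?> erasedFieldType() {
        return Box.class.getRecordComponents()[0].getType();
    }

    // The element type the array was created with.
    static Class<?> elementType(Object[] array) {
        return array.getClass().getComponentType();
    }
}
```

      Here erasedFieldType() returns Object.class, while elementType returns Integer.class for an Integer[] and Object.class for an Object[], even when both arrays hold the same Integer values. This is consistent with the explanation above of why only the Integer[] is a candidate for flattening.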

      The same principles apply to method parameters: a parameter with type LocalDate is reliably scalarizable, while a parameter with type Object or T is not. (However, if the method call can be inlined, the JIT may be able to skip the assignment and heap allocation completely.)

      A second factor that influences whether the JVM applies flattening and scalarization is the contents of a class file that uses value classes. When a class is compiled, the names of value classes mentioned by its field and method signatures get recorded in a new LoadableDescriptors class file attribute. This attribute authorizes the JVM to load the named value classes early enough to set up flattened fields and scalarized method parameters.

      If a value class is not listed by LoadableDescriptors, then when the referencing class is loaded, the JVM may not know that it is a value class. A field of that type may be laid out like any other field, storing regular object pointers instead of flattened references. A method with a parameter of that type may not be set up to accept scalarized calls, forcing callers to pass regular object pointers.

      In practice, this means classes that depend on migrated value classes will perform best if they were compiled against the updated value class declaration. If the class was an identity class at compile time, it will get left out of LoadableDescriptors, and the JVM may not be able to flatten the referencing class's fields or scalarize its method signatures.

      Value classes and the Java Platform

      The Java Platform API supports value classes and value objects in the following ways:

      • 30 classes in java.* are declared as value classes.

        In java.lang: Integer, Long, Float, Double, Byte, Short, Character, Boolean, and the abstract classes Number and Record

        In java.util: Optional, OptionalInt, OptionalLong, OptionalDouble

        In java.time: Duration, Instant, LocalTime, Year, YearMonth, MonthDay, Period, LocalDate, LocalDateTime, OffsetTime, OffsetDateTime, ZonedDateTime

        In java.time.chrono: MinguoDate, HijrahDate, JapaneseDate, ThaiBuddhistDate

        To minimize compatibility risks, these classes have long discouraged reliance on the identities of instances, and have been documented as value-based. They have also prevented or discouraged instance creation through constructors. Since Java 16, Warnings for Value-Based Classes have discouraged the use of synchronization with these classes.

      • The vast majority of Platform APIs work seamlessly with value objects. Methods that operate on Object or Object[] parameters accept value objects. Almost anywhere a user needs to provide an implementation of an interface, the implementation may be a value class. Generic APIs such as List<T> and Comparable<T> can be parameterized with value classes as the type arguments.

      • New methods in java.util.Objects (hasIdentity, requireIdentity) allow developers to distinguish between identity objects and value objects.

      • A new constant in java.lang.reflect.AccessFlag exposes whether a class is an identity class or a value class.

        Whether a class is an identity class or a value class is recorded in its class file. Identity classes have the ACC_IDENTITY flag set; value classes do not. This flag supersedes the legacy ACC_SUPER flag. The JVM Specification always recommended that compilers and tools set the ACC_SUPER flag in class files, so by default, compilers and tools can continue to set the flag in new class files and generate identity classes.

      • Serialization works with value records out of the box, but serialization of non-record value classes requires developer attention. Namely, value classes that implement Serializable must implement the writeReplace and readResolve methods. This causes a replacement object to be serialized and deserialized instead of the value object. If these methods are not implemented, attempts to serialize or deserialize the value object will fail with an InvalidClassException.

        These methods must be implemented because value classes are compiled using strictly-initialized fields, and deserialization does not safely initialize these fields. Value objects may only be created, and their fields initialized, by invoking a constructor. In the future, enhancements to the serialization mechanism are anticipated that will allow a Serializable value class to be serialized and deserialized automatically.

      • Deep reflection on value objects is not possible. Libraries that modify final fields via Field.setAccessible are incompatible with safe construction and will not be able to modify value class fields, even if --enable-final-field-mutation is used on the command line. Libraries must initialize instances of a value class using the class's constructors.

      • The garbage collection APIs in java.lang.ref and java.util.WeakHashMap do not allow developers to manually manage value objects in the heap. Attempts to create Reference objects for value objects throw IdentityException at run time. javac produces identity warnings about uses of the API with value classes at compile time.

        Since JDK 25, javac has produced identity warnings about value-based classes being used with these APIs.
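      For the serialization requirement in the list above, one plausible arrangement is the long-standing serialization proxy pattern, sketched below. This is an assumption about how writeReplace and readResolve might be used, not the JEP's prescribed design. The hypothetical Point class is declared as an ordinary identity class so that the sketch compiles without preview features. Its writeReplace substitutes a Proxy object, and the proxy's readResolve rebuilds the Point through its constructor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class SerializationProxySketch {
    // Hypothetical class standing in for a Serializable value class.
    static final class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        private final int x;
        private final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        int x() { return x; }
        int y() { return y; }

        // Serialize the proxy instead of this object.
        private Object writeReplace() {
            return new Proxy(x, y);
        }

        // Reject streams that try to deserialize a Point directly.
        private void readObject(ObjectInputStream in) throws InvalidObjectException {
            throw new InvalidObjectException("proxy required");
        }
    }

    // Replacement object that actually travels in the stream.
    static final class Proxy implements Serializable {
        private static final long serialVersionUID = 1L;
        private final int x;
        private final int y;

        Proxy(int x, int y) {
            this.x = x;
            this.y = y;
        }

        // Rebuild the real object by invoking its constructor.
        private Object readResolve() {
            return new Point(x, y);
        }
    }

    // Serializes and deserializes a Point in memory.
    static Point roundTrip(Point p) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(p);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Point) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

      The point of the pattern is that deserialization never writes the fields of a Point directly: the only way a Point comes into existence is through its constructor, which matches the constraint described in the list above.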

      Future Work

      Null-Restricted Value Class Types (Preview) will build on this JEP, allowing programmers to manage the storage of nulls and enable more dense heap flattening in fields and arrays.

      Enhanced Primitive Boxing (Preview) will enhance the language's use of primitive types, taking advantage of the lighter-weight characteristics of boxing to value objects.

      JVM class and method specialization (JEP 218, with revisions) will allow generic classes and methods to specialize field, array, and local variable layouts when parameterized by value class types.

      Alternatives

      As discussed, JVMs have long performed escape analysis to identify objects that never rely on identity throughout their lifespan and can be scalarized. These optimizations are somewhat unpredictable, and do not help with objects that escape the scope of the optimization, including storage in fields and arrays.

      Hand-coded optimizations via primitive values are possible to improve performance, but as noted in the "Motivation" section, these techniques require giving up valuable abstractions.

      The C language and its relatives support flattened storage for structs and similar class-like abstractions. For example, the C# language has value types. Unlike value objects, instances of these abstractions have identity, meaning they support operations such as field mutation. As a result, the semantics of copying on assignment, invocation, etc., must be carefully specified, leading to a more complex user model and less flexibility for runtime implementations. We prefer an approach that leaves these low-level details to the discretion of JVM implementations.

      Risks and Assumptions

      The feature makes significant changes to the Java object model. Developers may be surprised by, or encounter bugs due to, changes in the behavior of operations such as == and synchronized. We expect such disruptions to be rare and tractable.

      Some changes could potentially affect the performance of identity objects. The if_acmpeq test, for example, typically only costs one instruction cycle, but will now need an additional check to detect value objects. But the identity class case can be optimized as a fast path, and we believe we have minimized any performance regressions.

      There is a security risk that == and hashCode can indirectly expose private field values. Further, two large trees of value objects can take unbounded time to compute ==. Developers need to understand these risks.

      Dependencies

      Strict Field Initialization in the JVM (Preview) provides the JVM mechanism necessary to require, through verification, that value class instance fields are initialized during the early construction phase.

            dlsmith Dan Smith
            Alex Buckley, Brian Goetz