JEP 401: Primitive Classes (Preview)

    Details

    • Type: JEP
    • Status: Candidate
    • Priority: P3
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • JEP Type:
      Feature
    • Exposure:
      Open
    • Scope:
      SE
    • Discussion:
      valhalla dash dev at openjdk dot java dot net
    • Effort:
      XL
    • Duration:
      XL
    • JEP Number:
      401

      Description

      Summary

      Support new, developer-declared primitive types in Java. This is a preview language and VM feature.

      Goals

      This JEP introduces primitive classes, special kinds of value classes that define new primitive types.

      The Java programming language will be enhanced to recognize primitive class declarations and support new primitive types in its type system.

      The Java Virtual Machine will be enhanced with a new Q carrier type to encode declared primitive types.

      Non-Goals

      This JEP is concerned with the core treatment of developer-declared primitives. Additional features to improve integration with the Java programming language are not covered here, but are expected to be developed in parallel. Specifically:

      • JEP 402 will enhance the basic primitives (int, boolean, etc.) by giving them primitive class declarations.

      • A separate JEP will update Java's generics so that primitive types can be used as type arguments.

      Other followup efforts may enhance existing APIs to take advantage of primitive classes, or introduce new language features and APIs built on top of primitive classes.

      Motivation

      Java developers work with two kinds of values: primitives and objects.

      Primitives offer better performance, because they are typically inlined—stored directly (without headers or pointers) in variables, on the computation stack, and, ultimately, in CPU registers. Hence, memory reads do not have additional indirections, primitive arrays are stored densely and contiguously in memory, primitive-typed fields can be similarly compact, primitive values do not require garbage collection, and primitive operations are performed within the CPU.

      Objects offer better abstractions, including fields, methods, constructors, access control, and nominal subtyping. But objects traditionally perform poorly in comparison to primitives, because they are primarily stored in heap-allocated memory and accessed by reference.

      Value objects, introduced by another JEP, significantly improve object performance in many contexts, providing a good fusion of the better abstractions of objects with the better performance of primitives.

      However, certain invariant properties of objects limit how much they can be optimized—particularly when stored in fields and arrays. Specifically:

      • A variable of a reference type may be null, so the inlined layout of a value object typically requires some additional bits to encode null. For example, a variable storing an int can fit in 32 bits, but for a value class with a single int field, a variable of that class type could use up to 64 bits.

      • A variable of a reference type must be modified atomically. This often makes it impractical to inline a value object, because its layout would be too large for efficient atomic modification. Large primitive types (currently, double and long) make no such atomicity guarantees, so variables of these types can be modified efficiently without indirect representations (concurrency is instead managed at a higher level).

      Primitive classes give developers the capability to define new primitive types that aren't subject to these limitations. Programs can make use of class features without giving up any of the performance benefits of primitives.

      Applications of developer-declared primitives include:

      • Numbers of varieties not supported by the basic primitives, such as unsigned bytes, 128-bit integers, and half-precision floats;

      • Points, complex numbers, colors, vectors, and other multi-dimensional numerics;

      • Numbers with units—sizes, rates of change, currency, etc.;

      • Bitmasks and other compressed encodings of data;

      • Map entries and other data structure internals;

      • Data-carrying tuples and multiple returns;

      • Aggregations of other primitive types, potentially multiple layers deep.

      Description

      The features described below are preview features, enabled with the --enable-preview compile-time and runtime flags.

      Primitive classes

      A primitive class is a special kind of value class that introduces a new primitive type.

      As value classes, primitive classes have no identity. This allows their instances to be freely converted between value objects and simpler primitive values. A primitive value can be thought of as a bare sequence of field values, without any headers or extra pointers.

      A primitive class is declared with the primitive contextual keyword.

      primitive class Point implements Shape {
          private double x;
          private double y;
      
          public Point(double x, double y) {
              this.x = x;
              this.y = y;
          }
      
          public double x() { return x; }
          public double y() { return y; }
      
          public Point translate(double dx, double dy) {
              return new Point(x+dx, y+dy);
          }
      
          public boolean contains(Point p) {
              return equals(p);
          }
      }
      
      interface Shape {
          boolean contains(Point p);
      }

      (Alternatively, we might prefer the class to be declared as primitive Point.)

      Primitive class declarations are subject to the same restrictions as other value class declarations. For example, the instance fields of a primitive class are implicitly final, so cannot be assigned outside of a constructor or initializer.

      In addition, no instance field of a primitive class declaration may have a primitive type that depends—directly or indirectly—on the declaring class. In other words, with the exception of reference-typed fields, the class must allow for flat, fixed-size layouts without cycles.
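
      The circularity restriction can be pictured as a reachability check over primitive-typed instance fields. The following is a minimal sketch of that idea, not the algorithm used by the compiler or the JVM; the class LayoutCycleCheck, the map-based model, and the example class names are all hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model: each primitive class name maps to the primitive classes
// named by its instance field types (reference-typed fields are left out).
final class LayoutCycleCheck {

    // True if following primitive-typed fields from 'start' revisits a class on the current path.
    static boolean hasCycle(Map<String, List<String>> fields, String start) {
        return visit(fields, start, new HashSet<>());
    }

    private static boolean visit(Map<String, List<String>> fields, String cls, Set<String> onPath) {
        if (!onPath.add(cls)) {
            return true; // cls is already on the path: its layout would contain itself
        }
        for (String fieldType : fields.getOrDefault(cls, List.of())) {
            if (visit(fields, fieldType, onPath)) {
                return true;
            }
        }
        onPath.remove(cls); // backtrack: using the same field type twice is not a cycle
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<String>> ok = Map.of("Line", List.of("Point", "Point"), "Point", List.of());
        Map<String, List<String>> bad = Map.of("Node", List.of("Pair"), "Pair", List.of("Node"));
        System.out.println(hasCycle(ok, "Line"));  // false
        System.out.println(hasCycle(bad, "Node")); // true
    }
}
```

      In this model a Line holding two Point fields is acceptable, while a Node that reaches itself through Pair is not, because no fixed-size flat layout exists for it.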

      In most other ways, a primitive class declaration is just like any other class declaration. It can have superinterfaces, type parameters, enclosing instances (todo: maybe a bad idea, because it allows enclosing this to be null), inner classes, overloaded constructors, static members, and the full range of access restrictions on its members.

      Primitive types

      The name of a primitive class denotes that class's primitive type. Primitive types store instances of the named class as primitive values. Instances can be created with normal class instance creation expressions.

      Point p1 = new Point(1.0, -0.5);

      Field access and method invocation are supported by primitive types. The members of a primitive type are the same as the members of the class.

      assert p1.x() == 1.0;
      Point p2 = p1.translate(0.0, 1.0);
      System.out.println(p2.toString());

      Primitive types support the == and != operators when comparing two values of the same type. As is the case for value objects, the == comparison recursively compares the values' fields.

      Point p3 = new Point(1.8, 3.6);
      Point p4 = p3.translate(0.0, 0.0);
      assert p3 == p4;
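
      On current Java releases, the closest analogue to this field-wise comparison is the generated equals method of a record. The sketch below uses a hypothetical record PointRec as a stand-in for Point; it is only an approximation, since == on a record still compares references today.

```java
// Stand-in for the JEP's Point, written as a record so it compiles on current Java.
// Assumption: the record's generated equals() approximates the field-wise ==
// that the JEP describes for primitive values.
final class FieldwiseEqualityDemo {

    record PointRec(double x, double y) {
        PointRec translate(double dx, double dy) {
            return new PointRec(x + dx, y + dy);
        }
    }

    public static void main(String[] args) {
        PointRec p3 = new PointRec(1.8, 3.6);
        PointRec p4 = p3.translate(0.0, 0.0);
        System.out.println(p3.equals(p4)); // true: compares the x and y components
        System.out.println(p3 == p4);      // false: identity comparison on today's records
    }
}
```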

      Like a value class reference type, an expression of a primitive type cannot be used as the operand of a synchronized statement.

      Unlike other value classes, a this expression in the body of a primitive class has a primitive type.

      Default values and null

      Like the basic primitive types (int, boolean, etc.), declared primitive types do not allow null.

      Whenever a field or array component is created, the longstanding behavior is to set its initial value to the default value of its type. For reference types, this value is null, and for the basic primitive types, this value is 0 or false.

      For a declared primitive type, the default value is the initial instance of the class: an instance whose fields are all set to their own default values.

      Object[] os = new Object[5];
      assert os[0] == null;
      Point[] ps = new Point[5];
      assert ps[0].x() == 0.0 && ps[0].y() == 0.0;

      As shorthand, the default value of a primitive type can be expressed with the class name followed by the default keyword.

      assert Point.default.x() == 0.0 &&
             Point.default.y() == 0.0;
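
      The contrast with reference arrays can be emulated on current Java. In the sketch below, the record PointRec, its DEFAULT constant, and the helper newDefaultFilled are hypothetical stand-ins for Point, Point.default, and the array creation behavior described above.

```java
import java.util.Arrays;

// Hypothetical emulation of Point.default on current Java.
final class DefaultValueDemo {

    record PointRec(double x, double y) {
        // Assumption: DEFAULT plays the role of Point.default (every field at its own default).
        static final PointRec DEFAULT = new PointRec(0.0, 0.0);
    }

    // Builds an array whose components start at DEFAULT, as a Point[] would under the JEP.
    static PointRec[] newDefaultFilled(int length) {
        PointRec[] ps = new PointRec[length]; // null-filled on current Java
        Arrays.fill(ps, PointRec.DEFAULT);
        return ps;
    }

    public static void main(String[] args) {
        PointRec[] ps = newDefaultFilled(5);
        System.out.println(ps[0].x() == 0.0 && ps[0].y() == 0.0); // true
    }
}
```

      Unlike this emulation, the initial instance of a real primitive class would be produced without running any constructor.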

      Note that the initial instance of a primitive class is created without invoking any constructors or instance initializers, and is available to anyone with access to the class (or its reflective Class object). Primitive classes are not able to specify an initial instance that sets fields to something other than their default values.

      Methods of primitive classes should be designed to work on the initial instance. If this isn't feasible (for example, a reference-typed field is expected to be non-null), it may not be appropriate for the class to have a primitive type. Instead, it can be declared as a normal value class.

      Multi-threaded reads and writes

      As with the basic primitive types double and long, when a field or array component has a declared primitive type, reads and writes might not be atomic. As a result, in a multi-threaded program, unexpected instances may be encountered.

      Point[] ps = new Point[]{ new Point(0.0, 1.0) };
      // start() runs the write on a second thread; run() would execute it on the current thread
      new Thread(() -> ps[0] = new Point(1.0, 0.0)).start();
      Point p = ps[0]; // may be (1.0, 1.0), among other possibilities

      Like initial instances, primitive class instances produced by non-atomic reads and writes are created without invoking any constructors or instance initializers. There is no opportunity for the class to ensure that the field values of the new object are compatible with each other (for example, a start index may end up being greater than an end index).

      To ensure that a particular primitive-typed field is always read from and written to atomically, the field can be declared volatile. But there is no mechanism for a primitive class to ensure that all fields and array components of its type are considered volatile.

      A class with a complex integrity constraint in its constructor may not be a good candidate to be a primitive class. Instead, it can be declared as a normal value class.
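
      A torn read is hard to reproduce on demand, so the sketch below simulates one deterministically on a single thread. It assumes, purely for illustration, that a Point component is stored flat as two adjacent double slots and that a write updates the slots one at a time; the class and method names are hypothetical.

```java
// Deterministic, single-threaded simulation of a torn read.
// Assumption: a Point component is stored flat as two adjacent double slots,
// and a write updates the slots one after the other.
final class TornReadDemo {

    // Returns the {x, y} pair a reader would see if it ran between the two slot writes.
    static double[] readBetweenSlotWrites(double[] flat, double newX, double newY) {
        flat[0] = newX;                            // writer: first half of the write
        double[] observed = { flat[0], flat[1] };  // reader: sees new x, old y
        flat[1] = newY;                            // writer: second half of the write
        return observed;
    }

    public static void main(String[] args) {
        double[] flat = { 0.0, 1.0 };              // encodes Point(0.0, 1.0)
        double[] seen = readBetweenSlotWrites(flat, 1.0, 0.0);
        System.out.println("(" + seen[0] + ", " + seen[1] + ")"); // prints "(1.0, 1.0)"
    }
}
```

      The observed pair (1.0, 1.0) was never passed to any constructor, which is why constructor-enforced invariants cannot be relied on for such classes.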

      Reference types

      Primitive values are monomorphic—they belong to a single type with a specific set of fields known at compile time and runtime. Values of different primitive types can't be mixed.

      To participate in the polymorphic reference type hierarchy, primitive values are converted to value objects with a value object conversion. This occurs implicitly when assigning from a primitive type to a reference type. The result is an instance of the same class, just in a different form.

      Shape s = p1; // value object conversion
      assert s.getClass() == Point.class;

      When invoking an inherited method of a primitive type, the receiver value undergoes value object conversion to have the type expected by the method declaration.

      Point p = new Point(0.3, 7.2);
      // toString is declared by Object
      p.toString(); // value object conversion

      It is sometimes useful to talk about the reference type of a primitive class. This type is expressed with the class name followed by the ref contextual keyword. A variable with a primitive class reference type stores either a value object belonging to the named class or null.

      Point.ref[] prs = new Point.ref[10];
      prs[1] = new Point(1.0, 1.0);
      prs[4] = new Point(4.0, 4.0);
      for (Point.ref pr : prs) {
          if (pr != null)
              System.out.println(pr);
      }

      The ref type is useful when null is needed or when the runtime characteristics of reference types are preferred (for example, a large sparse array might be more efficiently encoded with references).

      The relationship between the types Point and Point.ref is similar to the traditional relationship between the types int and Integer. However, Point and Point.ref both correspond to the same class declaration; the values of both types are instances of a single Point class. At run time, the conversion between a primitive value and a value object is more lightweight than traditional boxing conversion.

      Value objects can be converted back to primitive values with a primitive value conversion. null cannot be converted to a primitive value, so attempts to convert it cause an exception.

      Point p = prs[1]; // primitive value conversion
      prs[1] = null;
      p = prs[1]; // NullPointerException
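
      The same failure mode already exists today for unboxing. The sketch below mirrors the example above using Integer and int as a stand-in pair; the class and method names are hypothetical.

```java
// Mirrors "p = prs[1]" from the text, using Integer/int as the stand-in pair.
final class NullUnboxingDemo {

    // Returns true if converting a null reference to a primitive value throws.
    static boolean unboxingNullThrows() {
        Integer[] refs = new Integer[2];
        refs[1] = 7;
        int v = refs[1];        // unboxing a non-null reference succeeds
        refs[1] = null;
        try {
            v = refs[1];        // unboxing null
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(unboxingNullThrows()); // true
    }
}
```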

      When invoking a method overridden by a primitive class, the receiver object undergoes primitive value conversion to have the type expected by the method declaration.

      Shape s = new Point(0.7, 3.2);
      // 'contains' is declared by Point
      s.contains(Point.default); // primitive value conversion

      Overload resolution and type arguments

      Value object conversion and primitive value conversion are allowed in loose, but not strict, invocation contexts. This follows the pattern of boxing and unboxing: a method overload that is applicable without applying the conversions takes priority over one that requires them.

      void m(Point p, int i) { ... }
      void m(Point.ref pr, Integer i) { ... }
      
      void test(Point.ref pr, Integer i) {
          m(pr, i); // prefers the second declaration
          m(pr, 0); // ambiguous
      }
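
      Because these rules follow the existing pattern for boxing and unboxing, the behavior can be observed on current Java with int and Integer. This is an analogy under that assumption, not a test of the preview feature; the class OverloadDemo is hypothetical.

```java
// Overload selection with boxing, as an analogue of Point versus Point.ref.
final class OverloadDemo {

    static String m(int a, int b) { return "primitive"; }
    static String m(Integer a, Integer b) { return "reference"; }

    public static void main(String[] args) {
        Integer boxed = 1;
        System.out.println(m(1, 2));         // "primitive": applicable without conversions
        System.out.println(m(boxed, boxed)); // "reference": applicable without conversions
        // m(boxed, 0);  // does not compile: each overload needs one conversion, so the call is ambiguous
    }
}
```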

      For now, Java's generics only work with reference types. Another JEP will enhance generics to interoperate with primitive types.

      Thus, provisionally, type arguments must be inferred to be reference types. Type inference treats value object and primitive value conversions the same as boxing and unboxing—for example, a primitive value passed where an inferred type is expected will lead to a reference-typed inference constraint.

      var list = List.of(new Point(1.0, 5.0));
      // infers List<Point.ref>

      Array subtyping

      Traditionally, primitive array types are not related to reference array types: an int[] cannot be assigned to an Object[] variable.

      Arrays of declared primitive types are more flexible: the type Point[] is a subtype of Point.ref[], which is a subtype of Object[].

      (Basic primitive array types like int[] will also gain this capability with JEP 402.)

      When a reference is stored in an array of static type Object[], if the array's runtime component type is Point then the operation will perform both an array store check (checking that the object is an instance of class Point) and a primitive value conversion (converting the object to a primitive value).

      Similarly, reading from an array of static type Object[] will cause a value object conversion if the array stores primitive values.

      Object replace(Object[] objs, int i, Object val) {
          Object result = objs[i]; // may perform value object conversion
          objs[i] = val; // may perform primitive value conversion
          return result;
      }

      Point[] ps = new Point[]{ new Point(3.0, -2.1) };
      replace(ps, 0, new Point(-2.1, 3.0));
      replace(ps, 0, null); // NPE from primitive value conversion
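
      The array store check half of this behavior is already part of Java's covariant reference arrays. The sketch below shows it with Integer[] viewed as Object[]; the class ArrayStoreDemo is hypothetical, and the null store is accepted here only because today's reference arrays allow null, unlike the Point[] case described above.

```java
// Array store checks through an Object[] view, using Integer[] as the stand-in array type.
final class ArrayStoreDemo {

    // Stores val through an Object[] view and reports what happened.
    static String storeThroughObjectView(Object[] objs, Object val) {
        try {
            objs[0] = val;                 // the array store check happens here
            return "stored";
        } catch (ArrayStoreException e) {
            return "ArrayStoreException";
        }
    }

    public static void main(String[] args) {
        Object[] view = new Integer[1];    // Integer[] is a subtype of Object[]
        System.out.println(storeThroughObjectView(view, 42));               // stored
        System.out.println(storeThroughObjectView(view, "not an Integer")); // ArrayStoreException
        System.out.println(storeThroughObjectView(view, null));             // stored
        // Object[] bad = new int[1];      // does not compile: int[] is not an Object[]
    }
}
```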

      class file representation & interpretation

      A primitive class is declared in a class file using the ACC_PRIMITIVE modifier (0x0800). At class load time, an error occurs if a primitive class is not a value class (via ACC_VALUE, 0x0100). At preparation time, an error occurs if a primitive class has a primitive type circularity in its instance fields.

      A declared primitive type is represented with a new Q descriptor prefix (QPoint;). The class's reference type is represented using the usual L descriptor (LPoint;).
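
      To make the two descriptor forms concrete, the sketch below assembles them as plain strings. The helpers descriptor and methodDescriptor are hypothetical and do not correspond to any JDK API; they only show how a Q or L prefix would appear in field and method descriptors for the Point class declared earlier.

```java
// Hypothetical string helpers; these are not part of any JDK API.
final class DescriptorDemo {

    // Builds a descriptor for a class type: "Q...;" for the primitive type, "L...;" for the reference type.
    static String descriptor(String internalName, boolean primitiveType) {
        return (primitiveType ? "Q" : "L") + internalName + ";";
    }

    // Builds a method descriptor from a return descriptor and parameter descriptors.
    static String methodDescriptor(String returnDesc, String... paramDescs) {
        return "(" + String.join("", paramDescs) + ")" + returnDesc;
    }

    public static void main(String[] args) {
        String q = descriptor("Point", true);
        String l = descriptor("Point", false);
        System.out.println(q);                             // QPoint;
        System.out.println(l);                             // LPoint;
        System.out.println(methodDescriptor(q, "D", "D")); // (DD)QPoint;  for Point translate(double, double)
        System.out.println(methodDescriptor("Z", q));      // (QPoint;)Z   for boolean contains(Point)
    }
}
```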

      Primitive values with Q types are one-slot stack values, even though they may represent aggregates of much more than 32 or 64 bits. No particular encoding of primitive values is mandated.

      Verification treats a Q type as a subtype of the corresponding L type—e.g., QPoint; is a subtype of LPoint;. Conversions from primitive values to value objects occur implicitly, as needed.

      The this parameter of a primitive class's instance method has a primitive type.

      Classes mentioned by primitive types in field and method descriptors are loaded during linkage, before the first access of that field or method.

      A CONSTANT_Class constant pool entry may refer to a primitive type using a Q descriptor as a "class name". A CONSTANT_Class using the plain name of a primitive class represents the class's reference type.

      The aconst_init instruction may refer to either a primitive type or a reference type. This determines whether a primitive value or a value object is produced.

      Similarly, a CONSTANT_Fieldref or CONSTANT_Methodref may refer to a field or method as a member of a primitive type or a reference type. In the case of withfield, this determines the result type of the operation.

      The anewarray and multianewarray instructions can be used to create arrays of declared primitive types. Array subtyping allows these arrays to be viewed as instances of reference array types.

      The checkcast, instanceof, and aastore opcodes support primitive value types, performing primitive value conversions (including null checks) when necessary.

      Primitive classes may be initialized for the same reasons as other classes (for example, before a static method is invoked). In addition, primitive class initialization is triggered by the aconst_init instruction, by each of the anewarray and multianewarray instructions when used with a primitive type, and (recursively) by initialization of another class that declares a primitive-typed field mentioning the primitive class.

      Core reflection

      Every primitive class has a java.lang.Class object representing the class. For both primitive values and value objects, the getClass method of the class's instances returns this object. A class literal—Point.class—can also be used to express this object.

      Tentatively: this Class object returns true from the isPrimitive method, and getModifiers shows its Modifier.PRIMITIVE flag set.

      For uses that need to model types, there is one Class object representing the primitive type, and another representing the reference type. Each of these has the same behavior as the Class object representing the class in most respects, except for methods to explicitly tell them apart and map from one to the other.

      Tentatively: the Class object representing the class doubles as a representation of the primitive type. A separate Class object exists for the purpose of representing the reference type.
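
      Current Java already has a pair of Class objects for each basic primitive and its wrapper, which gives a rough picture of the two-mirror model. The sketch below uses only existing reflection methods; the class ClassMirrorDemo and its describe helper are hypothetical, and the analogy is loose because int and Integer are not declared by a single class.

```java
// Today's int/Integer mirrors as a loose analogue of the primitive-type and reference-type mirrors.
final class ClassMirrorDemo {

    // Classifies a Class object the way isPrimitive() reports it today.
    static String describe(Class<?> c) {
        return c.isPrimitive() ? "primitive" : "reference";
    }

    public static void main(String[] args) {
        System.out.println(describe(int.class));       // primitive
        System.out.println(describe(Integer.class));   // reference
        System.out.println(Integer.TYPE == int.class); // true: maps the wrapper to its primitive mirror
    }
}
```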

      Other APIs

      The following APIs also gain new behaviors:

      • java.lang.constant encodes Q types in CONSTANT_Class structures and field and method descriptors

      • java.lang.invoke recognizes Q types and supports L-to-Q conversions

      • javax.lang.model recognizes primitive class declarations

      Performance model

      In typical usage, in heap storage and during fully-optimized code execution, declared primitive types should have a footprint and execution overhead comparable to the basic primitive types. For example, a Point, as declared above, can be expected to directly occupy 128 bits in local variables, parameters, fields, and array components. A field access simply extracts the first or second 64 bits. There are no additional pointers or metadata fields.
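
      The expected footprint can be approximated on current Java by storing points in a flat double array. This is a sketch of the layout idea only, assuming two 64-bit slots per point; the class FlatPointArray is hypothetical and says nothing about how a JVM actually encodes primitive values.

```java
// Hypothetical flat storage for points: two 64-bit slots per element, no headers or pointers.
final class FlatPointArray {

    private final double[] slots;

    FlatPointArray(int length) {
        slots = new double[2 * length]; // zero-filled, so every element starts as (0.0, 0.0)
    }

    double x(int i) { return slots[2 * i]; }     // first 64 bits of element i
    double y(int i) { return slots[2 * i + 1]; } // second 64 bits of element i

    void set(int i, double x, double y) {
        slots[2 * i] = x;
        slots[2 * i + 1] = y;
    }

    // Payload size in bits, ignoring the header of the backing array itself.
    int bitsUsed() { return slots.length * Double.SIZE; }

    public static void main(String[] args) {
        FlatPointArray ps = new FlatPointArray(4);
        ps.set(1, 3.0, -2.1);
        System.out.println(ps.x(1));       // 3.0
        System.out.println(ps.bitsUsed()); // 512, i.e. 128 bits per point
    }
}
```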

      Notably, a primitive class with a single instance field can be expected to have minimal overhead compared to operating on a value of the field's type directly.

      However, JVMs are ultimately free to encode primitive values however they see fit. Some classes may be considered too large to represent inline. Certain JVM components, in particular those that are less performance-tuned, may prefer to interact with primitive values as objects. A primitive value might carry with it a cached value object pointer to reduce the overhead of future conversions. Other encodings are possible as well.

      Value objects that are instances of primitive classes can be expected to behave much like instances of other value classes.

      HotSpot implementation

      This section describes implementation details of this release of the HotSpot virtual machine, for the information of OpenJDK engineers. These details are subject to change in future releases and should not be assumed by users of HotSpot or other JVMs.

      Values of Q types in HotSpot are encoded as follows:

      • Primitive classes whose field layouts exceed a size threshold are always encoded as regular heap objects. Fields marked volatile always store regular heap objects.

      • Otherwise, primitive values are encoded in fields and arrays as a flattened sequence of field values. Array components may be padded to achieve good alignment.

      • In the interpreter and C1, primitive values on the stack are represented as value objects. Each read of a primitive-typed field or array allocates a heap object.

      • In C2, primitive values on the stack are scalarized, effectively encoding each field as a separate variable. Methods with Q-typed parameters support both a pointer-based entry point (for interpreter and C1 calls) and a scalarized entry point (for C2-to-C2 calls). Value objects are also scalarized when working with the primitive class's reference type. Heap allocations occur where any other supertype is used.

      Default values are generally encoded as sequences of zeros, simplifying the task of field and array creation. However, in cases where a field or array encodes primitive values as heap pointers, the default value is a non-zero pointer. (Circularities may require this value to be null temporarily, but the null must be hidden from program code.)

      Some array types, like [Ljava/lang/Object; and [LPoint;, allow for both pointer-based and flattened arrays. Reads and writes for these types dynamically check a flag and perform the necessary conversions when operating on flattened arrays.

      Alternatives

      Making use of the basic primitive types, rather than declaring new primitives, will often produce a program with equivalent or slightly better performance. However, this approach gives up the valuable abstractions provided by classes. It's easy to, say, interpret a double with the wrong units, pass an out-of-range int to a library method, or fail to keep two boolean flags together in the right order.

      Normal value classes provide many of the benefits of primitive classes, without the substantial disruptions to the language and JVM type systems. With additional innovation in JVM implementation techniques and hardware capabilities, the gap may close further. However, the limitations outlined in the "Motivation" section are fundamental. For example, a value class type wrapping a single long field and supporting the full range of long values for that field can never be encoded in fewer than 65 bits. Primitive classes give programmers who need fine-grained control a more reliable performance model.

      We considered many different approaches to boxing and polymorphism before settling on a model in which primitive values and value objects are two different representations, with two different types, of the same class instances. This strategy balances the traditional understanding of primitive types (familiar semantics, performance expectations, and conversions to objects) against the simplicity of a single named class declaration for modeling data in both the primitive and reference spaces. Strategies in which a primitive value is an object obscure some important differences between the types. Strategies in which conversions occur between two different class-like entities introduce distracting complexity.

      Risks and Assumptions

      There are security risks involved in allowing instance creation outside of constructors, via default instances and non-atomic reads and writes. Developers will need to understand the implications, and recognize when it would be unsafe to declare a class primitive.

      This JEP does not address the interaction of primitive classes with the basic primitives or generics; these features will be addressed by other JEPs (see below). But, ultimately, all three JEPs will need to be completed to deliver a cohesive language design.

      Dependencies

      This JEP depends on Value Objects, which establishes the semantics of primitives when treated as objects. Primitive classes are a special case of value classes.

      In support of this JEP, there are separate efforts to improve the JVM Specification (in particular its treatment of class file validation) and the Java Language Specification (in particular its treatment of types). These changes address technical debt and facilitate the specification of these new features.

      In JEP 402 we propose to update the basic primitive types (int, boolean, etc.) to be represented by primitive classes, unifying the two kinds of primitive types. The existing wrapper classes will be repurposed to represent the corresponding types' primitive classes.

      In another JEP we will propose modifying the generics model in Java to make type parameters universal—instantiable by all types, both reference and primitive.

      In the future, JVM class and method specialization (JEP 218, with revisions) will allow generic classes and methods to specialize field, array, and local variable layouts when parameterized by primitive types.

              People

              Assignee:
              dlsmith Dan Smith
              Reporter:
              dlsmith Dan Smith
              Owner:
              Dan Smith
              Reviewed By:
              Brian Goetz