Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-8061404

A Proposal for User-Defined Lightweight Value-Semantics Objects

XMLWordPrintable

    • Icon: Enhancement Enhancement
    • Resolution: Other
    • Icon: P5 P5
    • None
    • None
    • specification

      =================================
      Name: vi73552 Date: 02/19/99


      A Proposal for User-Defined Lightweight Value-Semantics Objects

      Introduction:
      -------------

      This is a proposal to allow a Java programmer to define new
      Lightweight Value-Semantics Objects [LVSOs] to the Java language.
      "Lightweight" means that manipulating the objects is approximately as
      efficient as manipulating primitive objects. "Value-Semantics" means
      that the "objects" have value semantics as opposed to the reference
      semantics of ordinary Java objects. "Objects" means that the
      declarations whose addition I propose be added adding to the language
      are similar to class declarations, and you get all of the class
      options that are reasonable given value semantics.

      Please note that I propose to allow the user to declare new classes of
      LVSOs, not to create the notion of LVSOs ab initio. Java already has
      LVSOs; int, boolean, etc. We should not debate whether we need LVSOs,
      only whether we should let the user add new LVSO classes.

      I will develop the proposal in three stages:

         the basic stage, in which you essentially get to define your
         primitive types

         the subtypable primitives stage, in which we explore the
         implications of inheritance for LVSOs [whose method calls would not
         be polymorphic, since there are no pointers in Java and no
         references to LVSOs].

         the overloaded operators stage. I won't devote too much attention
         to this because the Java developers do not plan to introduce
         operator overloading in the near future. See the resolution of bug
         #4087427.

      Some of the ideas for type safety language features come from Ada.

      Motivation:
      -----------

      Consider the Apples and Oranges classes of Appendix A. These classes
      have simple behaviors and few instance variables; the motivation for
      declaring the classes is type safety; the compiler won't let you
      compare Apples and Oranges. In this case the programmer's intent was
      to create a new primitive type with int's semantics but with type
      safety.

      Consider the Money class and its subclasses of Appendix B. In this
      case we have a class that implements a non-integral fixed point
      representation, and two subclasses that are, again, provided for type
      safety.

      Lastly, consider the Complex class of Appendix C. The Complex class
      is a canonical example of a type that is not a primitive type in many
      languages, but which you would like to be implemented efficiently.
      The Complex class differs from the other two in the fact that the
      instance variable set is more than a single instance of one primitive
      type.

      These classes could be treated similarly to complicated classes for
      which the overhead of object creation is amortized over a long object
      lifetime, but there are several reasons why the programmer might
      prefer a different treatment for these tiny objects:

        1: The objects are small. They don't represent the fruits of much
           computation, making the high cost of object creation a problem.

        2: You probably want value semantics rather than reference
           semantics. In particular, equals() is likely to do the right
           thing in more cases than ==

      Because all objects are manipulated with references in Java, and
      storage for non-primitive objects are managed by the garbage collector
      and never stack allocated, we must allocate and free objects to get
      type safety or to get the effects of structures. You would like the
      compiler to prevent you from comparing Apples and Oranges, but it cost
      980 units of time[1] to create an Apple or an Orange, where a simple
      assignment costs one time unit. In the real world people will be
      inclined to declare integer variables for discrete units such as
      Apples and Oranges, and float variables for physical units such as
      meters or seconds, and from time to time Java programs will have bugs
      that trace to type mistakes. In the case of complex numbers and other
      small aggregates of primitive types with simple methods, programmers
      will have three unappetizing choices:

        1: Create the objects and pay the performance penalty

        2: Create the objects and reuse them, managing your own storage and
           having the same dangling reference problem you get in languages
           like C++ [although there would be no memory leaks]

        3: Maintain clusters of local variables; in this example you might
           have double ac_signal_real; double ac_signal_imag; where you
           would like to have written Complex ac_signal; this is what you
           would like the compiler to do for you.

      One problem with simply creating the objects is that we are then faced
      with a second choice. We can make the small objects immutable, so for
      example the sole Complex addition method looks like this

             Complex add(Complex addend)
             {
                return new Complex(real + addend.real, imag + addend.imag);
             }

      but efficiency considerations would impel us to at least provide the
      option of modifying an existing Complex and including the method

             Complex add_d(Complex addend)
             {
                real = real + addend.real;
                imag = imag + addend.imag;
                return this;
             }

      and the programmer must always keep in mind whether a Complex
      reference has more than one name at a given point in his program in
      order to be able to use add_d and avoid the cost of an allocation
      every time you add.

      The Basic Proposal; extensions of Primitive
      -------------------------------------------

      When a programmer declares an int in Java [or, for that matter, in
      almost any other programming language], he allocates a cell able to
      contain a basic integer in whatever frame he's declaring in. For
      example, if he declares an int in a method he gets an int in the local
      variables area of the stack frame of every invocation of that method.
      If the programmer keeps modifying that int, the same cell is reused.

      On the other hand, when a programmer declares an object, including a
      boxed instance of a primitive type such as Integer, he allocates a
      cell able to hold a reference to an instance. The class designer has
      to decide whether to make the object updatable, which might reduce the
      creation of new objects but which will lead to unexpected behavior if
      there are multiple pointers to the same object, or immutable, in which
      case the new operator with its performance penalty needs to be
      performed repeatedly. The boxed primitive types in java.lang.* are
      immutable.

      I propose that the programmer be able to declare that a class extends
      java.lang.Primitive instead of the usual java.lang.Object or one of
      its subclasses. If he does this, then the declared memory contains
      the object itself instead of a reference to the object. If a class is
      derived from Primitive then there is no way to declare a reference to
      instances of that class.

      I further propose the following conventions:

        1: You may use any visible constructor to create a value of a
           Primitive extension. You can't apply new to the class name,
           since Primitive extensions are not allocated on the heap.

        2: If you declare a variable of a Primitive extension but provide no
           initial value the instance variables are built with the default
           constructor. There must be one, or an uninitialized declaration
           is forbidden. However, a consequence of rule 1 is that if you
           elect to initialize your variable you could write something like

              Complex unit_real_vector = Complex(1.0, 0.0);

           Of course the constructor must exist and must be visible per its
           access modifier.

        3: The equals method inherited from Primitive recursively applies
           equal to corresponding instance variables. This can, of course,
           be overridden but there is no polymorphism because all value
           types can be determined at compile time.

           A == B is not allowed if A and B are different Primitive subtypes
           of if one operand is a Primitive subtype and the other isn't, and
           is determined by applying A.equals(B) if they are of the same
           Primitive subtype.

        4: Primitive subtypes cannot be declared to implement an interface.

      A Primitive subtype cannot contain itself as an instance variable.
      There can be no sequence of Primitive subtypes S1, S2, ... Sn such
      that S1 contains an S2, S2 contains an S3, ..., S(n-1) contains an Sn
      and Sn contains an S1.

      proposed implementation
      -----------------------

      Types come in three flavors: primitive, Primitive subtype, or object.
      Every Primitive subtype reduces to a sequence of types of the form
      <Array>^i<plain Java type> where i is a non-negative integer. A
      <plain Java type> is a primitive type or an Object. The reduction is
      as follows:

        1: a non-array Object or a primitive type reduces to itself

        2: a Primitive subtype reduces to the concatenation of the reductions
           of the types of its instance variables

        3: If Type reduces to [ <array>^i1<t1>, <array>^i2<t2>, ... ] then
           Type[] reduces to [ <array>^i1+1<t1>, <array>^i2+1<t2>, ... ]


      This is guaranteed to terminate by noncircularity of the instance
      variable type definitions and the fact that there is a finite number
      of Primitive subtypes. In practice I do not expect the reduction of a
      primitive subtype to reduce to more than a handful of types. We call
      the result of this process the "decomposition" of the Primitive
      subtype.

      If a Primitive subtype P decomposes to a single type T, then the
      compiler shall generate code that allocates a cell that can hold a
      value of type T for each variable of type P. The compiler shall
      arrange for the methods to be called on implicit arguments of type T.
      Byte code for static methods is the way to accomplish this since there
      is no this for an instance method call. Therefore, the "methods" of a
      Primitive subtype compile as static methods.

      If a Primitive subtype P decomposes to a sequence of types T1, T2,
      ... Tn, then the compiler shall generate code that allocates n cells,
      one each to hold a value of each of the Ti types, for each variable of
      type P. The compiler shall arrange for the methods to be called on a
      sequence of implicit arguments of types Ti. For each of the explicit
      arguments of a Primitive subtype several formal parameters, one for
      each of that subtype's Ti's, shall be passed instead. For example,

          double magnitude(Complex C){ .. real .. imag .. }

      compiles into byte code appropriate for

          static double magnitude(double real, double imag){ .. real .. imag .. }

      Do note that an array of Complex decomposes to two arrays, each of
      double.

      Because the byte code fundamentally can't return multiple values [as
      in the LISP style], return values of Primitive subtypes which
      decompose to multiple types must be returned in some other manner. I
      propose hidden formal parameters to methods returning such a value.
      The compiler shall generate one mutable boxed primitive type instance
      for each primitive value returned as a Primitive subtype, except for
      the first type T1 which can be returned as the return value. [Owen,
      help me word this?] Although these boxed types must be allocated,
      they only need to be allocated once per calling method invocation at
      worst, and in some cases a method would be able to pass some of the
      hidden boxed primitive types to its callees and therefore avoid extra
      allocations. Another possibility is to have instance variables on
      Thread that maintain per-Thread lists of mutable boxed primitive type
      instances for this purpose. So

          Complex add(Complex addend){ ... }

      compiles as if it were

          static double add(boxed_double imag_ret, double real, double imag,
                            double addend_real, double addend_imag)
            { ... }

      There is one other small detail. The == operator must generate a call to
      equals() when the operands are compatible LVSOs.

      Extending Primitive subtypes:
      -----------------------------

      I do not propose that type information be kept with LVSOs. You may
      therefore not assign a DollarsAndCents to a Money. However,
      inheritance is useful even without polymorphism. I propose the Ada
      convention that an operation on a supertype be extended to the subtype
      by replacing the supertype by the subtype everywhere in the signature,
      including the result type. An operation will not be extended if any
      part of the signature is already of the subtype, or if the result is
      of the supertype and the subtype adds instance variables.

      Overloaded Operators:
      ---------------------

      It doesn't look like operator overloading will be added to Java any
      time soon. However, if it ever is added, say with C++ syntax, you
      might want to be able to engage an operation on its right-hand
      operand. The C++ convention of declaring a variant of a method
      applicable to a different syntax by adding a bogus int formal
      parameter may serve us well here. For example, scalar multiplication
      of a Complex might look like this:

          Complex operation*(double scalar)
          {
             return Complex(real * scalar, imag * scalar);
          }
          Complex operation*(double scalar, int i)
          {
             return Complex(real * scalar, imag * scalar);
          }

      ...

          Complex C(1.0, 2.0)

             ... C * 3.0 // engages Complex operation*(double scalar)

             ... 3.0 * C // engages Complex operation*(double scalar, int i)





      Appendix A

      public final class Apples
      {
        private int number;

        public Apples(int i)
        {
          number = i;
        }

        public Apples plus(Apples addend)
        {
          return Apples(number + addend.number);
        }

        // you get this one for free; it's placed here for emphasis
        public boolean equals(Apples comparand)
        {
          return number == comparand.number;
        }

        ...
      }

      public final class Oranges
      {
        private int number;

        public Oranges(int i)
        {
          number = i;
        }

        public Oranges plus(Oranges addend)
        {
          return Oranges(number + addend.number);
        }

        // you get this one for free; it's placed here for emphasis
        public boolean equals(Oranges comparand)
        {
          return number == comparand.number;
        }

        ...

      }

      Appendix B

      public class Money
      {
        private long cents;

        public Money(double amount)
        {
          cents = amount * 100 + 0.1;
        }

        private Money(long amount)
        {
          cents = amount;
        }

        public Money add(Money addend)
        {
          return Money(cents + addend.cents);
        }

        public boolean equals(Money comparand)
        {
          return cents = addend.cents;
        }

        ...
      }

      public final class DollarsAndCents extends Money
      ...

      public final class Euros extends Money
      ...

      Appendix C

      public final class Complex
      {
        private double real;
        private double imag;

        public Complex(double _real, double _imag)
        {
          real = _real; imag = _imag;
        }

        public Complex add(Complex addend)
        {
          return Complex(real + addend.real, imag + addend.imag);
        }

        public boolean equals(Complex comparand)
        {
          return real == comparand.real && imag = comparand.imag;
        }

        ...
      }








      notes

      [1] Bruce Eckels, "Thinking in Java", Prentice Hall, 1998,
          Pp. 1060-1061

          These timings were performed on Sun's JDK1.1.4 on a Pentium Pro
          200MHz.
      (Review ID: 48962)
      =====================================

            abuckley Alex Buckley
            vasya Vassili Igouchkine (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

              Created:
              Updated:
              Resolved: