Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-8244968

[valhalla] WeakHashMap needs to handle keys which are inline objects

    XMLWordPrintable

Details

    • Enhancement
    • Resolution: Unresolved
    • P4
    • repo-valhalla
    • repo-valhalla
    • core-libs

    Description

      As documented, WeakHashMap will automatically remove an entry when its key is "no longer in ordinary use". This language is somewhat clarified by saying "When a key has been discarded its entry is effectively removed from the map". The complementary meanings of "in ordinary use" and "discarded" are defined by appealing to finalization: an object to be discarded (taken out of ordinary use) is "made finalizable, finalized, and then reclaimed" by the GC.

      (Note: Since this RFE was filed, the term "inline" has been replaced by the term "primitive". Inline objects and inline classes, in this context, refer to primitive objects and primitive classes.)

      These notions need to be defined for inline objects as well, and the appropriate definition implements. If this fails, we will be unable to reuse WHM for inline keys, and we will be unable to transition classes currently in use as WeakHashMap keys to inline, should we desire to do so.

      There are several simple ways to handle this issue of inline object keys:

      1. [THROW] Declare that WHM will throw an exception if an inline key is inserted.
      2. [STRONGER] Declare that WHM will accept inline keys, but will treat them as strong references.
      3. [WEAKER] Declare that WHM will accept inline keys, but will treat them as exceedingly weak references, dropping them as soon as may be.
      4. [RANDOM] Declare that WHM will accept inline keys, but will treat their lifetime as unpredictable.

      Recall that the JVM treats two inline objects as identical if and only if they have the same class and their fields are pairwise identical. (This second condition sets up a potential recursion in the case of nested inlines.) Thus, two inline objects with the same information content are indistinguishable by all tests. I can copy one from another, or I can throw both away and then reconstruct them from their components (which I happened to remember), and all the copies will look exactly the same.

      This behavior is closely parallel to that of auto-boxed Byte or Character wrapper objects, which maintain a hidden table of canonical instances. Specifically, if I copy one Byte to another, or if I throw away two Bytes with the same value (but remember that value) and later reconstruct them both from the remembered value, I always get the same reference; there is no way to deduce from any of the Byte references how I obtained it.

      If we are to transition Integer to an inline class, we will get a class which appears to have a cache of canonical instances (of size 2^32) and never lets the user construct a fresh instance. This is a model which gives information about WeakHashMap behavior on inlines. In order to be compatible with today's behavior on auto-boxed values, the [STRONGER] option is best, since
      that mimics the behavior of the hidden wrapper caches, when those caches are the only source of wrapper values. When wrapper values are built by explicit construction, or when caches fails to apply (for uncached ints, longs or floats or doubles), then the [RANDOM] option is partially applicable, although it leads to behaviors (on inline keys) which have no correspondence at all to plausible behaviors on today's wrapper objects.

      So, the [STRONGER] option works for WHM. It may appear to violate the spirit of "weakness" in the WHM, but it is consistent: any Byte inserted is inserted forever, since the Byte's identity is not as important as its value, and its value could be read from a file at any moment. There's no way a GC can tell that a byte *value* is discarded or unused or unreachable, nor can there be (that being equivalent to a Halting Problem solution).

      What about inline wrapper objects which contain references? There is not such a good precedent for this as for primitive wrapper types, but it would be reasonable to consider the case of tuples or records; let's look at pairs as a simplified use case.

      If a weak hash map is to have pair keys, what should its "weakness" look like? Let's define the two fields of a pair which serves as a key as *sub-keys*.

      If both sub-keys are primitives, then the above arguments apply as well. The [STRONGER] option is suitable, or perhaps [RANDOM]. After all, two primitives can be encoded into one longer primitive (typically).

      Is [THROW] reasonable here, on grounds that old code never did this before? Probably not. Consider a Rational number type with two primitive components, which has found its way into WHMs for whatever silly reason, and which is being transitioned (quite reasonably) to an inline record type. A WHM should continue to accept Rational objects. It can apply the [STRONGER] or [RANDOM] policies to them, without too much trouble.

      If only one sub-key is a primitive, the other is a reference; let's say for simplicity that it is a reference to a good old identity object X. In this case, [STRONGER] is not so suitable, because X will then be strongly rooted forever in the WHM, which is surely surprising to users. What about [WEAKER]? In that case, the keys will not "stick" in the hash table, again causing bugs.

      When *should* a non-primitive sub-key be considered unreachable, if not "always" or "never" or "random"? There seems to be just one other answer available, which is "as long as that sub-key is reachable". This answer is easy to define; it can be implemented by transforming the pair key into a new pair which wraps the reference in a jlr.WeakReference. The mechanics would be complicated, since the WHM would have to do such "weak-i-fying" transformations on key insertion, and hide their effects when it reports keys.

      Does this make sense as a user model? Yes, it does. Claim: A pair of sub-keys in a WHM becomes unreachable if and only if one of those keys is a proper identity reference, *and* that reference becomes unreachable.

      Proof: Backwards first: If a sub-key exists only in a WHM and is live in no other place, then clearly there is no *pair object* containing that sub-key, except in that same WHM.

      Next, forwards: If a pair of sub-keys is not reachable, it means that no code, anywhere, can come up with that pair value, which means that code doesn't have the ability to come up with one or the other sub-key, which means that one or the other sub-key has gone missing (dead to the GC).

      Or, using the contrapositive: If a pair key has no proper identity reference (as a sub-key) that is dead, it means that every sub-key is either live or a primitive. (Note how this logic immediately extends to larger tuples with multiple references.) If every sub-key is live or primitive, it is possible (at least for code which can move past access boundaries, as the GC can) to collect the live pointer of the sub-key and reconstruct the pair value which expresses the key being inspected. Since that pair value, as an inline object, is in every way indistinguishable from the original pair of sub-keys current stored in the WHM, in follows that the key in question is reachable, apart from the WHM. QED.

      This logic continues to apply if *both* sub-keys are references to identity objects, and to larger keys with more sub-keys, perhaps a mix of many primitives and identity object references.

      This logic also applies to nested inline objects, as if the nesting were dissolved and the inline object keys in the WHM were broken into "shards" of primitive bit-data, nulls, and an array of non-null identity references.

      This reasoning, which gives a plausible job to complex inline objects as keys in weak hash maps, leads to the final option:

      5. [SHARDS] Declare that WHM will accept inline keys, but will treat them as containing unordered sets or tuples of weak references to identity objects (plus other data not subject to weak processing).

      In this option, an inline key breaks into sub-keys which are the fringe of the immutable part/whole structure of the inline object. The fringe consists of primitives, nulls, and references to identity objects. (There are no references to inline objects, because those are would be interior to the part/whole structure, and/or would need the same case analysis as we are doing already. Note that arrays are identity objects for our purposes.)

      The [SHARDS] option can be implemented, as hinted above, by shredding every inline key into its fringe data, and wrapping WeakReference instances around the identity objects on the fringe.

      More specifically, note that an inline object can be tree-structured, and can point to other inline or identity objects. (It can also contain nulls or originally primitive values such as 42.) An primitive object tree is a root primitive object plus has zero or more internal nodes which are also primitive objects. The leaves (or fringe) of this tree are everything other than primitive objects: null, originally primitive values (like 42), and references to identity objects. We define the "identity fringe" of a primitive (root) object as the set of identity objects in that fringe. Duplicates can be disregarded. (The primitive object graph can also be a DAG with primitive internal nodes, but that is operationally equivalent to a tree, and either tree or DAG views lead to the same set of identity leaves.) If we can efficiently extract the identity fringe of a primitive object, we can then ask the GC to perform appropriate weak reference processing on that fringe (in some array-like representation agreed upon by the VM and the core libraries).

      Getting WeakHashMap right for inlines may require creation of new "hooks" into the GC and/or object heap structure (to map between fringes and shards). It may also reflect on the desirable behavior of jlr.WeakReference, which is the current primitive used to implement WHM. See JDK-8234875.

      Note that [STRONG] and [SHARDS] behave the same when the inline object has no further references to identity objects. This is the case with today's primitive wrappers, and the hypothetical Rational type.

      BigInteger (and other array-wrapping numerics) should be treated as strong keys (like other numerics), since this is likely to be the most useful behavior. In that case, the weakness of any backing array would have to be suppressed in the [SHARDS] behavior. It comes for free if we agree on [STRONG]. It is likely that inline types will be used to represent variable-precision data, which means there is an issue here (about the hidden identity of a backing array) regardless of numeric WHM keys.

      Inline objects are likely to be used as lightweight adapters. For example, a String can be wrapped as an inline CIString whose equals and hashCode methods are tweaked to ignore case. More interestingly, an identity object with a structural equals method could be wrapped with an ObjectID wrapper which ignores the structural equals method and rewires its own equals method to perform "acmp", and similarly for hashCode; this adapter would convert any Map into an identity-based map, by wrapping each key in the Map with an ObjectID. Such adapters are likely to find their way into WHMs as keys. In that case, the adapter should clearly go dead when its underlying object goes dead. This means that the [STRONG] option, however appropriate for numerics, is inappropriate for wrappers that contain objects with significant identity. The [SHARD] option is best for such inline wrappers of identity objects. This would be useful new functionality for WHM, since, currently, if you wish to adapt the behavior of keys *and* use a WHM, you must recode the WHM completely. (See, for example, the MethodType intern table, which is such a copy-and-paste adaptation exercise, though it also brings in concurrency.) Inline adapters for WHM keys would increase their reusability.

      Attachments

        Issue Links

          Activity

            People

              rriggs Roger Riggs
              jrose John Rose
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated: