Bug
Resolution: Fixed
P3
1.3.0
None
beta
generic
generic
Verified
During deserialization, ObjectInputStream associates incoming class descriptors
with local classes. In many cases, the class hierarchy described by the
incoming class descriptors (taking into account class annotations) matches the
local class hierarchy exactly. For example, take a serializable class B which
extends a serializable superclass A, and consider the case where an instance of
B is serialized and deserialized on the same VM (with both classes in
classpath). The serialization stream contains a record of B's class descriptor
(henceforth referred to as "Bd"), which in turn refers to A's class descriptor
("Ad"). During deserialization, resolveClass() associates Bd with class B and
Ad with class A:
    class        bound        local
    descs        classes      classes
    Ad --------> A     ==     A
    |                         |
    |                         |
    |                         |
    V                         V
    Bd --------> B     ==     B
Fields declared in Ad and Bd are thus "bound" to actual fields of local classes
A and B such that values contained in the serialization stream can be stored
appropriately during deserialization.
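The matching-hierarchy case can be reproduced with a minimal sketch such as the
following. The class name RoundTripDemo, the fields a and b, and the helper
methods are made up for illustration; only the java.io calls are real API. An
instance of B is written and read back in the same VM, so both descriptors bind
to the local classes and both field values survive.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class RoundTripDemo {
    // Hypothetical stand-ins for the classes A and B discussed above.
    static class A implements Serializable {
        private static final long serialVersionUID = 1L;
        int a;
    }

    static class B extends A {
        private static final long serialVersionUID = 1L;
        int b;
    }

    /** Serializes o; for a B, the stream records Bd, which refers to Ad. */
    static byte[] serialize(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Deserializes in the same VM, so Ad and Bd resolve to the local A and B. */
    static Object deserialize(byte[] data) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    static B roundTrip(int a, int b) {
        B original = new B();
        original.a = a;
        original.b = b;
        return (B) deserialize(serialize(original));
    }

    public static void main(String[] args) {
        B copy = roundTrip(10, 20);
        System.out.println(copy.a + " " + copy.b); // prints "10 20"
    }
}
```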
Things become trickier when discrepancies exist between the stream's class
hierarchy and the local VM's class hierarchy. For example, suppose that the
sending VM contains classes A and B as described above, but that the receiving
VM's version of class B does not extend class A.
    class        bound        local
    descs        classes      classes
    Ad --------> A
    |
    |
    |
    V
    Bd --------> B     ==     B
The receiving VM traverses the stream's class descriptor list from child to
superclass, using a modified diff algorithm (based on class equality) to match
as many descriptors with local classes as possible. In this case, no match is
found for Ad, which means that any values associated with Ad in the stream are
discarded during deserialization (since there's nowhere to put them in the
local instance of B).
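The matching step can be pictured with the simplified model below. It is only a
sketch of the idea, not the actual ObjectInputStream implementation: the class
DescriptorMatchSketch and its match() method are hypothetical, and each stream
descriptor is represented just by the class it was bound to. Descriptors are
visited from child to superclass, and each one is compared by identity (==)
against the remaining local superclass chain; a null entry marks a descriptor
whose stream data would be discarded.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class DescriptorMatchSketch {
    static class A {}
    static class B {}            // receiver's B: does NOT extend A
    static class C extends A {}  // a class whose hierarchy matches the stream

    /**
     * For each bound class (child first), returns the local class it matched,
     * or null if the descriptor has no counterpart in the local hierarchy.
     */
    static List<Class<?>> match(List<Class<?>> bound, Class<?> localLeaf) {
        List<Class<?>> result = new ArrayList<>();
        Class<?> start = localLeaf;
        for (Class<?> b : bound) {
            Class<?> found = null;
            for (Class<?> c = start; c != null && c != Object.class;
                 c = c.getSuperclass()) {
                if (c == b) {        // class equality, as in the report
                    found = c;
                    break;
                }
            }
            if (found != null) {
                start = found.getSuperclass(); // keep searching above the match
            }
            result.add(found);
        }
        return result;
    }

    public static void main(String[] args) {
        // Stream says "B extends A", but the local B has no superclass A.
        List<Class<?>> bound = Arrays.asList(B.class, A.class);
        List<Class<?>> matched = match(bound, B.class);
        System.out.println(matched.get(0) == B.class); // prints "true"
        System.out.println(matched.get(1));            // prints "null"
    }
}
```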
However, since the diff pattern is based on class equality (i.e., whether a
given class descriptor's bound class is == to some local class), strange
behavior can result when instances of objects are marshalled back and forth
between VMs with different (but partially overlapping) sets of available local
classes.
A specific example: consider three VMs (VM1, VM2 and VM3) and the same two
serializable classes A and B, where B extends A. VM1 contains both A and B in
its classpath, VM2 contains only A in its classpath, and VM3 contains neither A
nor B in its classpath. VM1 creates an instance of B, which it marshals via
RMI to VM2, which passes it on to VM3. This sequence of events will cause the
data associated with class A to mysteriously disappear. Here's why:
1. VM1 creates an instance o of B, which it sends to VM2. The instance o is
   annotated with VM1's codebase cb1.
2. VM2 receives o. Since A is in its classpath but B is not, VM2 instantiates
   o with class B loaded from cb1, which extends (local) class A:
   (VM2)  class        bound        local
          descs        classes      classes
          Ad ------>   A      ==    A
          |                         |
          |                         |
          |                         |
          V                         V
          Bd ------>   Bcb1   ==    Bcb1
3. VM2 relays o on to VM3. Because classes A and B were loaded by
   different classloaders in VM2, they are annotated with cb2 (VM2's
   codebase) and cb1 respectively. Since resolveClass() is called individually
   for each incoming class descriptor during deserialization (and VM3 contains
   neither A nor B), Ad resolves to Acb2, while Bd resolves to Bcb1. However,
   since VM3's classloader for cb1 doesn't delegate to a classloader for cb2,
   Bcb1's superclass is Acb1, *not* Acb2:
   (VM3)  class        bound        local
          descs        classes      classes
          Ad ------>   Acb2   !=    Acb1
          |                         |
          |                         |
          |                         |
          V                         V
          Bd ------>   Bcb1   ==    Bcb1
   Since the diff pattern is based on class equality, and Acb2 doesn't match
   Bcb1's superclass Acb1, VM3 considers this case equivalent to the "missing
   superclass" case described previously. Consequently, the stream data
   associated with Ad is discarded! (Sample code demonstrating the data
   disappearance is attached to this bug report.)
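The per-descriptor annotation and resolution described in the steps above can
be observed with a small sketch. This is not RMI's marshalling code: the
classes AnnotatingOutput and RecordingInput and the codebase strings "cb1" and
"cb2" are invented for illustration, and everything runs in a single VM, so the
classes still resolve to the local A and B. It only shows that annotateClass()
attaches one annotation per class descriptor and that resolveClass() is then
invoked once per descriptor, each time seeing only that descriptor's
annotation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class AnnotationDemo {
    static class A implements Serializable {
        private static final long serialVersionUID = 1L;
        long expiration = 42L;
    }

    static class B extends A {
        private static final long serialVersionUID = 1L;
        String owner = "s";
    }

    /** Writes a (hypothetical) codebase string after each class descriptor. */
    static final class AnnotatingOutput extends ObjectOutputStream {
        private final Map<Class<?>, String> codebases;

        AnnotatingOutput(OutputStream out, Map<Class<?>, String> codebases)
                throws IOException {
            super(out);
            this.codebases = codebases;
        }

        @Override
        protected void annotateClass(Class<?> cl) throws IOException {
            writeObject(codebases.get(cl)); // null if no codebase is known
        }
    }

    /** Reads the annotation back and records which descriptor it came with. */
    static final class RecordingInput extends ObjectInputStream {
        final List<String> resolved = new ArrayList<>();

        RecordingInput(InputStream in) throws IOException {
            super(in);
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc)
                throws IOException, ClassNotFoundException {
            Object codebase = readObject();
            resolved.add(desc.getName() + " @ " + codebase);
            // A real implementation would pick a classloader for the codebase
            // here; this sketch simply resolves against the local classes.
            return super.resolveClass(desc);
        }
    }

    /** Round-trips a B and returns one log entry per resolved descriptor. */
    static List<String> resolveLog() {
        Map<Class<?>, String> codebases = new HashMap<>();
        codebases.put(A.class, "cb2"); // as if A came from VM2's classpath
        codebases.put(B.class, "cb1"); // as if B was downloaded from VM1
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new AnnotatingOutput(bytes, codebases)) {
                out.writeObject(new B());
            }
            try (RecordingInput in = new RecordingInput(
                     new ByteArrayInputStream(bytes.toByteArray()))) {
                in.readObject();
                return in.resolved;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        resolveLog().forEach(System.out::println);
    }
}
```

Because each descriptor is resolved on its own, nothing in this path forces the
class chosen for Ad to be the actual superclass of the class chosen for Bd.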
Unfortunately, this type of situation is becoming commonplace; the
LeaseRenewalManager service introduced in Jini 1.1 already suffers from the
exact problem described above. LeaseRenewalManager will encounter problems in
the following setup:
    Jini Lookup
      Service
         ^
         |
         |
         |
    Service s -----> LeaseRenewalManager
Service s is a Jini service which uses leasing to manage client-allotted
resources; because of this, it contains com.sun.jini.lease.AbstractLease in its
classpath, along with a service-specific Lease class which extends
AbstractLease. The Jini lookup service also has AbstractLease in its
classpath, along with its own AbstractLease subclass which it uses to manage
registration subscriptions. Service s wants to use the LeaseRenewalManager to
manage its registration lease with the lookup service; LeaseRenewalManager has
neither AbstractLease nor the lookup service's AbstractLease subclass in its
classpath. When service s passes a lease obtained from the lookup service to
the LeaseRenewalManager, AbstractLease will be annotated with s's codebase,
while the lookup service's AbstractLease subclass will be annotated with the
lookup service's codebase (i.e., AbstractLease == class A, lookup service
AbstractLease subclass == class B). When the LeaseRenewalManager deserializes
the lease, the lease expiration time stored by AbstractLease will be lost.
Currently, it appears that the most feasible solution to this problem is to
match superclass descriptors with superclasses based on class name, not class
identity. This means that a given class descriptor may now be associated with
several secondary local classes (applicable when the class descriptor is
referenced in a superclass context), instead of just one primary local class.
This fix is certainly possible, but will require significant implementation
changes to serialization.
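The difference between the two matching rules can be illustrated with a toy
model of the VM3 scenario. Nothing here is real serialization code: LoadedClass
is a hypothetical stand-in for java.lang.Class in which two objects with the
same name but different codebases are distinct classes, and match() is a
simplified version of the child-to-superclass walk. Under identity matching Ad
finds no partner; under name matching it pairs with Bcb1's real superclass
Acb1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

final class NameMatchSketch {
    /** Toy model of a loaded class; identity is object identity. */
    static final class LoadedClass {
        final String name;
        final String codebase;
        final LoadedClass superclass; // null stands for java.lang.Object

        LoadedClass(String name, String codebase, LoadedClass superclass) {
            this.name = name;
            this.codebase = codebase;
            this.superclass = superclass;
        }

        @Override
        public String toString() {
            return name + "@" + codebase;
        }
    }

    static final BiPredicate<LoadedClass, LoadedClass> BY_IDENTITY =
        (bound, local) -> bound == local;
    static final BiPredicate<LoadedClass, LoadedClass> BY_NAME =
        (bound, local) -> bound.name.equals(local.name);

    // The classes as VM3 sees them in step 3 of the example.
    static final LoadedClass A_CB1 = new LoadedClass("A", "cb1", null);
    static final LoadedClass A_CB2 = new LoadedClass("A", "cb2", null);
    static final LoadedClass B_CB1 = new LoadedClass("B", "cb1", A_CB1);

    /** Pairs each bound class (child first) with a local class, or null. */
    static List<LoadedClass> match(List<LoadedClass> bound,
                                   LoadedClass localLeaf,
                                   BiPredicate<LoadedClass, LoadedClass> same) {
        List<LoadedClass> result = new ArrayList<>();
        LoadedClass start = localLeaf;
        for (LoadedClass b : bound) {
            LoadedClass found = null;
            for (LoadedClass c = start; c != null; c = c.superclass) {
                if (same.test(b, c)) {
                    found = c;
                    break;
                }
            }
            if (found != null) {
                start = found.superclass;
            }
            result.add(found);
        }
        return result;
    }

    public static void main(String[] args) {
        List<LoadedClass> bound = Arrays.asList(B_CB1, A_CB2); // Bd, Ad
        System.out.println(match(bound, B_CB1, BY_IDENTITY)); // [B@cb1, null]
        System.out.println(match(bound, B_CB1, BY_NAME));     // [B@cb1, A@cb1]
    }
}
```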
Another interesting ramification of this bug is that if a local Serializable
superclass doesn't (appear to) have a matching class descriptor in the incoming
stream, its custom readObject() method won't be called. Not only might its data
not be properly initialized from the stream, but any custom "fix-up" or "sanity
check" code placed in readObject() as "post-deserialization" processing to
maintain class invariants will never be executed.
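To make the concern concrete, here is a sketch of the kind of readObject()
"sanity check" at stake. The Range class and its lo <= hi invariant are
invented for illustration. The check only runs when the stream actually
contains a matching descriptor for the class, which is exactly what the
scenario above circumvents. The readObjectNoData() method shown is the hook
that java.io.Serializable documents in current Java versions for the case where
the stream lists no data for a class; this sketch does not exercise that path.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;

final class InvariantDemo {
    /** Hypothetical class with the invariant lo <= hi. */
    static class Range implements Serializable {
        private static final long serialVersionUID = 1L;
        int lo;
        int hi;

        // No check here, so the demo can build an invalid instance on purpose.
        Range(int lo, int hi) {
            this.lo = lo;
            this.hi = hi;
        }

        // Post-deserialization check: runs only if the stream has data
        // associated with a descriptor matched to this class.
        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            if (lo > hi) {
                throw new InvalidObjectException("lo > hi");
            }
        }

        // Invoked instead of readObject() when the stream carries no data
        // for this class; here we refuse to accept such an object.
        private void readObjectNoData() throws ObjectStreamException {
            throw new InvalidObjectException("stream data required for Range");
        }
    }

    /** Returns true if r deserializes cleanly, false if the check rejects it. */
    static boolean survivesRoundTrip(Range r) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(r);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                     new ByteArrayInputStream(bytes.toByteArray()))) {
                in.readObject();
                return true;
            }
        } catch (InvalidObjectException e) {
            return false; // the readObject() check fired
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(survivesRoundTrip(new Range(1, 5))); // prints "true"
        System.out.println(survivesRoundTrip(new Range(5, 1))); // prints "false"
    }
}
```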
For example, you could hand-code a stream that deserializes a subclass of
BasicPermission such that BasicPermission's init() method never gets called.
You would do this by having the class descriptor of a subclass in the stream
indicate that it has no serializable superclasses, instead of a superclass of
BasicPermission. Not that this *particular* "attack" could be "useful", but
others might be...