Bug
Resolution: Fixed
P4
1.3.0
beta3
generic
generic
Verified
In the current serialization protocol, it is impossible to reliably skip over
the data written by a class-specific writeObject() method. Consider the
situation where the following classes are available on a given VM:
    import java.io.*;

    class A implements Serializable {
    }

    class B extends A {
        // fields ...

        private void writeObject(ObjectOutputStream out) throws IOException {
            // ...
        }

        private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException
        {
            // ...
        }
    }

    class C extends B {
    }
This VM ("VM 1") serializes an instance of class C to another VM ("VM 2"). VM 2
lacks class B, and instead has the following classes:
    class A implements Serializable {
    }

    class C extends A {
    }
When VM 2 deserializes the instance of class C, it needs to skip over the data
written by (unavailable) class B. In the simple case where class B's
writeObject() method did not use defaultWriteObject() or the PutField API, this
can be accomplished by skipping over block-data segments and any objects written
by writeObject() until the TC_ENDBLOCKDATA code is encountered. However, if B
called defaultWriteObject() or writeFields() at the beginning of its
writeObject() method, the next "element" in the stream will be the default
serialized representation of B's fields (i.e., an array of bytes containing B's
primitive field values, followed by each of B's object field values written in
sequence). Note that this default representation is not preceded by any
identifying tag in the protocol grammar.
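For the simple case just described, the skipping logic can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the actual ObjectInputStream code: the class `BlockDataSkipper`, its method, and the `ObjectSkipper` callback are hypothetical names, and skipping a nested object (which would require parsing class descriptors and back-references) is delegated to that callback rather than implemented. Only the tag values are taken from `java.io.ObjectStreamConstants`.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectStreamConstants;
import java.io.StreamCorruptedException;

// Hypothetical sketch (not ObjectInputStream's real code): skips the optional data
// written by a class-specific writeObject() that never called defaultWriteObject()
// or writeFields().
class BlockDataSkipper {

    // Hypothetical hook that consumes one complete object element whose tag byte
    // has already been read (TC_OBJECT, TC_STRING, TC_NULL, ...).
    interface ObjectSkipper {
        void skipObject(byte tag, DataInputStream in) throws IOException;
    }

    // Consumes elements up to and including TC_ENDBLOCKDATA and returns the number
    // of block-data payload bytes skipped along the way.
    static long skipToEndBlockData(InputStream raw, ObjectSkipper objects) throws IOException {
        // DataInputStream does not buffer, so raw is left right after TC_ENDBLOCKDATA.
        DataInputStream in = new DataInputStream(raw);
        long skipped = 0;
        while (true) {
            byte tag = in.readByte();
            int len;
            switch (tag) {
                case ObjectStreamConstants.TC_ENDBLOCKDATA:
                    return skipped;
                case ObjectStreamConstants.TC_BLOCKDATA:
                    len = in.readUnsignedByte();   // short block: 1-byte length
                    break;
                case ObjectStreamConstants.TC_BLOCKDATALONG:
                    len = in.readInt();            // long block: 4-byte length
                    if (len < 0) {
                        throw new StreamCorruptedException("negative block length " + len);
                    }
                    break;
                default:
                    objects.skipObject(tag, in);   // any other tag is treated as an object
                    continue;
            }
            in.readFully(new byte[len]);           // discard the block payload
            skipped += len;
        }
    }
}
```

Note that this loop has no way to recognize default field data: if B had called defaultWriteObject(), the first byte of that data would be fed into the switch as if it were a tag.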
Consequently, when VM 2 encounters B's data in the serialization stream, it
faces 3 possibilities:
1. The next element in the stream is block data. If this is the case, the
next byte in the stream should be TC_BLOCKDATA or TC_BLOCKDATALONG.
2. The next element in the stream is an object (i.e., B.writeObject() wrote
an object before writing any primitive values that would cause a data block
to be written). If this is the case, the next byte in the stream should be
TC_OBJECT (or TC_STRING, TC_CLASSDESC, etc.).
3. The next element in the stream is the default serialization of B's
fields. Since primitive field values appear first, the next byte in the
stream could have any value.
There is no way for VM 2 to distinguish between case 3 and cases 1 and 2. This
breaks the "self-describing" property of the serialization protocol.
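The collision can be made concrete with a small sketch. It assumes, hypothetically, that B declares a single int field, and it mimics the default representation of that field with DataOutputStream (four big-endian bytes with no preceding tag). The names `TagCollision`, `classify`, and `Guess` are illustrative only.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectStreamConstants;

// Sketch: default-serialized primitive field bytes carry no tag, so their first
// byte can coincide with a protocol tag such as TC_BLOCKDATA or TC_OBJECT.
class TagCollision {

    enum Guess { BLOCK_DATA, OBJECT, OTHER }

    // What a reader that can only peek at the next byte would conclude.
    static Guess classify(byte next) {
        switch (next) {
            case ObjectStreamConstants.TC_BLOCKDATA:
            case ObjectStreamConstants.TC_BLOCKDATALONG:
                return Guess.BLOCK_DATA;   // looks like case 1
            case ObjectStreamConstants.TC_OBJECT:
            case ObjectStreamConstants.TC_STRING:
            case ObjectStreamConstants.TC_LONGSTRING:
            case ObjectStreamConstants.TC_CLASSDESC:
            case ObjectStreamConstants.TC_ARRAY:
            case ObjectStreamConstants.TC_CLASS:
            case ObjectStreamConstants.TC_NULL:
            case ObjectStreamConstants.TC_REFERENCE:
                return Guess.OBJECT;       // looks like case 2
            default:
                return Guess.OTHER;        // no recognized tag: presumably case 3
        }
    }

    // Hypothetical stand-in for one int field in default serialized form:
    // four big-endian bytes, no tag.
    static byte[] defaultIntFieldBytes(int value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(value);
        out.flush();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Field values whose high byte is 0x77 or 0x73 masquerade as cases 1 and 2.
        System.out.println(classify(defaultIntFieldBytes(0x77000005)[0]));  // BLOCK_DATA
        System.out.println(classify(defaultIntFieldBytes(0x73000000)[0]));  // OBJECT
    }
}
```

Both inputs are really case 3, yet a byte-peeking reader classifies them as cases 1 and 2; no choice of lookahead on the first byte can avoid this.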
The current serialization implementation handles this case by assuming that the
class without a local counterpart (B, in the example) was written in the default
format, without even checking to see if the class had a writeObject method
defined. This fails in cases where B defines a writeObject() method but does
not use defaultWriteObject() or writeFields() (as is demonstrated in the
attached test code). Even if ObjectInputStream were to check for the presence
of B.writeObject(), it still could not handle every case correctly, because
case 3 cannot be told apart from cases 1 and 2.
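The failure can be reproduced in miniature under a hypothetical setup: B declares one int field, and its writeObject() writes that value with out.writeInt() without calling defaultWriteObject(). A freshly constructed ObjectOutputStream is already in block-data mode, so the sketch below captures the same TC_BLOCKDATA framing that such a writeObject() produces (minus the trailing TC_ENDBLOCKDATA), then reads it back the way a reader assuming B's default format would. `DefaultFormatMisread` and its methods are hypothetical names, not JDK code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.Arrays;

class DefaultFormatMisread {

    // Bytes ObjectOutputStream emits for a single writeInt(value) in block-data
    // mode, with the 4-byte stream header (magic + version) stripped.
    static byte[] blockDataFor(int value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes); // writes stream header
        out.writeInt(value);   // buffered as block data
        out.flush();           // emits TC_BLOCKDATA, length 4, then the 4 data bytes
        byte[] all = bytes.toByteArray();
        return Arrays.copyOfRange(all, 4, all.length);
    }

    // Hypothetical reader that wrongly assumes B's data is a default-format int field.
    static int readAsDefaultIntField(byte[] data) throws IOException {
        return new DataInputStream(new ByteArrayInputStream(data)).readInt();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = blockDataFor(42);
        System.out.println(Arrays.toString(data));       // [119, 4, 0, 0, 0, 42]
        System.out.println(readAsDefaultIntField(data)); // 1996750848, not 42
    }
}
```

The reader gets 0x77040000 (the block header plus two payload bytes) instead of 42, and leaves the rest of B's data unread, so everything after it in the stream is misaligned.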
Presumably this bug hasn't surfaced before because, as far as we know, there
have been few cases to date where the receiver needs to skip over data written
by unknown classes yet still unmarshal the top-level object successfully.
Although the example above uses inheritance-shuffling to illustrate the
problem, the same problem surfaces in other situations as well (for example,
when adding fields to an existing serializable class). As more people attempt
to evolve objects by adding fields or superclasses, this error condition is
likely to become increasingly commonplace.
relates to:
- JDK-4386898 logging APIs: LogRecord writes custom data before calling defaultWriteObject (Closed)
- JDK-4400702 invoking available or read before defaultReadObject causes StreamCorruptedXcp (Closed)
- JDK-4400945 calling mismatched defaultReadObject() can corrupt stream or misassign fields (Closed)