-
CSR
-
Resolution: Unresolved
-
P4
-
None
-
behavioral
-
low
-
Summary
Update java.util.zip.GZIPInputStream so that it no longer relies on the java.io.InputStream.available() method to decide whether to read a concatenated GZIP stream from the underlying input stream.
Problem
The GZIPInputStream class takes an InputStream from which it reads compressed GZIP data. The GZIP format allows multiple GZIP streams to be concatenated. An undocumented feature of the GZIPInputStream implementation is that it supports reading such concatenated GZIP streams. This is possible because the GZIP format defines an 8-byte trailer that marks the end of each individual GZIP stream.
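To make the concatenation case concrete, here is a minimal sketch that builds two GZIP streams, appends one to the other, and reads the result back through a single GZIPInputStream. The class and helper names (ConcatenatedGzipDemo, gzip, concat, gunzip) are hypothetical and exist only for illustration. The underlying stream is a ByteArrayInputStream, whose available() reports the remaining byte count.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class; not part of the JDK.
final class ConcatenatedGzipDemo {

    // Compresses the text into one complete GZIP stream (header, data, trailer).
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    // Places the second GZIP stream directly after the first one.
    static byte[] concat(byte[] first, byte[] second) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        bytes.write(first, 0, first.length);
        bytes.write(second, 0, second.length);
        return bytes.toByteArray();
    }

    // Decompresses everything GZIPInputStream is willing to return.
    static String gunzip(byte[] compressed) throws IOException {
        try (GZIPInputStream gz =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] both = concat(gzip("hello "), gzip("world"));
        System.out.println(gunzip(both)); // prints "hello world"
    }
}
```

Both members are returned here because the underlying stream reports its remaining bytes accurately. The paragraphs that follow describe what happens when it does not.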
GZIPInputStream has a public read(byte[] buf, int off, int len) method, which returns the uncompressed data read from the underlying, possibly concatenated, GZIP streams. After reading an 8-byte trailer from the underlying stream, the current implementation of this method calls java.io.InputStream.available() on the underlying stream to decide whether a subsequent concatenated GZIP stream follows. If available() returns 0, GZIPInputStream.read() does not read any additional data and marks the GZIPInputStream as having reached the end of the compressed input stream. Any subsequent call to read() returns -1, indicating the end of the stream.
Relying on the return value of InputStream.available() is not appropriate, because its API javadoc states that the return value is merely an estimate of the number of bytes available. That method's API javadoc further states:
Note that while some implementations of {@code InputStream} will return the total number of bytes in the stream, many will not.
As a result, the current implementation of GZIPInputStream.read(), which relies on the underlying InputStream's available() method, can incorrectly conclude that the end of the stream has been reached even when a concatenated GZIP stream follows. GZIPInputStream.read() then ignores, and thus never returns, the uncompressed data of any subsequent GZIP streams.
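The following sketch is one way to observe the behavior described above. It wraps the concatenated data in a hypothetical NoEstimateInputStream whose available() always returns 0, which the InputStream contract permits. The wrapper also hands out at most one byte per read call. That detail is an assumption about internal buffering: it is meant to ensure the decompressor has not already buffered bytes of the next member when it reaches the trailer. The result depends on the JDK in use, so no expected output is claimed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class; not part of the JDK.
final class AvailableZeroDemo {

    // Hypothetical wrapper: a valid InputStream that never estimates
    // how many bytes are available.
    static final class NoEstimateInputStream extends FilterInputStream {
        NoEstimateInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int available() {
            return 0; // allowed: the value is only an estimate
        }

        // Returns at most one byte per call (assumption: this keeps the
        // next GZIP member out of the decompressor's internal buffer).
        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, 1));
        }
    }

    // Compresses the text into one complete GZIP stream.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    // Decompresses whatever GZIPInputStream returns before signalling -1.
    static String gunzip(InputStream in) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(in)) {
            return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Two GZIP members, "hello " followed by "world".
    static byte[] twoMembers() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        bytes.write(gzip("hello "));
        bytes.write(gzip("world"));
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = twoMembers();
        String plain = gunzip(new ByteArrayInputStream(data));
        String wrapped = gunzip(
            new NoEstimateInputStream(new ByteArrayInputStream(data)));
        System.out.println("plain stream:   [" + plain + "]");
        System.out.println("available()==0: [" + wrapped + "]");
    }
}
```

If the second line shows only the first member, the runtime is exhibiting the available()-based end-of-stream decision. If it shows both members, the runtime no longer depends on available() for this case.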
Solution
GZIPInputStream.read() will be updated to remove the check on InputStream.available(). After reading an 8-byte GZIP stream trailer, the implementation will now attempt to read a GZIP stream header from the underlying input stream. If the additional read() calls on the underlying input stream return enough bytes and those bytes represent a GZIP stream header, GZIPInputStream.read() will treat this as a concatenated GZIP stream and will continue to return uncompressed data from it. If, however, those read() calls do not return enough bytes, or the returned bytes do not represent a GZIP stream header, the GZIPInputStream will be marked as having reached the end of the compressed input stream.
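The decision described above can be illustrated with a simplified sketch. This is not the JDK implementation. It is a hypothetical helper (GzipHeaderProbe.startsWithGzipHeader) that reads the 10-byte fixed part of a GZIP header as laid out in RFC 1952 and checks only the two magic bytes and the compression method. A real implementation would also need to handle the optional header fields and any bytes it has already buffered.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper; a simplified illustration, not the JDK code.
final class GzipHeaderProbe {

    // RFC 1952 fixed header: ID1, ID2, CM, FLG, MTIME (4 bytes), XFL, OS.
    static final int FIXED_HEADER_LENGTH = 10;

    // Returns true only if enough bytes could be read and they look like
    // the start of a GZIP member. It never consults available().
    static boolean startsWithGzipHeader(InputStream in) throws IOException {
        byte[] header = new byte[FIXED_HEADER_LENGTH];
        // readNBytes keeps reading until the array is full or EOF is hit.
        int n = in.readNBytes(header, 0, header.length);
        if (n < header.length) {
            return false; // not enough bytes: treat as end of input
        }
        return (header[0] & 0xff) == 0x1f   // ID1 magic byte
            && (header[1] & 0xff) == 0x8b   // ID2 magic byte
            && header[2] == 8;              // CM = 8 (deflate)
    }

    public static void main(String[] args) throws IOException {
        byte[] gzipLike = {0x1f, (byte) 0x8b, 8, 0, 0, 0, 0, 0, 0, 0};
        byte[] garbage = new byte[FIXED_HEADER_LENGTH];
        byte[] empty = new byte[0];

        System.out.println(startsWithGzipHeader(new ByteArrayInputStream(gzipLike))); // true
        System.out.println(startsWithGzipHeader(new ByteArrayInputStream(garbage)));  // false
        System.out.println(startsWithGzipHeader(new ByteArrayInputStream(empty)));    // false
    }
}
```

The point of the sketch is that the outcome depends only on what read() actually returns, so a stream whose available() returns 0 is handled the same way as any other.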
Specification
There are no specification changes.
- csr of
-
JDK-8337393 GZIPInputStream readTrailer uses faulty available() test for end-of-stream
- Open