JDK / JDK-8327489

Make GZIPInputStream no longer rely on available() for end-of-stream test


    • Type: CSR
    • Resolution: Approved
    • Priority: P4
    • Fix Version/s: 23
    • Component/s: core-libs
    • Labels: None
    • Compatibility Kind: behavioral
    • Compatibility Risk: low
    • Compatibility Risk Description:
      This change makes GZIPInputStream more likely than before to attempt to read a concatenated stream. In theory, an implementation of `java.io.InputStream` could have returned 0 from available() specifically to prevent GZIPInputStream from reading a concatenated GZIP stream. Such implementations are hard to imagine, and they would be relying on an unspecified internal implementation detail.
    • Interface Kind: Java API
    • Scope: Implementation

      Summary

      Update java.util.zip.GZIPInputStream so that it no longer relies on the java.io.InputStream.available() method to decide whether to read a concatenated GZIP stream from the underlying input stream.

      Problem

      The GZIPInputStream class reads compressed GZIP data from an underlying InputStream. The GZIP format allows multiple GZIP streams to be concatenated, and an undocumented feature of the GZIPInputStream implementation is that it supports reading such concatenated GZIP streams. This is possible because the GZIP format ends each individual GZIP stream with an 8-byte trailer (a CRC-32 followed by the uncompressed size modulo 2^32, as specified in RFC 1952).
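      The following minimal sketch illustrates the concatenated-stream support described above. It writes two independent GZIP streams with GZIPOutputStream, appends them into one byte array, and decompresses the result with a single GZIPInputStream. The class name ConcatenatedGzipDemo and the sample strings are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class ConcatenatedGzipDemo {

    // Compresses text into one complete GZIP stream (header, deflate data, 8-byte trailer).
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        // The trailer is written when the GZIPOutputStream is closed above.
        return bytes.toByteArray();
    }

    // Places two GZIP streams back to back and decompresses them with one GZIPInputStream.
    static String readConcatenated() throws IOException {
        ByteArrayOutputStream concatenated = new ByteArrayOutputStream();
        concatenated.write(gzip("hello, "));
        concatenated.write(gzip("world"));
        try (GZIPInputStream in = new GZIPInputStream(
                new ByteArrayInputStream(concatenated.toByteArray()))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readConcatenated());
    }
}
```

Running main() prints "hello, world": the uncompressed data of both GZIP streams, read through a single GZIPInputStream.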

      GZIPInputStream has a public read(byte[] buf, int off, int len) method that returns uncompressed data read from the underlying, possibly concatenated, GZIP streams. After reading the 8-byte trailer of a GZIP stream, the current implementation of this method calls java.io.InputStream.available() on the underlying stream to decide whether data for a subsequent concatenated GZIP stream follows. If available() returns 0, GZIPInputStream.read() reads no additional data and marks the GZIPInputStream as having reached the end of the compressed input stream. Any subsequent call to read() then returns -1, indicating the end of stream.
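      The decision rule described above can be modelled in a few lines. This is a deliberately simplified, hypothetical model, not the JDK source, and it ignores any bytes GZIPInputStream has already buffered internally: mightHaveAnotherMember() stands in for the available() test, and ZeroAvailableStream is a hypothetical stream that, as the InputStream contract permits, always reports 0 from available() while still returning data from read().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class AvailableHeuristicDemo {

    // Hypothetical stream: read() returns real bytes, but available() always reports 0.
    // This is legal, because available() is only an estimate.
    static final class ZeroAvailableStream extends ByteArrayInputStream {
        ZeroAvailableStream(byte[] data) {
            super(data);
        }

        @Override
        public synchronized int available() {
            return 0;
        }
    }

    // Hypothetical, simplified model of an available()-based end-of-stream test
    // applied to the bytes that follow a GZIP trailer.
    static boolean mightHaveAnotherMember(InputStream afterTrailer) throws IOException {
        return afterTrailer.available() > 0;
    }

    public static void main(String[] args) throws IOException {
        // Pretend these bytes follow a trailer: the start of another GZIP header.
        InputStream rest = new ZeroAvailableStream(new byte[] {0x1f, (byte) 0x8b, 0x08});
        System.out.println("heuristic sees more data: " + mightHaveAnotherMember(rest));
        System.out.println("next byte actually readable: " + rest.read());
    }
}
```

Running main() reports that the heuristic sees no more data, even though the next read() returns 31 (0x1f, the first GZIP magic byte).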

      Relying on the return value of InputStream.available() is not appropriate, because the API javadoc of InputStream.available() states that the return value is merely an estimate of the number of bytes that can be read without blocking. The javadoc further states:

      Note that while some implementations of {@code InputStream} will return the total number of bytes in the stream, many will not.

      As a result, the current implementation of GZIPInputStream.read(), which relies on the underlying InputStream's available() method, can incorrectly conclude that the end of stream has been reached even when a concatenated GZIP stream follows. In that case GZIPInputStream.read() silently ignores, and never returns, the uncompressed data of the remaining GZIP streams.
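      The effect can be observed end to end with a hypothetical wrapper stream, TrickleStream, which returns at most one byte per bulk read() call and always reports 0 from available(); both behaviours are allowed by the InputStream contract. The one-byte reads are intended to keep GZIPInputStream from buffering the second GZIP stream ahead of time, so that the end-of-stream test after the first trailer decides the outcome. The names TrickleStream and TruncationDemo are illustrative assumptions. The result depends on the JDK release the sketch runs on: without this change, we would expect the output to stop after the first stream ("hello, "), while with the change it should contain the data of both streams ("hello, world").

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class TruncationDemo {

    // Hypothetical wrapper: at most one byte per bulk read, and available() is always 0.
    static final class TrickleStream extends FilterInputStream {
        TrickleStream(InputStream in) {
            super(in);
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, 1));
        }

        @Override
        public int available() {
            return 0;
        }
    }

    // Compresses text into one complete GZIP stream.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    // Decompresses two concatenated GZIP streams through the trickling wrapper.
    static String decompress() throws IOException {
        ByteArrayOutputStream concatenated = new ByteArrayOutputStream();
        concatenated.write(gzip("hello, "));
        concatenated.write(gzip("world"));
        InputStream source = new TrickleStream(
                new ByteArrayInputStream(concatenated.toByteArray()));
        try (GZIPInputStream in = new GZIPInputStream(source)) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Print the running JDK version next to the result, since the outcome is version-dependent.
        System.out.println(Runtime.version() + " -> \"" + decompress() + "\"");
    }
}
```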

      Solution

      GZIPInputStream.read() will be updated to remove the check on InputStream.available(). After reading an 8-byte GZIP stream trailer, the implementation will instead attempt to read a GZIP stream header from the underlying input stream. If the additional read() calls on the underlying input stream return enough bytes, and those bytes form a GZIP stream header, GZIPInputStream.read() will treat the data as a concatenated GZIP stream and continue to return its uncompressed data. If, however, those read() calls do not return enough bytes, or the returned bytes do not form a GZIP stream header, the GZIPInputStream will be marked as having reached the end of the compressed input stream.
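      The header check described above can be sketched as follows. startsAnotherMember() is a hypothetical helper, not the JDK's implementation: it reads the fixed 10-byte part of a GZIP header (RFC 1952) with InputStream.readNBytes() and accepts it only if it carries the GZIP magic bytes 0x1f 0x8b and compression method 8 (deflate). A real implementation also has to handle the optional header fields (FEXTRA, FNAME, FCOMMENT, FHCRC), which this sketch skips.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class HeaderProbeDemo {

    // Fixed part of a GZIP header (RFC 1952): ID1, ID2, CM, FLG, MTIME (4 bytes), XFL, OS.
    private static final int FIXED_HEADER_LENGTH = 10;

    // Hypothetical helper: decides from the bytes after a trailer whether another
    // GZIP stream follows, by attempting to read a header instead of calling available().
    static boolean startsAnotherMember(InputStream afterTrailer) throws IOException {
        // readNBytes blocks until 10 bytes are read or the end of the input is reached.
        byte[] header = afterTrailer.readNBytes(FIXED_HEADER_LENGTH);
        if (header.length < FIXED_HEADER_LENGTH) {
            return false; // not enough bytes: treat as end of the compressed input
        }
        return (header[0] & 0xff) == 0x1f   // ID1
                && (header[1] & 0xff) == 0x8b // ID2
                && header[2] == 8;            // CM = deflate
    }

    public static void main(String[] args) throws IOException {
        byte[] validHeader = {0x1f, (byte) 0x8b, 8, 0, 0, 0, 0, 0, 0, (byte) 0xff};
        byte[] truncatedHeader = {0x1f, (byte) 0x8b};
        byte[] notAHeader = new byte[12]; // twelve zero bytes
        System.out.println(startsAnotherMember(new ByteArrayInputStream(validHeader)));
        System.out.println(startsAnotherMember(new ByteArrayInputStream(truncatedHeader)));
        System.out.println(startsAnotherMember(new ByteArrayInputStream(notAHeader)));
    }
}
```

Running main() prints true for the well-formed header and false for both the truncated header and the zero bytes, mirroring the two end-of-stream cases described above.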

      Specification

      There are no specification changes.

            Assignee: Archie Cobbs
            Reporter: Webbug Group
            Reviewed By: Jaikiran Pai, Roger Riggs