JDK-8330195: Define and document GZIPInputStream concatenated stream semantics

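    • Type: CSR
    • Resolution: Unresolved
    • Priority: P5
    • Fix Version: 24
    • Component: core-libs
    • Labels: None
    • Compatibility Kind: source
    • Compatibility Risk: low
    • Compatibility Risk Description: This is a documentation-only change, so functional compatibility should not be affected.
    • Interface Kind: Java API
    • Scope: JDK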

      Summary

      Update the Javadoc for GZIPInputStream to clarify its behavior when multiple concatenated GZIP streams are encountered.

      Problem

      GZIPInputStream supports reading data from multiple concatenated GZIP data streams. To do this, after reading a GZIP trailer frame, it attempts to read a new GZIP header frame and, if successful, proceeds to decompress the new stream. If decoding that GZIP header frame fails, or the attempt triggers an IOException, it silently ignores the trailing garbage and/or the IOException and returns EOF.
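      A minimal sketch of the current behavior, assuming an in-memory source (the class ConcatenatedGzipDemo and its helper methods are hypothetical; only GZIPInputStream and GZIPOutputStream are JDK API). Two members written back to back are decoded as one continuous stream, and appending bytes that cannot start a GZIP header is expected to change nothing: the reader simply reaches EOF.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class ConcatenatedGzipDemo {

    // Compresses text into one complete GZIP member: header, deflate payload, trailer.
    static byte[] gzip(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Joins byte arrays back to back, the way `cat a.gz b.gz > ab.gz` would.
    static byte[] concat(byte[]... parts) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (byte[] part : parts) {
            bytes.writeBytes(part);
        }
        return bytes.toByteArray();
    }

    // Returns everything GZIPInputStream produces before it reports EOF.
    static String gunzip(byte[] data) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] first = gzip("first member, ");
        byte[] second = gzip("second member, decoded as a continuation");
        byte[] garbage = "trailing bytes that are not a GZIP header"
                .getBytes(StandardCharsets.US_ASCII);

        // A header right after a trailer: decoding continues into the next member.
        System.out.println(gunzip(concat(first, second)));
        // Non-header bytes after a trailer: dropped, and the caller just sees EOF.
        System.out.println(gunzip(concat(first, second, garbage)));
    }
}
```

      On current JDK releases both lines are expected to print the same text; that equality is exactly the undocumented trailing-garbage behavior described above.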

      There are several issues with this:

      • The behaviors of (a) supporting concatenated streams and (b) ignoring trailing garbage are not documented, much less precisely specified.

      • Ignoring trailing garbage is dubious because it could easily hide errors or other data corruption that an application would rather be notified about. Moreover, the API documentation states that a ZipException is thrown when corrupt data is read, but that does not happen in the trailing garbage scenario. For example, N concatenated streams where the last one has a corrupted header frame are indistinguishable from N-1 valid streams.

      • There's no way to create a GZIPInputStream that does not support stream concatenation.

      • There's no way to create a GZIPInputStream that does not ignore trailing garbage.
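      The indistinguishability point can be sketched the same way. The hypothetical class CorruptHeaderDemo builds three members, overwrites the first magic byte of the last header, and compares the decoded output with that of the first two members alone. Assuming the trailing-garbage handling described above, no ZipException is raised and the comparison is expected to print true.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CorruptHeaderDemo {

    // Writes one GZIP member per text, back to back; optionally damages the last header.
    static byte[] members(boolean corruptLastHeader, String... texts) {
        ByteArrayOutputStream all = new ByteArrayOutputStream();
        for (int i = 0; i < texts.length; i++) {
            ByteArrayOutputStream one = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(one)) {
                gz.write(texts[i].getBytes(StandardCharsets.UTF_8));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            byte[] member = one.toByteArray();
            if (corruptLastHeader && i == texts.length - 1) {
                member[0] = 0x00;   // the first magic byte should be 0x1f
            }
            all.writeBytes(member);
        }
        return all.toByteArray();
    }

    // Everything GZIPInputStream returns before it reports EOF.
    static byte[] gunzip(byte[] data) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] threeWithBadLast = members(true,
                "member one. ", "member two. ", "member three, whose header is damaged.");
        byte[] twoValid = members(false, "member one. ", "member two. ");
        // No ZipException: the damaged third member reads like the end of the data.
        System.out.println(Arrays.equals(gunzip(threeWithBadLast), gunzip(twoValid)));
    }
}
```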

      On the other hand, GZIPInputStream is an old class with lots of existing usage, so it's important to preserve the existing behavior, warts and all.

      As a result, this proposal does not include any functionality changes.

      Instead, the first step is to properly document some of the above behavior: we want to make clear to users of this class how it behaves, but without specifying that behavior in such detail that we set problematic behavior in stone. So there is a trade-off here between the precision of the documentation and preserving the ability to change that behavior without adding new constructors and/or new "configuration" methods.

      Solution

      The solution is to describe the following:

      • Note that the GZIP format has its own framing, and therefore concatenated GZIP streams are possible

      • Properly formatted concatenated GZIP streams will be followed and decoded automatically

      • An invalid GZIP header frame following a valid GZIP trailer frame results in EOF
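      For contrast, the framing rules behind these three points can be sketched as a hypothetical strict walker over an in-memory buffer. It follows RFC 1952 member by member (header, raw deflate payload, CRC32/ISIZE trailer), continues only where the 1f 8b magic number starts a new header, and reports leftover bytes instead of mapping them to EOF. GzipFramingWalker is not part of java.util.zip and is not how GZIPInputStream is implemented; it assumes the whole input fits in memory and skips, rather than verifies, the optional FHCRC header checksum.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;
import java.util.zip.DataFormatException;
import java.util.zip.GZIPOutputStream;
import java.util.zip.Inflater;
import java.util.zip.ZipException;

// Hypothetical strict walker (not part of java.util.zip): follows RFC 1952 framing over an
// in-memory buffer and reports bytes left after the last valid trailer instead of hiding them.
public class GzipFramingWalker {

    // members: complete members decoded; trailingBytes: bytes after the last valid trailer.
    record Walk(int members, int trailingBytes, byte[] data) {}

    static Walk walk(byte[] buf) throws ZipException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int pos = 0;
        int members = 0;
        // A member starts only where a full 10-byte header beginning with 1f 8b is present.
        while (buf.length - pos >= 10 && (buf[pos] & 0xff) == 0x1f && (buf[pos + 1] & 0xff) == 0x8b) {
            if (buf[pos + 2] != 8) throw new ZipException("Unsupported compression method");
            int flg = buf[pos + 3] & 0xff;
            pos += 10;                                            // ID1 ID2 CM FLG MTIME(4) XFL OS
            if ((flg & 0x04) != 0) {                              // FEXTRA: 2-byte LE length, then data
                require(buf, pos, 2);
                pos += 2 + ((buf[pos] & 0xff) | (buf[pos + 1] & 0xff) << 8);
            }
            if ((flg & 0x08) != 0) pos = skipCString(buf, pos);  // FNAME
            if ((flg & 0x10) != 0) pos = skipCString(buf, pos);  // FCOMMENT
            if ((flg & 0x02) != 0) pos += 2;                      // FHCRC: skipped, not verified
            require(buf, pos, 1);
            Inflater inf = new Inflater(true);                    // raw deflate: framing is parsed here
            try {
                inf.setInput(buf, pos, buf.length - pos);
                CRC32 crc = new CRC32();
                byte[] chunk = new byte[8192];
                while (!inf.finished()) {
                    int n = inf.inflate(chunk);
                    if (n == 0 && !inf.finished() && (inf.needsInput() || inf.needsDictionary())) throw new ZipException("Truncated deflate data");
                    crc.update(chunk, 0, n);
                    out.write(chunk, 0, n);
                }
                pos = buf.length - inf.getRemaining();            // first byte after the payload
                require(buf, pos, 8);                             // trailer: CRC32, then ISIZE (both LE)
                ByteBuffer trailer = ByteBuffer.wrap(buf, pos, 8).order(ByteOrder.LITTLE_ENDIAN);
                if ((trailer.getInt() & 0xffffffffL) != crc.getValue()
                        || (trailer.getInt() & 0xffffffffL) != (inf.getBytesWritten() & 0xffffffffL)) {
                    throw new ZipException("Corrupt GZIP trailer");
                }
                pos += 8;
                members++;
            } catch (DataFormatException e) {
                throw new ZipException("Invalid deflate data: " + e.getMessage());
            } finally {
                inf.end();
            }
        }
        if (members == 0) throw new ZipException("Not in GZIP format");
        return new Walk(members, buf.length - pos, out.toByteArray());
    }

    static void require(byte[] buf, int pos, int n) throws ZipException {
        if (buf.length - pos < n) throw new ZipException("Truncated GZIP member");
    }

    static int skipCString(byte[] buf, int pos) throws ZipException {
        while (pos < buf.length && buf[pos] != 0) pos++;
        require(buf, pos, 1);                                     // the terminating zero byte
        return pos + 1;
    }

    // Demo input: one GZIP member per text, written back to back.
    static byte[] gzipAll(String... texts) {
        ByteArrayOutputStream all = new ByteArrayOutputStream();
        for (String text : texts) {
            // Closing the GZIPOutputStream also closes `all`, which has no effect on a ByteArrayOutputStream.
            try (GZIPOutputStream gz = new GZIPOutputStream(all)) {
                gz.write(text.getBytes(StandardCharsets.UTF_8));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return all.toByteArray();
    }

    public static void main(String[] args) throws ZipException {
        byte[] members = gzipAll("alpha ", "beta");
        Walk w = walk(Arrays.copyOf(members, members.length + 5));   // append five zero bytes
        System.out.println(w.members() + " members, " + w.trailingBytes() + " trailing bytes: "
                + new String(w.data(), StandardCharsets.UTF_8));
    }
}
```

      With the demo input in main, the walker is expected to report two members and five trailing bytes, whereas GZIPInputStream would return the same decoded text and then simply report EOF. Throwing on a malformed header that does start with the magic number (for example an unsupported compression method) is a deliberate choice in this sketch, not a description of what GZIPInputStream does.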

      In the future, the suppression of trailing garbage exceptions may be revoked, but because the last item above becomes documented behavior, doing so would require a new constructor and/or method.

      What's not specified:

      • How many additional bytes are read and ignored in the "trailing garbage" scenario
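      The unspecified part can be observed with a sketch like the following (the class TrailingBytesConsumedDemo is hypothetical). After GZIPInputStream reports EOF, the number of trailing bytes it has already pulled from the underlying stream depends on implementation details such as the buffer size passed to the GZIPInputStream(InputStream, int) constructor, so code that keeps reading the underlying stream afterwards should not assume any particular position.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class TrailingBytesConsumedDemo {

    // One GZIP member followed by garbageLength zero bytes (never the 1f 8b magic number).
    static byte[] memberWithGarbage(String text, int garbageLength) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        bytes.writeBytes(new byte[garbageLength]);
        return bytes.toByteArray();
    }

    // Reads the GZIP stream to EOF, then reports how many bytes the source still holds.
    static int unreadAfterEof(byte[] data, int bufferSize) {
        ByteArrayInputStream source = new ByteArrayInputStream(data);
        try (GZIPInputStream in = new GZIPInputStream(source, bufferSize)) {
            in.readAllBytes();
            return source.available();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = memberWithGarbage("payload", 4096);
        // Both reads decode "payload" and then hit EOF; how far each got into the garbage is unspecified.
        System.out.println("buffer 64:   " + unreadAfterEof(data, 64) + " bytes left unread");
        System.out.println("buffer 8192: " + unreadAfterEof(data, 8192) + " bytes left unread");
    }
}
```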

      Specification

      Update the class-level Javadoc with this background information about GZIP framing:

      diff --git a/src/java.base/share/classes/java/util/zip/GZIPInputStream.java b/src/java.base/share/classes/java/util/zip/GZIPInputStream.java
      index ab7ea53793f..ebf6717cf27 100644
      --- a/src/java.base/share/classes/java/util/zip/GZIPInputStream.java
      +++ b/src/java.base/share/classes/java/util/zip/GZIPInputStream.java
      @@ -37,6 +37,13 @@
        * This class implements a stream filter for reading compressed data in
        * the GZIP file format.
        *
      + * <p>
      + * In the GZIP file format, compressed data payloads are preceded by a
      + * header and followed by a trailer. When a trailer is immediately followed by
      + * a new header, this class continues to decode compressed data as a single,
      + * concatenated stream. Otherwise, any additional trailing bytes are discarded
      + * as if the end of stream is reached.
      + *
        * @see         InflaterInputStream
        * @author      David Connelly
        * @since 1.1

            Eirik Bjørsnøs (eirbjo)
            Archie Cobbs (acobbs)