- Bug
- Resolution: Unresolved
- P4
- 7
- None
- In Review
- generic
- generic
GZIPInputStream has supported reading data from multiple concatenated GZIP data streams since JDK-4691425. To do this, after reading the trailer of a stream it attempts to read the header of a next stream. If that succeeds, it continues with the next stream; if it fails, it silently ignores the trailing garbage and returns end-of-data.
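The two behaviors described above can be observed with a small test program. This is only a sketch: the class and helper names (ConcatenatedGzipDemo, gzip, gunzip, concat) are made up for illustration, and the trailing-garbage result reflects JDKs that do not include the change proposed in this issue.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustrative only; none of these names exist in the JDK.
final class ConcatenatedGzipDemo {

    // Compresses text into a single, complete GZIP stream.
    static byte[] gzip(String text) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Reads everything a single GZIPInputStream returns for the given bytes.
    static String gunzip(byte[] data) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Joins two byte arrays back to back.
    static byte[] concat(byte[] a, byte[] b) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        bos.write(a, 0, a.length);
        bos.write(b, 0, b.length);
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        byte[] hello = gzip("hello");
        byte[] world = gzip("world");
        byte[] junk = "junk".getBytes(StandardCharsets.UTF_8);

        // (a) Two concatenated streams are decoded as one continuous stream.
        System.out.println(gunzip(concat(hello, world))); // helloworld

        // (b) Bytes after the trailer that do not form a header are dropped
        //     without any exception (current behavior, as described in this issue).
        System.out.println(gunzip(concat(hello, junk))); // hello
    }
}
```

Nothing in the second call tells the caller that four bytes of input were discarded.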
There are several issues with this:
1. The behaviors of (a) supporting concatenated streams and (b) ignoring trailing garbage are not documented, much less precisely specified.
2. Ignoring trailing garbage is dubious because it can hide errors or other data corruption that an application would rather be notified about. Moreover, the API claims that a ZipException will be thrown when corrupt data is read, but that does not happen in the trailing garbage scenario.
3. There's no way to create a GZIPInputStream that does NOT support stream concatenation. For example, an application that sends multiple sequential compressed streams over a single underlying stream and wants to read them out one at a time would need concatenation support turned off.
See this GitHub comment for a history of this class: https://github.com/openjdk/jdk/pull/17113#issuecomment-1859177655
Suggestion:
- Add new method setEnableConcatenatedStreams(boolean), default true
- When concatenated streams disabled, stop after reading a stream trailer
- When concatenated streams enabled, throw ZipException if there is any data after a trailer but it cannot be successfully interpreted as a next header
From a backward-compatibility point of view, those changes would preserve the current behavior, except that trailing garbage would now generate a ZipException instead of being discarded. For full backward compatibility, there could be another knob, setIgnoreTrailingGarbage(boolean).
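To make the suggested semantics concrete, here is a rough sketch of them over an in-memory byte array, built on java.util.zip.Inflater rather than on GZIPInputStream. It is not the proposed implementation: StrictGzipSketch, decode and the allowConcatenated parameter are hypothetical names, and the sketch skips CRC32/ISIZE verification of the trailer to stay short.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.GZIPOutputStream;
import java.util.zip.Inflater;
import java.util.zip.ZipException;

// Hypothetical sketch of the suggested semantics; not part of the JDK.
final class StrictGzipSketch {

    // Decodes GZIP data starting at off and returns the offset of the first unread byte.
    // allowConcatenated == false: stop right after the first trailer.
    // allowConcatenated == true: any data after a trailer must be another valid member,
    // otherwise a ZipException is thrown.
    static int decode(byte[] data, int off, boolean allowConcatenated,
                      ByteArrayOutputStream out) throws ZipException {
        int pos = off;
        do {
            pos = skipHeader(data, pos);
            pos = inflateMember(data, pos, out);
        } while (allowConcatenated && pos < data.length);
        return pos;
    }

    // Validates a GZIP header (RFC 1952) at p and returns the offset just past it.
    private static int skipHeader(byte[] d, int p) throws ZipException {
        if (d.length - p < 10 || (d[p] & 0xff) != 0x1f || (d[p + 1] & 0xff) != 0x8b
                || d[p + 2] != 8) {
            throw new ZipException("Not a GZIP header at offset " + p);
        }
        int flags = d[p + 3] & 0xff;
        p += 10; // fixed part: magic, CM, FLG, MTIME, XFL, OS
        try {
            if ((flags & 4) != 0) p += 2 + ((d[p] & 0xff) | (d[p + 1] & 0xff) << 8); // FEXTRA
            if ((flags & 8) != 0) { while (d[p++] != 0) { } }  // FNAME, zero-terminated
            if ((flags & 16) != 0) { while (d[p++] != 0) { } } // FCOMMENT, zero-terminated
            if ((flags & 2) != 0) p += 2;                      // FHCRC
        } catch (ArrayIndexOutOfBoundsException e) {
            throw new ZipException("Truncated GZIP header");
        }
        if (p > d.length) throw new ZipException("Truncated GZIP header");
        return p;
    }

    // Inflates one raw DEFLATE body at p and returns the offset just past the 8-byte trailer.
    private static int inflateMember(byte[] d, int p, ByteArrayOutputStream out)
            throws ZipException {
        Inflater inf = new Inflater(true); // true = raw DEFLATE, no zlib wrapper
        try {
            inf.setInput(d, p, d.length - p);
            byte[] buf = new byte[4096];
            while (!inf.finished()) {
                int n = inf.inflate(buf);
                if (n == 0 && !inf.finished()) {
                    throw new ZipException("Truncated or corrupt DEFLATE data");
                }
                out.write(buf, 0, n);
            }
            int end = p + (int) inf.getBytesRead() + 8; // trailer: CRC32 + ISIZE (not verified here)
            if (end > d.length) throw new ZipException("Truncated GZIP trailer");
            return end;
        } catch (DataFormatException e) {
            throw new ZipException("Corrupt DEFLATE data: " + e.getMessage());
        } finally {
            inf.end();
        }
    }

    // Test helper: compresses text into a single GZIP stream.
    static byte[] gzip(String text) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }
}
```

With allowConcatenated set to false, the returned offset lets a caller pick up the next stream itself, which is the one-at-a-time use case from item 3. With it set to true, leftover bytes that are not a valid header fail loudly instead of being dropped. A setIgnoreTrailingGarbage-style knob would amount to catching the header failure after the first member and returning the current offset.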
- csr for
  - JDK-8330195 Define and document GZIPInputStream concatenated stream semantics (Draft)
- relates to
  - JDK-4691425 GZIPInputStream fails to read concatenated .gz files (Closed)
  - JDK-7036144 GZIPInputStream readTrailer uses faulty available() test for end-of-stream (Closed)
- links to
  - Review(master) openjdk/jdk/18385
  - Review(master) openjdk/jdk/20787