Type: Bug
Resolution: Fixed
Priority: P4
Affects Version/s: 6u24, 8, 11, 17, 21
Resolved In Build: b15
Verification: Verified
Issue | Fix Version | Assignee | Priority | Status | Resolution | Resolved In Build |
---|---|---|---|---|---|---|
JDK-8337393 | 21-pool | William Kemper | P4 | Open | Unresolved | |
JDK-8338029 | 21.0.7-oracle | Ravi Reddy | P4 | Open | Unresolved | |
FULL PRODUCT VERSION :
java version "1.6.0_24"
(also believed to affect the latest OpenJDK 7 previews)
ADDITIONAL OS VERSION INFORMATION :
All operating systems (pure Java)
A DESCRIPTION OF THE PROBLEM :
GZIPInputStream's readTrailer() method decides whether to keep reading (to handle concatenated GZIP members) based on whether the underlying stream's available() returns a value greater than 0. This is a faulty test for end-of-stream: socket streams (and perhaps others) may return 0 merely to indicate that a read would block, not that a read would fail because the stream has ended.
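To illustrate the distinction, here is a minimal sketch (not JDK code; the class and method names are hypothetical). The wrapper stands in for a socket stream whose available() reports 0 while data is still forthcoming. `moreDataFaulty` mirrors an `available() > 0` check like the one described above and wrongly concludes that the stream has ended, whereas `moreDataByReading` actually attempts a read (which may block) and pushes the byte back with `PushbackInputStream.unread`.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

class AvailableVsEof {
    // Stands in for a socket stream: data is still forthcoming, but available()
    // reports 0 because (hypothetically) nothing has been buffered yet.
    static InputStream zeroAvailable(byte[] data) {
        return new FilterInputStream(new ByteArrayInputStream(data)) {
            @Override
            public int available() {
                return 0;
            }
        };
    }

    // Faulty heuristic: equates "nothing can be read without blocking"
    // with "the stream has ended".
    static boolean moreDataFaulty(InputStream in) throws IOException {
        return in.available() > 0;
    }

    // Reliable check: attempt a read (this may block). Only -1 means
    // end-of-stream; a byte that was read is pushed back for the next reader.
    static boolean moreDataByReading(PushbackInputStream in) throws IOException {
        int b = in.read();
        if (b == -1) {
            return false;
        }
        in.unread(b);
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 3};
        System.out.println(moreDataFaulty(zeroAvailable(data)));
        System.out.println(moreDataByReading(new PushbackInputStream(zeroAvailable(data))));
    }
}
```

Running `main` prints `false` and then `true`: only the read-based probe notices that three bytes are still forthcoming.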
Because of this faulty test, un-GZIPping multi-member streams read over a network (and perhaps in other contexts) can intermittently trigger the same false end-of-stream that afflicted JDKs through 6u22 after reading exactly one member. (How many members are read before this happens depends on how member ends line up with the inflater's input buffers and with network delays, so in the wild it appears somewhat random, but it occurs very reliably when reading streams with many members over network connections.)
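The timing dependence can be simulated without a network. In the hypothetical `ChunkedInputStream` below, data "arrives" in fixed-size packets and available() reports only the bytes left in packets already received, so it drops to 0 at every packet boundary even though more data follows. This is an assumed, simplified model of socket behavior, not a description of how any particular JDK socket implementation buffers data.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Objects;

// Hypothetical model of a network stream: bytes arrive in fixed-size packets,
// and available() counts only bytes from packets that have already arrived.
class ChunkedInputStream extends InputStream {
    private final byte[] data;
    private final int packetSize;
    private int pos = 0;     // index of the next byte to hand out
    private int arrived = 0; // number of bytes that have "arrived" so far

    ChunkedInputStream(byte[] data, int packetSize) {
        this.data = data;
        this.packetSize = packetSize;
    }

    // Stands in for blocking on the socket: once the current packet is used up,
    // the next one "arrives" (if there is one).
    private void awaitPacketIfNeeded() {
        if (pos == arrived && arrived < data.length) {
            arrived = Math.min(arrived + packetSize, data.length);
        }
    }

    @Override
    public int read() {
        awaitPacketIfNeeded();
        return pos < arrived ? (data[pos++] & 0xff) : -1;
    }

    @Override
    public int read(byte[] b, int off, int len) {
        Objects.checkFromIndexSize(off, len, b.length);
        if (len == 0) {
            return 0;
        }
        awaitPacketIfNeeded();
        if (pos >= arrived) {
            return -1; // every packet has been consumed: real end-of-stream
        }
        int n = Math.min(len, arrived - pos); // never cross a packet boundary
        System.arraycopy(data, pos, b, off, n);
        pos += n;
        return n;
    }

    @Override
    public int available() {
        return arrived - pos; // 0 at every packet boundary, not only at the end
    }

    // Counts reads that succeeded even though available() had just reported 0.
    static int countMisleadingZeros(InputStream in) throws IOException {
        int count = 0;
        while (true) {
            boolean reportedZero = in.available() == 0;
            if (in.read() == -1) {
                return count;
            }
            if (reportedZero) {
                count++;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countMisleadingZeros(new ChunkedInputStream(new byte[10], 4)));
    }
}
```

With 10 bytes delivered in 4-byte packets, `countMisleadingZeros` returns 3 (before bytes 0, 4 and 8). A GZIP member whose trailer ends exactly at such a boundary is the situation in which an available()-based end-of-stream test can misfire.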
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Create a GZIP stream by concatenating independent GZIP members. (Shorter members, and therefore more readTrailer() invocations, trigger the problem sooner.) Read it over a network connection (the slower the better, although even connections of 100 Mbps or faster will eventually exhibit it). Observe that the GZIPInputStream sometimes ends early and then considers itself finished regardless of whether more data is available from the underlying stream. (Retrying does not recover.)
Or, try the simulated code below for a local demonstration.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Even if the underlying stream sometimes reports available() == 0, reading should continue as long as more valid data is forthcoming.
ACTUAL -
Depending on member alignment and network/IO timing, GZIPInputStream may erroneously end early, and it then persistently reports end-of-stream.
REPRODUCIBILITY :
This bug can always be reproduced.
---------- BEGIN SOURCE ----------
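import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GZIPAvailableTest {
    public static void main(String[] args) throws IOException {
        // Build a single small GZIP member containing "boo".
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        GZIPOutputStream out = new GZIPOutputStream(baos);
        out.write("boo".getBytes("ASCII"));
        out.close();
        byte[] boo_gz = baos.toByteArray();

        // Concatenate 32 copies of that member into one multi-member stream.
        // Each member decompresses to 3 bytes, so 32 * 3 = 96 bytes are expected.
        baos.reset();
        for (int i = 0; i < 32; i++) {
            baos.write(boo_gz);
        }
        byte[] manyboo_gz = baos.toByteArray();

        // ByteArrayInputStream reports available() accurately ("omniscient").
        GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(manyboo_gz));
        long count = 0;
        while (in.read() > -1) {
            count++;
        }
        in.close();
        System.out.println("read bytes with omniscient available():" + count);

        // Now simulate a stream that may report 0 available bytes even while
        // more data is on the way, as a socket stream can.
        GZIPInputStream in2 = new GZIPInputStream(
                new FilterInputStream(new ByteArrayInputStream(manyboo_gz)) {
                    @Override
                    public int available() throws IOException {
                        return 0;
                    }
                });
        long count2 = 0;
        while (in2.read() > -1) {
            count2++;
        }
        in2.close();
        System.out.println("read bytes with zero available():" + count2);
    }
}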
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
Copy the GZIPInputStream (and related Inflater) source code into new classes. Patch readTrailer() to try reading the next header, rather than guessing whether one may be present by consulting available(). (This approach fails when the stream has no valid following GZIP header, just as it would if available() were an accurate indicator of more data.) Alternatively, instead of patching readTrailer(), make it protected and fix it in a subclass override. (Please, please, please make all of this class's private methods protected, so that we can work around bugs without wholesale copying and pasting, and can do things like read both the official GZIP header fields and the 'extra fields' without reimplementing the whole class.)
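The read-based decision suggested in the workaround can be sketched as follows. This is a simplified illustration, not the patched JDK method: it probes a raw stream rather than the inflater's leftover buffer plus the stream, and the class and method names are hypothetical. It reads the two ID bytes that begin every GZIP member (0x1f, 0x8b per RFC 1952), pushes them back, and reports whether another member follows. End-of-stream yields false, and so does any other byte sequence, matching the caveat above that the approach stops when no valid header follows.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;

class NextMemberProbe {
    // First two bytes of every gzip member (RFC 1952: ID1 = 0x1f, ID2 = 0x8b).
    private static final int ID1 = 0x1f;
    private static final int ID2 = 0x8b;

    // Decides whether another gzip member follows by reading, not by asking
    // available(). Blocks until two bytes arrive or the stream ends. The bytes
    // read are pushed back, so the stream needs a pushback buffer of at least 2.
    static boolean nextMemberFollows(PushbackInputStream in) throws IOException {
        int b1 = in.read();
        if (b1 == -1) {
            return false; // genuine end-of-stream
        }
        int b2 = in.read();
        if (b2 != -1) {
            in.unread(b2); // unread in reverse order so b1 is read first again
        }
        in.unread(b1);
        return b1 == ID1 && b2 == ID2; // anything else is not a gzip header
    }

    public static void main(String[] args) throws IOException {
        byte[] header = {(byte) 0x1f, (byte) 0x8b, 8, 0};
        PushbackInputStream a = new PushbackInputStream(new ByteArrayInputStream(header), 2);
        System.out.println(nextMemberFollows(a)); // true
        System.out.println(a.read());             // 31 (0x1f): nothing was consumed
        PushbackInputStream b = new PushbackInputStream(new ByteArrayInputStream(new byte[0]), 2);
        System.out.println(nextMemberFollows(b)); // false
    }
}
```

Unlike an available() check, this probe can block on a slow connection until the next bytes arrive; that wait is what makes its answer reliable.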
- backported by
  - JDK-8337393 GZIPInputStream readTrailer uses faulty available() test for end-of-stream (Open)
  - JDK-8338029 GZIPInputStream readTrailer uses faulty available() test for end-of-stream (Open)
- csr for
  - JDK-8327489 Make GZIPInputStream no longer rely on available() for end-of-stream test (Closed)
- duplicates
  - JDK-8081450 GZIPInputStream prematurely infers end-of-stream (Closed)
- relates to
  - JDK-7021870 GzipInputStream closes underlying stream during reading (Closed)
  - JDK-8322256 Define and document GZIPInputStream concatenated stream semantics (In Progress)
- links to
  - Commit openjdk/jdk/d3f3011d
  - Review openjdk/jdk/17113
  - Review(master) openjdk/jdk21u-dev/871