JDK-7036144

GZIPInputStream readTrailer uses faulty available() test for end-of-stream

    • b15
    • Verified

        FULL PRODUCT VERSION :
        java version "1.6.0_24"

        (also believed to affect latest OpenJDK7 previews)

        ADDITIONAL OS VERSION INFORMATION :
        all OS (pure Java)

        A DESCRIPTION OF THE PROBLEM :
        GZIPInputStream's readTrailer() method decides whether to keep reading (for the case of concatenated GZIP members) based on whether the underlying stream's available() > 0. This is a faulty test for end-of-stream; socket streams (and perhaps others) may return 0 merely to mean any read would block, not that any read would fail due to the stream having ended.
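The distinction can be shown without any GZIP code at all. The following is a minimal sketch, not taken from the JDK: the class name AvailableVersusEof and the helper zeroAvailable() are hypothetical, and the wrapper merely stands in for a socket stream whose data has not arrived yet. It reports available() == 0 while every byte is still readable; only read() returning -1 signals the true end of the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class AvailableVersusEof {

    /** Hypothetical stand-in for a socket stream: always claims 0 bytes available. */
    static InputStream zeroAvailable(byte[] data) {
        return new FilterInputStream(new ByteArrayInputStream(data)) {
            @Override
            public int available() {
                return 0; // "a read might block", not "the stream has ended"
            }
        };
    }

    /** Counts bytes by reading until read() returns -1, the reliable end-of-stream signal. */
    static int countUntilEof(InputStream in) throws IOException {
        int n = 0;
        while (in.read() != -1) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = zeroAvailable(new byte[] {1, 2, 3});
        System.out.println("available() reports: " + in.available());
        System.out.println("bytes actually read: " + countUntilEof(in));
    }
}
```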

        As a result, un-GZIPping multi-member streams over a network stream (and perhaps in other contexts) can intermittently trigger the same false end-of-stream that, in JDKs through 6u22, occurred after reading exactly one member. (How many members are read before this triggers depends on how member ends happen to line up with inflated buffers and network delays, so in the wild it shows up somewhat randomly, but very reliably when reading many-membered streams over network connections.)

        STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
        Create a GZIP stream concatenated from independent GZIP members. (Shorter members, and therefore more readTrailer() invocations, trigger it faster.) Read it over a network connection (the slower the better, but even 100Mbps-plus connections will exhibit it eventually). Observe that the GZIPInputStream sometimes ends early, and then considers itself done no matter how much more data becomes available from the underlying stream. (You can't recover by retrying.)

        Or, try the simulated code below for a local demonstration.

        EXPECTED VERSUS ACTUAL BEHAVIOR :
        EXPECTED -
        Even if the underlying stream sometimes reports available()==0, reading should continue if there's more valid data forthcoming.
        ACTUAL -
        Depending on member alignment and network/IO issues, GZIPInputStream may erroneously end early and insistently.

        REPRODUCIBILITY :
        This bug can be reproduced always.

        ---------- BEGIN SOURCE ----------
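        import java.io.ByteArrayInputStream;
        import java.io.ByteArrayOutputStream;
        import java.io.FilterInputStream;
        import java.io.IOException;
        import java.util.zip.GZIPInputStream;
        import java.util.zip.GZIPOutputStream;

        public class GZIPAvailableTest {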
         
            public static void main(String [] args) throws IOException {
                ByteArrayOutputStream baos = new ByteArrayOutputStream();
                GZIPOutputStream out = new GZIPOutputStream(baos);
                out.write("boo".getBytes("ASCII"));
                out.close();
                byte[] boo_gz = baos.toByteArray();
                baos.reset();
                for(int i = 0; i<32; i++) {
                    baos.write(boo_gz);
                }
                byte[] manyboo_gz = baos.toByteArray();

                GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(manyboo_gz));
                long count = 0;
                while(in.read()>-1) {
                    count++;
                }
                System.out.println("read bytes with omniscient available():"+count);
                
                // now simulate a stream that might have 0 available even while more
                // data is on the way, as with a socket stream
                GZIPInputStream in2 = new GZIPInputStream(new FilterInputStream(new ByteArrayInputStream(manyboo_gz)) {
                    @Override
                    public int available() throws IOException {
                        return 0;
                    }
                    
                });
                long count2 = 0;
                while(in2.read()>-1) {
                    count2++;
                }
                System.out.println("read bytes with zero available():"+count2);
            }
        }
        ---------- END SOURCE ----------

        CUSTOMER SUBMITTED WORKAROUND :
        Copy the GZIPInputStream (and related Inflater) source code to new classes. Patch readTrailer() to try reading the next header, rather than guessing whether it may be present by consulting available(). (This approach will fail when the stream has no valid following gzip header, the same as if available() were an accurate indicator of more data.) Or, instead of patching readTrailer(), change it to protected, and fix it in a subclass override. (Please, please, please, make all of this class's private methods protected so we can work around bugs without wholesale copy/pasting, and do things like read both official GZIP header fields and 'extra fields' without reimplementing the whole class.)
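A lighter alternative to copying the class may be to wrap the underlying stream instead. This is a different technique from the patch described above, sketched under the assumption that readTrailer() consults only available() > 0 before attempting the next header, and that a failed header read at the real end of the stream is treated as end-of-stream. The class name NonZeroAvailableWorkaround and its helper methods are hypothetical. The wrapper never reports 0, so the decision is effectively made by the header read rather than by available().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class NonZeroAvailableWorkaround {

    /** Hypothetical wrapper: never reports 0 available, so the next header is always attempted. */
    static InputStream neverZeroAvailable(InputStream in) {
        return new FilterInputStream(in) {
            @Override
            public int available() throws IOException {
                return Math.max(1, super.available());
            }
        };
    }

    /** Builds a byte array of `members` concatenated GZIP members, each holding `payload`. */
    static byte[] concatenatedGzip(byte[] payload, int members) throws IOException {
        ByteArrayOutputStream one = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(one)) {
            out.write(payload);
        }
        byte[] member = one.toByteArray();
        ByteArrayOutputStream all = new ByteArrayOutputStream();
        for (int i = 0; i < members; i++) {
            all.write(member);
        }
        return all.toByteArray();
    }

    /** Inflates everything GZIPInputStream is willing to deliver and counts the bytes. */
    static long countInflated(InputStream raw) throws IOException {
        long count = 0;
        try (GZIPInputStream in = new GZIPInputStream(raw)) {
            while (in.read() > -1) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = concatenatedGzip("boo".getBytes("ASCII"), 32);
        // Simulated socket-like stream that always reports 0 available.
        InputStream socketLike = new FilterInputStream(new ByteArrayInputStream(data)) {
            @Override
            public int available() {
                return 0;
            }
        };
        System.out.println("inflated bytes: " + countInflated(neverZeroAvailable(socketLike)));
    }
}
```

One consequence of this design: on a real socket, the header read blocks until the next member arrives or the peer closes the connection, which is the behavior a reader of a many-membered stream presumably wants.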

              acobbs Archie Cobbs
              webbuggrp Webbug Group
