Type: Bug
Resolution: Not an Issue
Priority: P3
Fix Version: None
Affects Version: 7
CPU: x86
OS: windows_7
FULL PRODUCT VERSION :
java version "1.7.0"
Java(TM) SE Runtime Environment (build 1.7.0-b147)
Java HotSpot(TM) 64-Bit Server VM (build 21.0-b17, mixed mode)
ADDITIONAL OS VERSION INFORMATION :
I was forced to choose an OS but the bug is platform independent.
A DESCRIPTION OF THE PROBLEM :
The "spec" for ZIP files http://www.pkware.com/documents/casestudies/APPNOTE.TXT says in the section about data descriptors:
When compressing files, compressed and uncompressed sizes
should be stored in ZIP64 format (as 8 byte values) when a
files size exceeds 0xFFFFFFFF. However ZIP64 format may be
used regardless of the size of a file. When extracting, if
the zip64 extended information extra field is present for
the file the compressed and uncompressed sizes will be 8
byte values.
This means the sizes are eight bytes if there is a ZIP64 extended information extra field and four bytes if there is none. This is not what java.util.zip implements: ZipOutputStream#writeEXT in http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/9b8c96f96a0f/src/share/classes/java/util/zip/ZipOutputStream.java writes eight-byte sizes if one of the sizes exceeds 0xFFFFFFFF, but it never writes any ZIP64 extended information extra field. This means conforming implementations will "think" the sizes are four bytes while in fact they are eight bytes.
Likewise ZipInputStream#readEnd always assumes the sizes are eight bytes if the Inflater has seen more than 0xFFFFFFFF bytes and four bytes otherwise. This leads to reading too few bytes if the ZIP64 extended information extra field is present but the sizes are smaller than 2^32.
I stumbled over this while implementing ZIP64 support for Apache Commons Compress, using Java 7's jar as one of my interop partners.
I realize there is a difficult choice to be made when writing to a stream - which as of this report hasn't been implemented for entries of unknown size in Apache Commons Compress - as you either have to always add the ZIP64 field or never add it if you don't know in advance how much you are going to write. At least for entries of known size - like the files the jar tool adds to the archive - it should be possible to not use the data descriptor at all, though.
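The size rule quoted from the spec can be sketched as a small spec-conforming reader. This is only an illustration: the class and method names are made up, not java.util.zip APIs. The point is that the width of the size fields is decided solely by whether the entry carries a ZIP64 extended information extra field, never by the magnitude of the values seen.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class DataDescriptorSketch {
    static final int DD_SIG = 0x08074b50; // optional data descriptor signature

    // Build a data descriptor with either 4- or 8-byte sizes.
    static byte[] build(long crc, long csize, long usize, boolean zip64) {
        ByteBuffer buf = ByteBuffer.allocate(zip64 ? 24 : 16).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(DD_SIG).putInt((int) crc);
        if (zip64) {
            buf.putLong(csize).putLong(usize);
        } else {
            buf.putInt((int) csize).putInt((int) usize);
        }
        return buf.array();
    }

    // Spec-conforming read: field width depends ONLY on whether the entry
    // has a ZIP64 extended information extra field in its local header.
    static long[] parse(byte[] data, boolean hasZip64Extra) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        long first = buf.getInt() & 0xFFFFFFFFL;
        long crc = (first == (DD_SIG & 0xFFFFFFFFL)) ? buf.getInt() & 0xFFFFFFFFL : first;
        long csize, usize;
        if (hasZip64Extra) {            // 8-byte sizes
            csize = buf.getLong();
            usize = buf.getLong();
        } else {                        // 4-byte sizes
            csize = buf.getInt() & 0xFFFFFFFFL;
            usize = buf.getInt() & 0xFFFFFFFFL;
        }
        return new long[] { crc, csize, usize };
    }
}
```

The signature check glosses over the corner case where a CRC value happens to equal the signature; a real reader needs more context for that, but it doesn't affect the width question.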
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Pick a file bigger than 4GB and create a jar file from it; there won't be any ZIP64 extended information extra field, but the data descriptor uses eight-byte sizes. One example is the file 5GB_of_Zeros_jar.zip attached to https://issues.apache.org/jira/browse/COMPRESS-36
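The mismatch can also be demonstrated without a 5GB file by simulating both sides on synthetic bytes (the class and method names below are made up for illustration): write a descriptor the way writeEXT does for a large entry (eight-byte sizes, no ZIP64 extra field) and parse it the way a conforming reader would in the absence of that field (four-byte sizes).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class DescriptorMismatch {
    // What ZipOutputStream#writeEXT emits for an entry whose sizes exceed
    // 0xFFFFFFFF: signature, CRC, then 8-byte sizes - while the entry carries
    // no ZIP64 extended information extra field announcing that width.
    static byte[] writeLikeJdk7(long crc, long csize, long usize) {
        ByteBuffer buf = ByteBuffer.allocate(24).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(0x08074b50).putInt((int) crc).putLong(csize).putLong(usize);
        return buf.array();
    }

    // What a conforming reader does when it saw no ZIP64 extra field:
    // it reads 4-byte sizes and therefore misparses the descriptor above.
    static long[] readConforming(byte[] d) {
        ByteBuffer buf = ByteBuffer.wrap(d).order(ByteOrder.LITTLE_ENDIAN);
        buf.getInt();                             // descriptor signature
        long crc   = buf.getInt() & 0xFFFFFFFFL;
        long csize = buf.getInt() & 0xFFFFFFFFL;  // actually the low half of the real size
        long usize = buf.getInt() & 0xFFFFFFFFL;  // actually the high half of csize
        return new long[] { crc, csize, usize };
    }
}
```

For a 5368709120-byte (5GB) entry the conforming reader comes back with csize = 1073741824 and usize = 1, and is left eight bytes short of the next record.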
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
There must be a ZIP64 extended information extra field whenever you write a data descriptor with eight-byte sizes.
Always assume the sizes are eight bytes when reading and there is a ZIP64 extended information extra field; never assume they are eight bytes if there is none.
ACTUAL -
The sizes only depend on the number of bytes the Deflater/Inflater has processed or written.
REPRODUCIBILITY :
This bug can be reproduced always.
CUSTOMER SUBMITTED WORKAROUND :
Tools processing the archive may be able to work around the problem if they don't need the size information at all. I'm also thinking of adding some heuristics along the lines of "if the Inflater has seen more than 0xFFFFFFFF bytes, then the sizes are probably eight bytes even if there is no ZIP64 extra field - let's look whether there is a usable signature after eight/sixteen bytes", but this is clumsy.
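The signature-probing heuristic just described could look roughly like this (hypothetical names, a sketch rather than a robust implementation): after the CRC, probe for a plausible next-record signature at the offset implied by four-byte sizes, then at the offset implied by eight-byte sizes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class SizeWidthHeuristic {
    static final int DD_SIG  = 0x08074b50; // data descriptor
    static final int LFH_SIG = 0x04034b50; // local file header
    static final int CEN_SIG = 0x02014b50; // central directory header

    // Guess the width of the size fields of the data descriptor starting at
    // offset 0 of data, by probing where the next record's signature shows
    // up. Returns true if the sizes look like 8-byte values.
    static boolean sizesAreEightBytes(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        int afterCrc = 4 + 4; // descriptor signature + CRC
        // 4-byte sizes would put the next record 8 bytes after the CRC...
        if (afterCrc + 12 <= data.length && isRecordSig(buf.getInt(afterCrc + 8))) {
            return false;
        }
        // ...8-byte sizes would put it 16 bytes after the CRC.
        if (afterCrc + 20 <= data.length && isRecordSig(buf.getInt(afterCrc + 16))) {
            return true;
        }
        return false; // undecidable from these bytes alone; needs more context
    }

    static boolean isRecordSig(int sig) {
        return sig == LFH_SIG || sig == CEN_SIG || sig == DD_SIG;
    }
}
```

It misfires whenever a four-byte size value happens to look like a record signature, which is exactly why the report calls the approach clumsy.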
I don't see any workaround when ZipInputStream tries to read a perfectly valid ZIP file that sets a ZIP64 extra field with a size smaller than 0xFFFFFFFF - when reading the data descriptor it will read eight bytes too few and not be positioned at the next LFH or the central directory.
Relates to:
JDK-8303866 Allow ZipInputStream.readEnd to parse small Zip64 ZIP files (Resolved)