-
CSR
-
Resolution: Approved
-
P4
-
None
-
behavioral
-
low
-
-
Java API
-
Implementation
Summary
Allow java.util.zip.ZipInputStream to parse entries using the Zip64 format where neither the compressed nor uncompressed file size exceeds the 4GB limit.
Problem
The compressed and uncompressed size of a ZIP entry are often not known until all entry data has been written by the client.
If the producer cannot seek back in the ZIP stream to update the size fields in the LOC header, those fields are left as zero and the actual compressed and uncompressed file sizes are instead put in a 'Data Descriptor' record immediately following the file data.
If the entry uses the Zip64 format, then the 'compressed size' and 'uncompressed size' fields are instead set to the magic marker value 0xFFFFFFFF and a Zip64 extra field is added with the 'Original Size' and 'Compressed Size' both set to zero.
The 'Data Descriptor' record normally encodes its size fields as 4-byte numbers. However, 8-byte numbers should be used instead when either the compressed or the uncompressed size exceeds 4GB, or when the entry uses the Zip64 format:
4.3.9.2 When compressing files, compressed and uncompressed sizes
SHOULD be stored in ZIP64 format (as 8 byte values) when a
file's size exceeds 0xFFFFFFFF. However ZIP64 format MAY be
used regardless of the size of a file. When extracting, if
the zip64 extended information extra field is present for
the file the compressed and uncompressed sizes will be 8
byte values.
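The two Data Descriptor layouts can be illustrated with a small parser. This is a minimal sketch, not the ZipInputStream code: the class name DataDescriptorSketch and the method parse are hypothetical, and the caller is assumed to already know whether the entry uses Zip64. The signature value 0x08074b50 is the optional Data Descriptor signature from the ZIP format specification.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only; not part of java.util.zip.
final class DataDescriptorSketch {

    /** Optional Data Descriptor signature, stored little-endian. */
    static final int EXTSIG = 0x08074b50;

    /**
     * Parses a Data Descriptor and returns {crc, compressedSize, uncompressedSize}.
     * When zip64 is true the two size fields are read as 8-byte values,
     * otherwise as unsigned 4-byte values.
     */
    static long[] parse(byte[] bytes, boolean zip64) {
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);

        // The signature is optional. Note the inherent ambiguity: a descriptor
        // without a signature whose CRC happens to equal EXTSIG is misread here.
        if (buf.remaining() >= 4 && buf.getInt(0) == EXTSIG) {
            buf.position(4);
        }

        long crc = buf.getInt() & 0xFFFFFFFFL;
        long compressedSize = zip64 ? buf.getLong() : (buf.getInt() & 0xFFFFFFFFL);
        long uncompressedSize = zip64 ? buf.getLong() : (buf.getInt() & 0xFFFFFFFFL);
        return new long[] { crc, compressedSize, uncompressedSize };
    }
}
```

Reading a 24-byte Zip64 descriptor with zip64 set to false would interpret the low half of the compressed size as the compressed size and its high half as the uncompressed size, which is the kind of misparse described in the following paragraphs.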
ZipInputStream currently relies solely on the size information acquired from the Inflater when deciding how to parse the Data Descriptor record. The LOC is not consulted to see if the entry uses the Zip64 format.
If an entry does use the Zip64 format, but neither the compressed nor the uncompressed size exceeds 4GB, then ZipInputStream currently fails to parse the Data Descriptor correctly and a ZipException is thrown instead:
java.util.zip.ZipException: invalid entry size (expected 0 but got 6 bytes)
at java.base/java.util.zip.ZipInputStream.readEnd(ZipInputStream.java:616)
While ZipOutputStream does not use the Zip64 format when writing entries of an unknown size, other tools do produce such files, including Info-ZIP used in streaming mode:
echo hello | zip -fd > hello.zip
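Where Info-ZIP is not at hand, a comparable stream can be assembled by hand. The following is a minimal sketch under stated assumptions: the class SmallZip64StreamSketch and its methods are hypothetical, the stream contains only a LOC header, the deflated data and an 8-byte Data Descriptor (no central directory, which ZipInputStream does not read), and the entry name hello.txt is arbitrary. The header layout follows the description above: both LOC size fields are 0xFFFFFFFF and the Zip64 extra field carries two zero sizes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipException;
import java.util.zip.ZipInputStream;

// Hypothetical helper for illustration only.
final class SmallZip64StreamSketch {

    /** Builds LOC header + deflated data + 8-byte Data Descriptor for one entry. */
    static byte[] buildStream(String name, byte[] content) {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);

        // Raw deflate (nowrap = true), as used for ZIP entry data.
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        deflater.setInput(content);
        deflater.finish();
        ByteArrayOutputStream deflated = new ByteArrayOutputStream();
        byte[] chunk = new byte[256];
        while (!deflater.finished()) {
            deflated.write(chunk, 0, deflater.deflate(chunk));
        }
        deflater.end();
        byte[] data = deflated.toByteArray();

        CRC32 crc = new CRC32();
        crc.update(content);

        ByteBuffer buf = ByteBuffer
                .allocate(30 + nameBytes.length + 20 + data.length + 24)
                .order(ByteOrder.LITTLE_ENDIAN);

        // Local file header (LOC)
        buf.putInt(0x04034b50);                  // LOC signature
        buf.putShort((short) 45);                // version needed to extract (4.5)
        buf.putShort((short) 0x0008);            // flag bit 3: Data Descriptor follows
        buf.putShort((short) 8);                 // compression method: deflate
        buf.putShort((short) 0);                 // DOS time
        buf.putShort((short) 0x21);              // DOS date: 1980-01-01
        buf.putInt(0);                           // CRC-32, unknown while streaming
        buf.putInt(0xFFFFFFFF);                  // compressed size: Zip64 marker
        buf.putInt(0xFFFFFFFF);                  // uncompressed size: Zip64 marker
        buf.putShort((short) nameBytes.length);  // file name length
        buf.putShort((short) 20);                // extra field length
        buf.put(nameBytes);

        // Zip64 extended information extra field with both sizes set to zero
        buf.putShort((short) 0x0001);            // header ID
        buf.putShort((short) 16);                // data size
        buf.putLong(0L);                         // original size
        buf.putLong(0L);                         // compressed size

        buf.put(data);

        // Data Descriptor with 8-byte size fields
        buf.putInt(0x08074b50);
        buf.putInt((int) crc.getValue());
        buf.putLong(data.length);
        buf.putLong(content.length);
        return buf.array();
    }

    /** Returns the first entry's content, or the ZipException message if parsing fails. */
    static String readFirstEntry(byte[] zipStream) throws IOException {
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipStream))) {
            ZipEntry entry = in.getNextEntry();
            if (entry == null) {
                return "no entry";
            }
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (ZipException e) {
            return "ZipException: " + e.getMessage();
        }
    }
}
```

Calling readFirstEntry(buildStream("hello.txt", "hello\n".getBytes(StandardCharsets.UTF_8))) should, on a JDK without the change proposed here, report an invalid entry size error like the one quoted above; on a JDK that includes the change, it is expected to return the entry content instead.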
It would be useful to update ZipInputStream to allow parsing such valid ZIP files. Supporting these files could benefit OpenJDK testing as well, which currently relies on producing very large files to test Zip64.
Solution
The solution is to update ZipInputStream so that it not only consults the number of compressed and uncompressed bytes read by the Inflater, but also inspects the LOC header to determine whether the entry uses the Zip64 format. When an entry uses Zip64, ZipInputStream.readEnd should parse the Data Descriptor using 8-byte numbers instead of the regular 4-byte numbers.
ZipInputStream.readLOC is a good decision point for determining whether to expect 4- or 8-byte numbers. This method has full access to the LOC header fields, including the extra field where any Zip64 field is located.
ZipInputStream is updated as follows:
- A new boolean internal flag ZipInputStream.expect64BitDataDescriptor is added. The purpose of this field is to communicate the number format determined by readLOC to the readEnd method, which is responsible for the actual parsing of the Data Descriptor record.
- readLOC is updated to inspect the LOC and set expect64BitDataDescriptor to true if the LOC uses the Zip64 format; that is, if the compressed and uncompressed size fields are both 0xFFFFFFFF and the extra field contains a valid Zip64 extra field. To reduce changes in readLOC, this logic is mostly implemented in the new support methods expect64BitDataDescriptor and isZip64DataDescriptorField.
- readEnd is updated to read 8-byte fields when the expect64BitDataDescriptor flag is true.
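The LOC check described in the list above can be sketched as follows. This is an illustration only and not the actual ZipInputStream implementation: the class Zip64LocSketch and the methods usesZip64 and hasZip64ExtraField are hypothetical names, and "valid" is assumed here to mean simply that an extra block with header ID 0x0001 is present and fits within the extra data.

```java
// Hypothetical helper for illustration only; not the JDK implementation.
final class Zip64LocSketch {

    /** Marker stored in the 4-byte LOC size fields of a Zip64 entry. */
    static final long ZIP64_MAGIC = 0xFFFFFFFFL;

    /** Header ID of the Zip64 extended information extra field. */
    static final int ZIP64_EXTRA_ID = 0x0001;

    /** Reads an unsigned little-endian 16-bit value. */
    private static int get16(byte[] b, int off) {
        return (b[off] & 0xFF) | ((b[off + 1] & 0xFF) << 8);
    }

    /** Walks the (header ID, data size, data) blocks looking for a Zip64 block. */
    static boolean hasZip64ExtraField(byte[] extra) {
        int off = 0;
        while (off + 4 <= extra.length) {
            int headerId = get16(extra, off);
            int dataSize = get16(extra, off + 2);
            if (off + 4 + dataSize > extra.length) {
                return false; // block overruns the extra data: treat as malformed
            }
            if (headerId == ZIP64_EXTRA_ID) {
                return true;
            }
            off += 4 + dataSize;
        }
        return false;
    }

    /**
     * Returns true if both LOC size fields carry the Zip64 marker and the
     * extra data contains a Zip64 extra field, so that 8-byte Data Descriptor
     * sizes should be expected.
     */
    static boolean usesZip64(long locCompressedSize, long locUncompressedSize, byte[] extra) {
        return locCompressedSize == ZIP64_MAGIC
                && locUncompressedSize == ZIP64_MAGIC
                && extra != null
                && hasZip64ExtraField(extra);
    }
}
```

Making the decision once while reading the LOC and storing it in a flag keeps the Data Descriptor parsing itself a simple branch between 4-byte and 8-byte reads.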
Specification
The specification is not changed; this is purely an implementation and behavioral change.
- csr of JDK-8303866 Allow ZipInputStream.readEnd to parse small Zip64 ZIP files (Resolved)