- Bug
- Resolution: Unresolved
- P3
- None
- 11, 17, 21
- Fix Understood
I'm opening this issue on behalf of Yakov Shafranovich (yakovsh@amazon.com):
The various ZIP- and JAR-related Java APIs have long-standing differences from the ZIP file specification, and peculiarities compared to other implementations, that should be documented in the API docs. The proposed documentation changes are:
```
diff --git a/src/java.base/share/classes/java/net/JarURLConnection.java b/src/java.base/share/classes/java/net/JarURLConnection.java
index 2c2734b08d7..d60940f46d9 100644
--- a/src/java.base/share/classes/java/net/JarURLConnection.java
+++ b/src/java.base/share/classes/java/net/JarURLConnection.java
@@ -123,6 +123,11 @@
*
* </ul>
*
+ * @apiNote
+ * JAR files retrieved by this class might get cached for performance reasons which can result
+ * in unexpected behavior if the JAR files are modified while being read using this class.
+ * If such behavior is undesirable, please use the {@link URLConnection#setUseCaches(boolean)} method to disable caching.
+ *
* @see java.net.URL
* @see java.net.URLConnection
*
diff --git a/src/java.base/share/classes/java/util/jar/JarFile.java b/src/java.base/share/classes/java/util/jar/JarFile.java
index ca8c726129e..e914dc27c10 100644
--- a/src/java.base/share/classes/java/util/jar/JarFile.java
+++ b/src/java.base/share/classes/java/util/jar/JarFile.java
@@ -133,6 +133,9 @@
* </ul>
* </div>
*
+ * This class uses a cache for ZIP entry metadata, and doesn't handle duplicate entries,
+ * which can result in unpredictable behavior or crashes (@see java.util.zip.ZipFile).
+ *
* @author David Connelly
* @see Manifest
* @see java.util.zip.ZipFile
diff --git a/src/java.base/share/classes/java/util/zip/ZipFile.java b/src/java.base/share/classes/java/util/zip/ZipFile.java
index bbcd3cdd712..6334e45c063 100644
--- a/src/java.base/share/classes/java/util/zip/ZipFile.java
+++ b/src/java.base/share/classes/java/util/zip/ZipFile.java
@@ -90,6 +90,21 @@
* cleanup mechanisms such as {@link java.lang.ref.Cleaner} and remove the overriding
* {@code finalize} method.
*
+ * @implNote
+ * This class uses a cache for ZIP entry metadata (but not content) keyed off pathname,
+ * last modified time and file key. If a ZIP file is modified while being read with this class,
+ * it can result in unpredictable behavior or crashes.
+ *
+ * Furthermore, while the <a href="https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT">PKWARE ZIP File Format Specification</a>
+ * has no restrictions on ZIP entries with duplicate names, the {@link ZipOutputStream} class doesn't allow creation of
+ * ZIP archives with duplicate entry names. Therefore, when parsing archives containing duplicate names, unexpected behavior
+ * may occur such as metadata from the later entry or content from the first entry
+ * being returned due to internal caching.
+ *
+ * Additionally, for compatibility with earlier versions of the JDK, files and directories with the same name (such as "foobar" and "foobar/")
+ * are considered duplicates, which can result in unexpected behavior such as wrong metadata or content being returned
+ * when parsing archives containing duplicate names.
+ *
* @author David Connelly
* @since 1.1
*/
diff --git a/src/java.base/share/classes/java/util/zip/ZipInputStream.java b/src/java.base/share/classes/java/util/zip/ZipInputStream.java
index 9e265fd668e..7f9310ef3f7 100644
--- a/src/java.base/share/classes/java/util/zip/ZipInputStream.java
+++ b/src/java.base/share/classes/java/util/zip/ZipInputStream.java
@@ -66,9 +66,13 @@
* @apiNote
* The LOC header contains metadata about the Zip file entry. {@code ZipInputStream}
* does not read the Central directory (CEN) header for the entry and therefore
- * will not have access to its metadata such as the external file attributes.
- * {@linkplain ZipFile} may be used when the information stored within
- * the CEN header is required.
+ * will not have access to its metadata such as the external file attributes. Additionally,
+ * {@code ZipInputStream} might read entries that are not in the Central directory or contain
+ * information that is different than in the Central directory (CEN) header for the same entry.
+ * This class might also fail to properly parse ZIP archives that have prepended data.
+ *
+ * Whenever possible, {@linkplain ZipFile} should be used for parsing ZIP archives
+ * since it correctly reads data from the central directory.
*
* @author David Connelly
* @since 1.1
diff --git a/src/jdk.zipfs/share/classes/module-info.java b/src/jdk.zipfs/share/classes/module-info.java
index b996006b4fe..63d2c31a5d8 100644
--- a/src/jdk.zipfs/share/classes/module-info.java
+++ b/src/jdk.zipfs/share/classes/module-info.java
@@ -293,6 +293,17 @@
* .forEach(System.out::println);
* }
* </pre>
+ *
+ * @implNote
+ * While the <a href="https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT">PKWARE ZIP File Format Specification</a>
+ * has no restrictions on ZIP entries with duplicate names, the {@link ZipOutputStream} class doesn't allow creation of
+ * ZIP archives with duplicate entry names. Therefore, when parsing archives containing duplicate names, unexpected behavior
+ * may occur such as metadata or content from the later entry being returned due to internal caching.
+ *
+ * Additionally, for compatibility with earlier versions of the JDK, files and directories with the same name (such as "foobar" and "foobar/")
+ * are considered duplicates, which can result in unexpected behavior such as wrong metadata or content being returned
+ * when parsing archives containing duplicate names.
+ *
* @provides java.nio.file.spi.FileSystemProvider
* @moduleGraph
* @since 9
```
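The behaviors described in the proposed notes can be observed with a small program. The following is only a sketch, not part of the patch: the class name `ZipQuirksDemo`, its helper methods, and the entry name `hello.txt` are hypothetical. It assumes JDK 9 or later and the default DEFLATED method, where `ZipOutputStream` writes entry sizes in a trailing data descriptor, so the size that `ZipInputStream` sees in the LOC header is expected to differ from the size `ZipFile` reads from the CEN. It also shows `ZipOutputStream` rejecting a duplicate name and a `jar:` URL read with caching disabled.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipException;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipQuirksDemo {

    /** Hypothetical helper: builds a one-entry archive in memory. */
    static byte[] buildArchive(String name, String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(out)) {
            zos.putNextEntry(new ZipEntry(name));
            zos.write(text.getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        return out.toByteArray();
    }

    /** Returns true if ZipOutputStream refuses a second entry with the same name. */
    static boolean rejectsDuplicate(String name) throws IOException {
        try (ZipOutputStream zos = new ZipOutputStream(new ByteArrayOutputStream())) {
            zos.putNextEntry(new ZipEntry(name));
            zos.closeEntry();
            try {
                zos.putNextEntry(new ZipEntry(name));
                return false;
            } catch (ZipException expected) {
                return true;
            }
        }
    }

    /** Entry size as reported by ZipInputStream, which only sees the LOC header. */
    static long sizeFromLoc(byte[] archive) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(archive))) {
            return zis.getNextEntry().getSize();
        }
    }

    /** Entry size as reported by ZipFile, which reads the central directory. */
    static long sizeFromCen(Path archive, String name) throws IOException {
        try (ZipFile zf = new ZipFile(archive.toFile())) {
            return zf.getEntry(name).getSize();
        }
    }

    /** Reads an entry through a jar: URL with URLConnection caching disabled. */
    static String readUncached(Path archive, String name) throws IOException {
        URLConnection conn =
                URI.create("jar:" + archive.toUri() + "!/" + name).toURL().openConnection();
        conn.setUseCaches(false);
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = buildArchive("hello.txt", "hello");
        Path tmp = Files.createTempFile("zip-quirks", ".zip");
        try {
            Files.write(tmp, bytes);
            System.out.println("duplicate rejected: " + rejectsDuplicate("a.txt"));
            System.out.println("size via ZipInputStream: " + sizeFromLoc(bytes));
            System.out.println("size via ZipFile: " + sizeFromCen(tmp, "hello.txt"));
            System.out.println("uncached read: " + readUncached(tmp, "hello.txt"));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

With the assumptions above, `ZipInputStream` should report an unknown size (-1) for the entry before its data has been read, while `ZipFile` reports the actual uncompressed size, which is one concrete case of the LOC and CEN information differing.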
- relates to: JDK-8278165 Clarify that ZipInputStream does not access the CEN fields for a ZipEntry (Resolved)
- links to: Review openjdk/jdk/16424