JDK-8319122

Improve documentation of various Zip-file related APIs


    • Type: Bug
    • Resolution: Unresolved
    • Priority: P3
    • Fix Version/s: None
    • Affects Version/s: 11, 17, 21
    • Component/s: core-libs

      I'm opening this issue on behalf of Yakov Shafranovich (yakovsh@amazon.com):

      The various ZIP- and JAR-file-related Java APIs have long-standing peculiarities: in places they deviate from the ZIP file specification or behave differently from other implementations. These should be documented in the API documentation.

      ```
      diff --git a/src/java.base/share/classes/java/net/JarURLConnection.java b/src/java.base/share/classes/java/net/JarURLConnection.java
      index 2c2734b08d7..d60940f46d9 100644
      --- a/src/java.base/share/classes/java/net/JarURLConnection.java
      +++ b/src/java.base/share/classes/java/net/JarURLConnection.java
      @@ -123,6 +123,11 @@
        *
        * </ul>
        *
      + * @apiNote
      + * JAR files retrieved by this class might get cached for performance reasons which can result
      + * in unexpected behavior if the JAR files are modified while being read using this class.
      + * If such behavior is undesirable, please use the {@link URLConnection#setUseCaches(boolean)} method to disable caching.
      + *
        * @see java.net.URL
        * @see java.net.URLConnection
        *
      diff --git a/src/java.base/share/classes/java/util/jar/JarFile.java b/src/java.base/share/classes/java/util/jar/JarFile.java
      index ca8c726129e..e914dc27c10 100644
      --- a/src/java.base/share/classes/java/util/jar/JarFile.java
      +++ b/src/java.base/share/classes/java/util/jar/JarFile.java
      @@ -133,6 +133,9 @@
        * </ul>
        * </div>
        *
      + * This class uses a cache for ZIP entry metadata, and doesn't handle duplicate entries,
      + * which can result in unpredictable behavior or crashes (see {@link java.util.zip.ZipFile}).
      + *
        * @author David Connelly
        * @see Manifest
        * @see java.util.zip.ZipFile
      diff --git a/src/java.base/share/classes/java/util/zip/ZipFile.java b/src/java.base/share/classes/java/util/zip/ZipFile.java
      index bbcd3cdd712..6334e45c063 100644
      --- a/src/java.base/share/classes/java/util/zip/ZipFile.java
      +++ b/src/java.base/share/classes/java/util/zip/ZipFile.java
      @@ -90,6 +90,21 @@
        * cleanup mechanisms such as {@link java.lang.ref.Cleaner} and remove the overriding
        * {@code finalize} method.
        *
      + * @implNote
      + * This class uses a cache for ZIP entry metadata (but not content) keyed off pathname,
      + * last modified time and file key. If a ZIP file is modified while being read with this class,
      + * it can result in unpredictable behavior or crashes.
      + * <p>
      + * Furthermore, while the <a href="https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT">PKWARE ZIP File Format Specification</a>
      + * has no restrictions on ZIP entries with duplicate names, the {@link ZipOutputStream} class doesn't allow creation of
      + * ZIP archives with duplicate entry names. Therefore, when parsing archives containing duplicate names, unexpected behavior
      + * may occur such as metadata from the later entry or content from the first entry
      + * being returned due to internal caching.
      + * <p>
      + * Additionally, for compatibility with earlier versions of the JDK, files and directories with the same name (such as "foobar" and "foobar/")
      + * are considered duplicates, which can result in unexpected behavior such as wrong metadata or content being returned
      + * when parsing archives containing duplicate names.
      + *
        * @author David Connelly
        * @since 1.1
        */
      diff --git a/src/java.base/share/classes/java/util/zip/ZipInputStream.java b/src/java.base/share/classes/java/util/zip/ZipInputStream.java
      index 9e265fd668e..7f9310ef3f7 100644
      --- a/src/java.base/share/classes/java/util/zip/ZipInputStream.java
      +++ b/src/java.base/share/classes/java/util/zip/ZipInputStream.java
      @@ -66,9 +66,13 @@
        * @apiNote
        * The LOC header contains metadata about the Zip file entry. {@code ZipInputStream}
        * does not read the Central directory (CEN) header for the entry and therefore
      - * will not have access to its metadata such as the external file attributes.
      - * {@linkplain ZipFile} may be used when the information stored within
      - * the CEN header is required.
      + * will not have access to its metadata such as the external file attributes. Additionally,
      + * {@code ZipInputStream} might read entries that are not in the Central directory or contain
      + * information that is different from that in the Central directory (CEN) header for the same entry.
      + * This class might also fail to properly parse ZIP archives that have prepended data.
      + * <p>
      + * Whenever possible, {@linkplain ZipFile} should be used for parsing ZIP archives
      + * since it correctly reads data from the central directory.
        *
        * @author David Connelly
        * @since 1.1
      diff --git a/src/jdk.zipfs/share/classes/module-info.java b/src/jdk.zipfs/share/classes/module-info.java
      index b996006b4fe..63d2c31a5d8 100644
      --- a/src/jdk.zipfs/share/classes/module-info.java
      +++ b/src/jdk.zipfs/share/classes/module-info.java
      @@ -293,6 +293,17 @@
        * .forEach(System.out::println);
        * }
        * </pre>
      + *
      + * @implNote
      + * While the <a href="https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT">PKWARE ZIP File Format Specification</a>
      + * has no restrictions on ZIP entries with duplicate names, the {@link java.util.zip.ZipOutputStream} class doesn't allow creation of
      + * ZIP archives with duplicate entry names. Therefore, when parsing archives containing duplicate names, unexpected behavior
      + * may occur such as metadata or content from the later entry being returned due to internal caching.
      + * <p>
      + * Additionally, for compatibility with earlier versions of the JDK, files and directories with the same name (such as "foobar" and "foobar/")
      + * are considered duplicates, which can result in unexpected behavior such as wrong metadata or content being returned
      + * when parsing archives containing duplicate names.
      + *
        * @provides java.nio.file.spi.FileSystemProvider
        * @moduleGraph
        * @since 9
      ```
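
A minimal Java sketch of three behaviors the proposed notes describe: `ZipOutputStream` rejecting duplicate entry names, `ZipFile` reading from the central directory, and `ZipInputStream` walking the local (LOC) headers. It is not part of the patch. The class name `ZipApiNotesDemo`, its helper methods, the entry names, and the shell-script prefix are all hypothetical, chosen only for illustration. The prepended-data case assumes that `ZipInputStream` stops at the first header that is not a LOC header, which is what the proposed `@apiNote` warns about.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipException;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; every name in it is illustrative and not part of the patch.
class ZipApiNotesDemo {
    // Builds a small in-memory archive with two uniquely named entries.
    static byte[] buildArchive() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            zos.putNextEntry(new ZipEntry("a.txt"));
            zos.write("first".getBytes(StandardCharsets.UTF_8));
            zos.putNextEntry(new ZipEntry("b.txt")); // also closes the previous entry
            zos.write("second".getBytes(StandardCharsets.UTF_8));
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    // Returns true if ZipOutputStream refuses a second entry with the same name.
    static boolean duplicateEntryRejected() {
        try (ZipOutputStream zos = new ZipOutputStream(new ByteArrayOutputStream())) {
            zos.putNextEntry(new ZipEntry("a.txt"));
            zos.putNextEntry(new ZipEntry("a.txt")); // expected to throw ZipException
            return false;
        } catch (ZipException ex) {
            return true;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // Simulates a self-extracting archive by placing arbitrary bytes before the ZIP data.
    static byte[] withPrependedData(byte[] archive) {
        byte[] prefix = "#!/bin/sh\necho this is not ZIP data\n".getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[prefix.length + archive.length];
        System.arraycopy(prefix, 0, out, 0, prefix.length);
        System.arraycopy(archive, 0, out, prefix.length, archive.length);
        return out;
    }

    // Lists entry names as seen by ZipFile, which reads the central directory (CEN).
    static List<String> namesViaZipFile(byte[] archive) {
        List<String> names = new ArrayList<>();
        try {
            Path tmp = Files.createTempFile("zip-notes", ".zip");
            Files.write(tmp, archive);
            try (ZipFile zf = new ZipFile(tmp.toFile())) {
                zf.stream().forEach(entry -> names.add(entry.getName()));
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return names;
    }

    // Lists entry names as seen by ZipInputStream, which walks the local (LOC) headers.
    static List<String> namesViaZipInputStream(byte[] archive) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(archive))) {
            for (ZipEntry entry = zis.getNextEntry(); entry != null; entry = zis.getNextEntry()) {
                names.add(entry.getName());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return names;
    }

    public static void main(String[] args) {
        byte[] plain = buildArchive();
        byte[] prefixed = withPrependedData(plain);
        System.out.println("duplicate rejected on write: " + duplicateEntryRejected());
        System.out.println("ZipFile, plain archive:           " + namesViaZipFile(plain));
        System.out.println("ZipInputStream, plain archive:    " + namesViaZipInputStream(plain));
        System.out.println("ZipFile, prepended data:          " + namesViaZipFile(prefixed));
        System.out.println("ZipInputStream, prepended data:   " + namesViaZipInputStream(prefixed));
    }
}
```

For a well-formed archive both readers report the same names. The two can diverge once the archive carries a prefix or its LOC and CEN records disagree, which is why the proposed note steers callers toward `ZipFile` where a file is available.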

            simonis Volker Simonis