JDK-8005466

JAR file entry hash table uses too much memory (zip_util.c)

Details

    • b75
    • generic
    • generic
    • Not verified

      Description

        See
        jdk/src/share/native/java/util/zip/zip_util.c
        jdk/src/share/native/java/util/zip/zip_util.h

        This data structure is created once for each entry in each JAR file loaded by the JVM at runtime:
        typedef struct jzcell {
            unsigned int hash; /* 32 bit hashcode on name */
            jlong cenpos; /* Offset of central directory file header */
            unsigned int next; /* hash chain: index into jzfile->entries */
        } jzcell;

        This takes 16 bytes on a 32-bit VM. On a 64-bit VM it takes 24 bytes: the 8-byte alignment of cenpos forces 4 bytes of padding after hash, and another 4 bytes of tail padding follow next.
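
        The padding can be inspected with offsetof and sizeof. The following is a minimal sketch, not code from zip_util.h: int64_t stands in for jlong, and the names cell_orig, cell_reordered, pad_after_hash and pad_after_next are illustrative assumptions.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>   /* int64_t */

/* Same field order as jzcell; int64_t is an assumed stand-in for jlong. */
typedef struct cell_orig {
    unsigned int hash;   /* 4 bytes */
    int64_t cenpos;      /* 8 bytes, may need 8-byte alignment */
    unsigned int next;   /* 4 bytes */
} cell_orig;

/* Hypothetical reordering: the widest field first leaves no room for padding. */
typedef struct cell_reordered {
    int64_t cenpos;
    unsigned int hash;
    unsigned int next;
} cell_reordered;

/* Holds wherever unsigned int is 4 bytes wide. */
static_assert(sizeof(cell_reordered) == 16, "reordered cell should be 16 bytes");

/* Bytes of padding the compiler placed between hash and cenpos. */
static size_t pad_after_hash(void) {
    return offsetof(cell_orig, cenpos)
         - (offsetof(cell_orig, hash) + sizeof(unsigned int));
}

/* Bytes of tail padding after next. */
static size_t pad_after_next(void) {
    return sizeof(cell_orig)
         - (offsetof(cell_orig, next) + sizeof(unsigned int));
}
```

        On common 64-bit ABIs both helpers should report 4, which accounts for the 24-byte size. Where a 64-bit integer only needs 4-byte alignment, both report 0.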

        rt.jar in JDK8 has about 18000 entries, so the entries hash table (stored in jzfile::entries) takes about 280KB on a 32-bit VM and 420KB on a 64-bit VM. The table stays in memory as long as the JAR file is in use. In the case of rt.jar, it is never deallocated.
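
        The table sizes follow from multiplying the entry count by the per-cell size. A small sketch of that arithmetic (the helper name entry_table_bytes is an assumption made for illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>   /* SIZE_MAX */

/* Bytes used by a table of `entries` cells of `cell_size` bytes each. */
static size_t entry_table_bytes(size_t entries, size_t cell_size) {
    /* Guard against overflow of the multiplication. */
    assert(cell_size == 0 || entries <= SIZE_MAX / cell_size);
    return entries * cell_size;
}
```

        With 18000 entries, 16-byte cells give 288000 bytes and 24-byte cells give 432000 bytes, which match the rounded figures above.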

        While the 64-bit usage can easily be reduced to match 32-bit (by rearranging the fields in the jzcell structure), we can further reduce the size of jzcell:

        typedef struct jzcellsmall {
            unsigned short hash; /* (truncated) 16 bit hashcode on name */
            jshort next; /* hash chain: index into jzfile->entries */
            jint cenpos; /* Offset of central directory file header */
        } jzcellsmall;

        This can reduce the memory usage to 8 bytes per JAR entry (for both 32-bit and 64-bit VMs). We can use this form as long as the JAR file is smaller than 2^30 bytes and has fewer than 32768 entries. This holds for rt.jar in all versions of the JDK (about 18000 entries, 65MB in JDK8).
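
        The conditions above could be checked when a JAR file is opened. The sketch below is only an illustration under stated assumptions: fixed-width types stand in for jshort and jint, and cell_small, SMALL_MAX_FILE_BYTES, SMALL_MAX_ENTRIES and can_use_small_cells are hypothetical names that do not come from zip_util.c.

```c
#include <assert.h>    /* static_assert (C11) */
#include <stdbool.h>
#include <stdint.h>

/* Same field order as jzcellsmall, using fixed-width stand-ins. */
typedef struct cell_small {
    uint16_t hash;     /* truncated 16-bit hash code */
    int16_t next;      /* hash chain index */
    int32_t cenpos;    /* offset of central directory file header */
} cell_small;

static_assert(sizeof(cell_small) == 8, "small cell should be 8 bytes");

/* Limits taken from the description: file < 2^30 bytes, < 32768 entries. */
#define SMALL_MAX_FILE_BYTES ((int64_t)1 << 30)
#define SMALL_MAX_ENTRIES    32768

/* Returns true if a JAR of this size and entry count fits the small form. */
static bool can_use_small_cells(int64_t file_bytes, int64_t entry_count) {
    return file_bytes >= 0 && entry_count >= 0
        && file_bytes < SMALL_MAX_FILE_BYTES
        && entry_count < SMALL_MAX_ENTRIES;
}
```

        A JAR that fails the check would keep using the original jzcell layout.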

        Note that truncating the stored hash value from 32 bits to 16 bits introduces no extra collisions in the case of rt.jar in JDK8. That is, for every pair of entries A and B that belong to the same bucket, the lower 16 bits of the hash values of A and B are not equal. Therefore, using jzcellsmall will introduce no extra I/O access.
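
        A check of this property can be sketched as follows. It assumes that the bucket index is the full hash modulo the table length, which may differ from the real lookup code, and extra_collisions is a hypothetical helper name. It counts pairs in the same bucket whose full hashes differ but whose lower 16 bits match. Those are the pairs for which truncation would add a name comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Counts pairs (i, j) that land in the same bucket, have different
 * 32-bit hashes, but share the same lower 16 bits. Simple O(n^2) scan.
 * Assumption: bucket index = hash % table_len.
 */
static size_t extra_collisions(const uint32_t *hashes, size_t n,
                               uint32_t table_len) {
    size_t count = 0;
    assert(table_len != 0);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            int same_bucket  = (hashes[i] % table_len) == (hashes[j] % table_len);
            int full_differs = hashes[i] != hashes[j];
            int low16_equal  = (uint16_t)hashes[i] == (uint16_t)hashes[j];
            if (same_bucket && full_differs && low16_equal) {
                count++;
            }
        }
    }
    return count;
}
```

        A result of 0 for the hash values of all entries in a JAR corresponds to the "no extra collisions" claim above.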

        Savings on 64-bit VM (patch compared to jdk1.8.0_ea_b68):

        HelloWorld:
        Before: 475564 bytes
        After: 164618 bytes
        Reduction: 310946 bytes

        Eclipse IDE
        Before: 1693284 bytes
        After: 586946 bytes
        Reduction: 1106338 bytes


              People

                iklam Ioi Lam
                iklam Ioi Lam