Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-8087143

Reduce Symbol::_identity_hash to 2 bytes

    XMLWordPrintable

    Details

    • Type: Enhancement
    • Status: Resolved
    • Priority: P4
    • Resolution: Fixed
    • Affects Version/s: 9
    • Fix Version/s: 9
    • Component/s: hotspot
    • Labels:
    • Subcomponent:
    • Resolved In Build:
      b74

      Backports

        Description

        Currently Symbol::_identity_hash is 4 bytes and contains a random number. It's used when the Symbol is used as an index in a hashtable:

          unsigned int Hashtable::compute_hash(Symbol* name) {
            return (unsigned int) name->identity_hash();
          }

          unsigned int TwoOopHashtable::compute_hash(Symbol* name, ClassLoaderData* loader_data) {
            unsigned int name_hash = name->identity_hash();
            // loader is null with CDS
            assert(loader_data != NULL || UseSharedSpaces || DumpSharedSpaces,
                   "only allowed with shared spaces");
            unsigned int loader_hash = loader_data == NULL ? 0 : loader_data->identity_hash();
            return name_hash ^ loader_hash;
          }

        Since our hashtable sizes are smaller than 65535

            const int defaultProtectionDomainCacheSize = NOT_LP64(137) LP64_ONLY(2017);
            const int defaultStringTableSize = NOT_LP64(1009) LP64_ONLY(60013);
            const int minimumStringTableSize = 1009;
            const int defaultSymbolTableSize = 20011;
            const int minimumSymbolTableSize = 1009;

        It seems like a 16-bit random number would do just as well. The bucket index is calculated like this:

            int index = symbol->_identity_hash() % table_size;

        So as long as table_size is 65535 or less, the top 16 bits of symbol->_identity_hash() do not affect the distribution of the key-to-bucket mapping. if we set the top 16 bits to zero, the only effect is the index will be uniformly shifted by a constant. So, the table will not become any more imbalanced than they are today.

        If we want extra random-ness (in case the user sets StringTableSize and SymbolTableSize to larger than 65535, maybe we can use the first two bytes of the Symbol, and the lowest 16 bits of the symbol pointer, like:

        BEFORE:

            class Symbol {
              int _identity_hash;
              byte _data[];
              int identity_hash() { return _identity_hash; }
            }

        AFTER

            class Symbol {
              short _identity_hash;
              byte _data[];
              int identity_hash() {
                 int lower = _identity_hash;
                 int upper1 = int(this) / ObjectAlignmentInBytes;
                 int upper2 = _data[0] ^ data[1];
                 int upper = upper1 ^ upper2;

                 return upper << 16 + lower;
              }
            }

        This is probably just as good as a random 32-bit number, for our purposes (hashtables are much smaller than 4G in size).

        On Linux/x64, the default CDS image is reduced by 0.38% with this change:

        before 18369589 bytes total
        after 18300149 bytes total
        reduction = 69440 bytes = 0.38%

        number of symbols = 50917, average reduction = 1.36 bytes per symbol.

        Note that the average reduction is less than 2 bytes, because many symbols are shorter than the minimal MetaspaceObj allocation size (24 bytes on 64 bit).

        See attached symbol_patch.txt for a proof-of-concept. Note that the patch does not include the identity_hash() changes as proposed above.

        We should validate that this change does not make the hashtables more imbalanced than they are today. This can be done by running a program with a large number of classes (preferably 60,000 classes or more) in fastdebug build with PrintSystemDictionaryAtExit. Also, modify the PrintSystemDictionaryAtExit code to print out the (mean, stddev and max) values of the system dictionary's bucket depth (see RehashableHashtable<T, F>::dump_table in hashtable.cpp for the use of NumberSeq).

          Attachments

            Issue Links

              Activity

                People

                Assignee:
                minqi Yumin Qi (Inactive)
                Reporter:
                iklam Ioi Lam
                Votes:
                0 Vote for this issue
                Watchers:
                2 Start watching this issue

                  Dates

                  Created:
                  Updated:
                  Resolved: