Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-5022385

(str) String.hashCode() performance improvement

XMLWordPrintable

    • Icon: Enhancement Enhancement
    • Resolution: Duplicate
    • Icon: P4 P4
    • None
    • 5.0
    • core-libs
    • x86
    • windows_2000, windows_xp

      Name: gm110360 Date: 03/29/2004


      A DESCRIPTION OF THE REQUEST :
      Attached is a program that demonstrates a suggested small code change that speeds up the String.hashCode method by as much as 20% for some cases.

      The program shows a few minor tweaks to the "String.hashCode" method that improve performance while retaining interface contract. The improvement is significant enough that you might want to consider replacing the current algorithm.

      In a nutshell:
        - For non-empty strings
          - Hash calculation is 6% to 20% faster, depending on string length.
          - Cached hash retrieval is about the same.
        - For empty strings
          - Hash calculation is 28% faster.
          - Cached hash retrieval is 34% faster.

      The main benefactors or this improvement are programs that create hash tables containing a large amount of non-literal string data, such as compilers and data-mining programs, which could see a 10% improvement in their table-building phases.

      These test were run on a 450MHz Pentium 2 Dell computer running Win2000 using J2SDK 1.4.2_03. Under 1.5.0 beta 1, results had the same relationships but were a bit slower all-around. Also, results were about the same under Linux.

      Performance test results:

        Hash computation (milliseconds for 10M trials):

          Trial #1:
          empty old: 640 (0) New is 27.7% faster
          empty new: 501 (0)
          1-char old: 962 (97) New is 5.6% faster
          1-char new: 911 (97)
          10-char old: 3595 (-634317659) New is 13.2% faster
          10-char new: 3175 (-634317659)
          100-char old: 28040 (-1399078606) New is 15.6% faster
          100-char new: 24255 (-1399078606)

          Trial #2:
          0 old: 641 (0) New is 28.2% faster
          0 new: 500 (0)
          1 old: 962 (97) New is 6.8% faster
          1 new: 901 (97)
          10 old: 3595 (-634317659) New is 13.6% faster
          10 new: 3165 (-634317659)
          100 old: 28050 (-1399078606) New is 18.8% faster
          100 new: 23604 (-1399078606)
          1000 old: 272481 (1365024500) New is 19.9% faster
          1000 new: 227297 (1365024500)

        Hash retrieval (milliseconds for 100M trials):
          (Shows that empty strings retrieve hash faster, others about the same.)

          empty old: 6590 (0) New is 34.2% faster
          empty new: 4907 (0)
          1-char old: 5838 (97) < 1% difference
          1-char new: 5819 (97)
          10-char old: 5808 (-634317659) < 1% difference
          10-char new: 5818 (-634317659)
          100-char old: 5819 (-1399078606) < 1% difference
          100-char new: 5818 (-1399078606)
          1000-char old: 5818 (1365024500) < 1% difference
          1000-char new: 5809 (1365024500)


      JUSTIFICATION :
      Faster is better.

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      A speedup in some programs.

      ---------- BEGIN SOURCE ----------
      /*

      StingHashAlgorithm.java

      This program shows a few minor tweaks to the "String.hashCode" method that
      improve performance while retaining interface contract. The improvement is
      significant enough that you might want to consider replacing the current
      algorithm.

      In a nutshell:
        - For non-empty strings
          - Hash calculation is 6% to 20% faster, depending on string length.
          - Cached hash retrieval is about the same.
        - For empty strings
          - Hash calculation is 28% faster.
          - Cached hash retrieval is 34% faster.

      The main benefactors or this improvement are programs that create hash
      tables containing a large amount of non-literal string data, such as
      compilers and data-mining programs, which could see a 10% improvement in
      their table-building phases.

      These test were run on a 450MHz Pentium 2 Dell computer running Win2000 using
      J2SDK 1.4.2_03. Under 1.5.0 beta 1, results had the same relationships but
      were a bit slower all-around. Also, results were about the same under Linux.

      Performance test results:

        Hash computation (milliseconds for 10M trials):

          Trial #1:
          empty old: 640 (0) New is 27.7% faster
          empty new: 501 (0)
          1-char old: 962 (97) New is 5.6% faster
          1-char new: 911 (97)
          10-char old: 3595 (-634317659) New is 13.2% faster
          10-char new: 3175 (-634317659)
          100-char old: 28040 (-1399078606) New is 15.6% faster
          100-char new: 24255 (-1399078606)

          Trial #2:
          0 old: 641 (0) New is 28.2% faster
          0 new: 500 (0)
          1 old: 962 (97) New is 6.8% faster
          1 new: 901 (97)
          10 old: 3595 (-634317659) New is 13.6% faster
          10 new: 3165 (-634317659)
          100 old: 28050 (-1399078606) New is 18.8% faster
          100 new: 23604 (-1399078606)
          1000 old: 272481 (1365024500) New is 19.9% faster
          1000 new: 227297 (1365024500)

        Hash retrieval (milliseconds for 100M trials):
          (Shows that empty strings retrieve hash faster, others about the same.)

          empty old: 6590 (0) New is 34.2% faster
          empty new: 4907 (0)
          1-char old: 5838 (97) < 1% difference
          1-char new: 5819 (97)
          10-char old: 5808 (-634317659) < 1% difference
          10-char new: 5818 (-634317659)
          100-char old: 5819 (-1399078606) < 1% difference
          100-char new: 5818 (-1399078606)
          1000-char old: 5818 (1365024500) < 1% difference
          1000-char new: 5809 (1365024500)

      */

      public class StringHashAlgorithm {

          //
          // Test parameters: Set to true to test hash computation, false to
          // test hash retrieval (cached). The "production" setting is false.
          //
          private static final boolean ALWAYS_COMPUTE = true; // Normal is false

          private static final int reps = ALWAYS_COMPUTE ? 10000000 : 100000000;

          private interface QuasiString {
      int hashCode();
          }

          private static final class NewQuasiString implements QuasiString {

      private int hash = 0;
      private int count;
      private int offset = 0;
      private char[] value;
      public NewQuasiString(String s) {
      value = s.toCharArray();
      count = value.length;
      }
      //
      // Proposed new hashCode technique. Parameter for the "real", non-test
      // version is ALWAYS_COMPUTE = false.
      //
      public int hashCode() {
      int h = hash;
      if (h != 0)
      return h;
      int len = count;
      if (len == 0)
      return 0;
      int off = offset;
      char val[] = value;

      len += off;
      while (off < len)
      h = 31 * h + val[off++];
      if (ALWAYS_COMPUTE) {
      return h = h;
      } else {
      return hash = h;
      }
      }
          }

          private static final class OldQuasiString implements QuasiString {
      private int hash = 0;
      private int count;
      private int offset = 0;
      private char[] value;
      public OldQuasiString(String s) {
      value = s.toCharArray();
      count = value.length;
      }
      //
      // Current hashCode technique.
      //
      public int hashCode() {
      int h = hash;
      if (h == 0) {
      int off = offset;
      char val[] = value;
      int len = count;

      for (int i = 0; i < len; i++) {
      h = 31*h + val[off++];
      }
      if (ALWAYS_COMPUTE)
      h = h;
      else
      hash = h;
      }
      return h;
      }
          }

          public static void main(String[] args) {
      String chars10 = "abcdefghij";
      String chars100 = chars10 + chars10 + chars10 + chars10 +
      chars10 + chars10 + chars10 + chars10 + chars10 + chars10;
      String chars1000 = chars100 + chars100 + chars100 + chars100 +
      chars100 + chars100 + chars100 + chars100 + chars100 + chars100;
      test("0 old", new OldQuasiString(""));
      test("0 new", new NewQuasiString(""));
      test("1 old", new OldQuasiString("a"));
      test("1 new", new NewQuasiString("a"));
      test("10 old", new OldQuasiString(chars10));
      test("10 new", new NewQuasiString(chars10));
      test("100 old", new OldQuasiString(chars100));
      test("100 new", new NewQuasiString(chars100));
      test("1000 old", new OldQuasiString(chars1000));
      test("1000 new", new NewQuasiString(chars1000));
          }

          private static void test(String msg, QuasiString s) {
      long t0 = System.currentTimeMillis();
      for (int i = reps; --i >= 0; ) {
      s.hashCode();
      }
      long t = System.currentTimeMillis() - t0;
      System.out.println(msg + ": " + t + " (" + s.hashCode() + ")");
          }

      }
      ---------- END SOURCE ----------
      (Incident Review ID: 245459)
      ======================================================================

            Unassigned Unassigned
            gmanwanisunw Girish Manwani (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

              Created:
              Updated:
              Resolved:
              Imported:
              Indexed: