Loading...

Type: Enhancement
Resolution: Duplicate
Priority: P4
Fix Version/s: None
Affects Version/s: 5.0
Component/s: core-libs
Labels:
- iris
- madbot
- webbug

Subcomponent:
java.lang
CPU:

x86
OS:

windows_2000, windows_xp

Name: gm110360 Date: 03/29/2004

A DESCRIPTION OF THE REQUEST :
Attached is a program that demonstrates a suggested small code change that speeds up the String.hashCode method by as much as 20% for some cases.

The program shows a few minor tweaks to the "String.hashCode" method that improve performance while retaining interface contract. The improvement is significant enough that you might want to consider replacing the current algorithm.

In a nutshell:
  - For non-empty strings
    - Hash calculation is 6% to 20% faster, depending on string length.
    - Cached hash retrieval is about the same.
  - For empty strings
    - Hash calculation is 28% faster.
    - Cached hash retrieval is 34% faster.

The main benefactors or this improvement are programs that create hash tables containing a large amount of non-literal string data, such as compilers and data-mining programs, which could see a 10% improvement in their table-building phases.

These test were run on a 450MHz Pentium 2 Dell computer running Win2000 using J2SDK 1.4.2_03. Under 1.5.0 beta 1, results had the same relationships but were a bit slower all-around. Also, results were about the same under Linux.

Performance test results:

  Hash computation (milliseconds for 10M trials):

    Trial #1:
    empty old: 640 (0) New is 27.7% faster
    empty new: 501 (0)
    1-char old: 962 (97) New is 5.6% faster
    1-char new: 911 (97)
    10-char old: 3595 (-634317659) New is 13.2% faster
    10-char new: 3175 (-634317659)
    100-char old: 28040 (-1399078606) New is 15.6% faster
    100-char new: 24255 (-1399078606)

    Trial #2:
    0 old: 641 (0) New is 28.2% faster
    0 new: 500 (0)
    1 old: 962 (97) New is 6.8% faster
    1 new: 901 (97)
    10 old: 3595 (-634317659) New is 13.6% faster
    10 new: 3165 (-634317659)
    100 old: 28050 (-1399078606) New is 18.8% faster
    100 new: 23604 (-1399078606)
    1000 old: 272481 (1365024500) New is 19.9% faster
    1000 new: 227297 (1365024500)

  Hash retrieval (milliseconds for 100M trials):
    (Shows that empty strings retrieve hash faster, others about the same.)

    empty old: 6590 (0) New is 34.2% faster
    empty new: 4907 (0)
    1-char old: 5838 (97) < 1% difference
    1-char new: 5819 (97)
    10-char old: 5808 (-634317659) < 1% difference
    10-char new: 5818 (-634317659)
    100-char old: 5819 (-1399078606) < 1% difference
    100-char new: 5818 (-1399078606)
    1000-char old: 5818 (1365024500) < 1% difference
    1000-char new: 5809 (1365024500)

JUSTIFICATION :
Faster is better.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
A speedup in some programs.

---------- BEGIN SOURCE ----------
/*

StingHashAlgorithm.java

This program shows a few minor tweaks to the "String.hashCode" method that
improve performance while retaining interface contract. The improvement is
significant enough that you might want to consider replacing the current
algorithm.

In a nutshell:
  - For non-empty strings
    - Hash calculation is 6% to 20% faster, depending on string length.
    - Cached hash retrieval is about the same.
  - For empty strings
    - Hash calculation is 28% faster.
    - Cached hash retrieval is 34% faster.

The main benefactors or this improvement are programs that create hash
tables containing a large amount of non-literal string data, such as
compilers and data-mining programs, which could see a 10% improvement in
their table-building phases.

These test were run on a 450MHz Pentium 2 Dell computer running Win2000 using
J2SDK 1.4.2_03. Under 1.5.0 beta 1, results had the same relationships but
were a bit slower all-around. Also, results were about the same under Linux.

Performance test results:

  Hash computation (milliseconds for 10M trials):

    Trial #1:
    empty old: 640 (0) New is 27.7% faster
    empty new: 501 (0)
    1-char old: 962 (97) New is 5.6% faster
    1-char new: 911 (97)
    10-char old: 3595 (-634317659) New is 13.2% faster
    10-char new: 3175 (-634317659)
    100-char old: 28040 (-1399078606) New is 15.6% faster
    100-char new: 24255 (-1399078606)

    Trial #2:
    0 old: 641 (0) New is 28.2% faster
    0 new: 500 (0)
    1 old: 962 (97) New is 6.8% faster
    1 new: 901 (97)
    10 old: 3595 (-634317659) New is 13.6% faster
    10 new: 3165 (-634317659)
    100 old: 28050 (-1399078606) New is 18.8% faster
    100 new: 23604 (-1399078606)
    1000 old: 272481 (1365024500) New is 19.9% faster
    1000 new: 227297 (1365024500)

  Hash retrieval (milliseconds for 100M trials):
    (Shows that empty strings retrieve hash faster, others about the same.)

    empty old: 6590 (0) New is 34.2% faster
    empty new: 4907 (0)
    1-char old: 5838 (97) < 1% difference
    1-char new: 5819 (97)
    10-char old: 5808 (-634317659) < 1% difference
    10-char new: 5818 (-634317659)
    100-char old: 5819 (-1399078606) < 1% difference
    100-char new: 5818 (-1399078606)
    1000-char old: 5818 (1365024500) < 1% difference
    1000-char new: 5809 (1365024500)

*/

public class StringHashAlgorithm {

    //
    // Test parameters: Set to true to test hash computation, false to
    // test hash retrieval (cached). The "production" setting is false.
    //
    private static final boolean ALWAYS_COMPUTE = true; // Normal is false

    private static final int reps = ALWAYS_COMPUTE ? 10000000 : 100000000;

    private interface QuasiString {
int hashCode();
    }

    private static final class NewQuasiString implements QuasiString {

private int hash = 0;
private int count;
private int offset = 0;
private char[] value;
public NewQuasiString(String s) {
value = s.toCharArray();
count = value.length;
}
//
// Proposed new hashCode technique. Parameter for the "real", non-test
// version is ALWAYS_COMPUTE = false.
//
public int hashCode() {
int h = hash;
if (h != 0)
return h;
int len = count;
if (len == 0)
return 0;
int off = offset;
char val[] = value;

len += off;
while (off < len)
h = 31 * h + val[off++];
if (ALWAYS_COMPUTE) {
return h = h;
} else {
return hash = h;
}
}
    }

    private static final class OldQuasiString implements QuasiString {
private int hash = 0;
private int count;
private int offset = 0;
private char[] value;
public OldQuasiString(String s) {
value = s.toCharArray();
count = value.length;
}
//
// Current hashCode technique.
//
public int hashCode() {
int h = hash;
if (h == 0) {
int off = offset;
char val[] = value;
int len = count;

for (int i = 0; i < len; i++) {
h = 31*h + val[off++];
}
if (ALWAYS_COMPUTE)
h = h;
else
hash = h;
}
return h;
}
    }

    public static void main(String[] args) {
String chars10 = "abcdefghij";
String chars100 = chars10 + chars10 + chars10 + chars10 +
chars10 + chars10 + chars10 + chars10 + chars10 + chars10;
String chars1000 = chars100 + chars100 + chars100 + chars100 +
chars100 + chars100 + chars100 + chars100 + chars100 + chars100;
test("0 old", new OldQuasiString(""));
test("0 new", new NewQuasiString(""));
test("1 old", new OldQuasiString("a"));
test("1 new", new NewQuasiString("a"));
test("10 old", new OldQuasiString(chars10));
test("10 new", new NewQuasiString(chars10));
test("100 old", new OldQuasiString(chars100));
test("100 new", new NewQuasiString(chars100));
test("1000 old", new OldQuasiString(chars1000));
test("1000 new", new NewQuasiString(chars1000));
    }

    private static void test(String msg, QuasiString s) {
long t0 = System.currentTimeMillis();
for (int i = reps; --i >= 0; ) {
s.hashCode();
}
long t = System.currentTimeMillis() - t0;
System.out.println(msg + ": " + t + " (" + s.hashCode() + ")");
    }

}
---------- END SOURCE ----------
(Incident Review ID: 245459)
======================================================================

duplicates

JDK-6932858 (str) Tune hashCode() of String

Open

Details

Description

Attachments

Issue Links

Activity

People

Dates