Name: gm110360 Date: 03/29/2004
A DESCRIPTION OF THE REQUEST :
Attached is a program that demonstrates a suggested small code change that speeds up the String.hashCode method by as much as 20% for some cases.
The program shows a few minor tweaks to the "String.hashCode" method that improve performance while retaining interface contract. The improvement is significant enough that you might want to consider replacing the current algorithm.
In a nutshell:
- For non-empty strings
- Hash calculation is 6% to 20% faster, depending on string length.
- Cached hash retrieval is about the same.
- For empty strings
- Hash calculation is 28% faster.
- Cached hash retrieval is 34% faster.
The main benefactors or this improvement are programs that create hash tables containing a large amount of non-literal string data, such as compilers and data-mining programs, which could see a 10% improvement in their table-building phases.
These test were run on a 450MHz Pentium 2 Dell computer running Win2000 using J2SDK 1.4.2_03. Under 1.5.0 beta 1, results had the same relationships but were a bit slower all-around. Also, results were about the same under Linux.
Performance test results:
Hash computation (milliseconds for 10M trials):
Trial #1:
empty old: 640 (0) New is 27.7% faster
empty new: 501 (0)
1-char old: 962 (97) New is 5.6% faster
1-char new: 911 (97)
10-char old: 3595 (-634317659) New is 13.2% faster
10-char new: 3175 (-634317659)
100-char old: 28040 (-1399078606) New is 15.6% faster
100-char new: 24255 (-1399078606)
Trial #2:
0 old: 641 (0) New is 28.2% faster
0 new: 500 (0)
1 old: 962 (97) New is 6.8% faster
1 new: 901 (97)
10 old: 3595 (-634317659) New is 13.6% faster
10 new: 3165 (-634317659)
100 old: 28050 (-1399078606) New is 18.8% faster
100 new: 23604 (-1399078606)
1000 old: 272481 (1365024500) New is 19.9% faster
1000 new: 227297 (1365024500)
Hash retrieval (milliseconds for 100M trials):
(Shows that empty strings retrieve hash faster, others about the same.)
empty old: 6590 (0) New is 34.2% faster
empty new: 4907 (0)
1-char old: 5838 (97) < 1% difference
1-char new: 5819 (97)
10-char old: 5808 (-634317659) < 1% difference
10-char new: 5818 (-634317659)
100-char old: 5819 (-1399078606) < 1% difference
100-char new: 5818 (-1399078606)
1000-char old: 5818 (1365024500) < 1% difference
1000-char new: 5809 (1365024500)
JUSTIFICATION :
Faster is better.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
A speedup in some programs.
---------- BEGIN SOURCE ----------
/*
StingHashAlgorithm.java
This program shows a few minor tweaks to the "String.hashCode" method that
improve performance while retaining interface contract. The improvement is
significant enough that you might want to consider replacing the current
algorithm.
In a nutshell:
- For non-empty strings
- Hash calculation is 6% to 20% faster, depending on string length.
- Cached hash retrieval is about the same.
- For empty strings
- Hash calculation is 28% faster.
- Cached hash retrieval is 34% faster.
The main benefactors or this improvement are programs that create hash
tables containing a large amount of non-literal string data, such as
compilers and data-mining programs, which could see a 10% improvement in
their table-building phases.
These test were run on a 450MHz Pentium 2 Dell computer running Win2000 using
J2SDK 1.4.2_03. Under 1.5.0 beta 1, results had the same relationships but
were a bit slower all-around. Also, results were about the same under Linux.
Performance test results:
Hash computation (milliseconds for 10M trials):
Trial #1:
empty old: 640 (0) New is 27.7% faster
empty new: 501 (0)
1-char old: 962 (97) New is 5.6% faster
1-char new: 911 (97)
10-char old: 3595 (-634317659) New is 13.2% faster
10-char new: 3175 (-634317659)
100-char old: 28040 (-1399078606) New is 15.6% faster
100-char new: 24255 (-1399078606)
Trial #2:
0 old: 641 (0) New is 28.2% faster
0 new: 500 (0)
1 old: 962 (97) New is 6.8% faster
1 new: 901 (97)
10 old: 3595 (-634317659) New is 13.6% faster
10 new: 3165 (-634317659)
100 old: 28050 (-1399078606) New is 18.8% faster
100 new: 23604 (-1399078606)
1000 old: 272481 (1365024500) New is 19.9% faster
1000 new: 227297 (1365024500)
Hash retrieval (milliseconds for 100M trials):
(Shows that empty strings retrieve hash faster, others about the same.)
empty old: 6590 (0) New is 34.2% faster
empty new: 4907 (0)
1-char old: 5838 (97) < 1% difference
1-char new: 5819 (97)
10-char old: 5808 (-634317659) < 1% difference
10-char new: 5818 (-634317659)
100-char old: 5819 (-1399078606) < 1% difference
100-char new: 5818 (-1399078606)
1000-char old: 5818 (1365024500) < 1% difference
1000-char new: 5809 (1365024500)
*/
public class StringHashAlgorithm {
//
// Test parameters: Set to true to test hash computation, false to
// test hash retrieval (cached). The "production" setting is false.
//
private static final boolean ALWAYS_COMPUTE = true; // Normal is false
private static final int reps = ALWAYS_COMPUTE ? 10000000 : 100000000;
private interface QuasiString {
int hashCode();
}
private static final class NewQuasiString implements QuasiString {
private int hash = 0;
private int count;
private int offset = 0;
private char[] value;
public NewQuasiString(String s) {
value = s.toCharArray();
count = value.length;
}
//
// Proposed new hashCode technique. Parameter for the "real", non-test
// version is ALWAYS_COMPUTE = false.
//
public int hashCode() {
int h = hash;
if (h != 0)
return h;
int len = count;
if (len == 0)
return 0;
int off = offset;
char val[] = value;
len += off;
while (off < len)
h = 31 * h + val[off++];
if (ALWAYS_COMPUTE) {
return h = h;
} else {
return hash = h;
}
}
}
private static final class OldQuasiString implements QuasiString {
private int hash = 0;
private int count;
private int offset = 0;
private char[] value;
public OldQuasiString(String s) {
value = s.toCharArray();
count = value.length;
}
//
// Current hashCode technique.
//
public int hashCode() {
int h = hash;
if (h == 0) {
int off = offset;
char val[] = value;
int len = count;
for (int i = 0; i < len; i++) {
h = 31*h + val[off++];
}
if (ALWAYS_COMPUTE)
h = h;
else
hash = h;
}
return h;
}
}
public static void main(String[] args) {
String chars10 = "abcdefghij";
String chars100 = chars10 + chars10 + chars10 + chars10 +
chars10 + chars10 + chars10 + chars10 + chars10 + chars10;
String chars1000 = chars100 + chars100 + chars100 + chars100 +
chars100 + chars100 + chars100 + chars100 + chars100 + chars100;
test("0 old", new OldQuasiString(""));
test("0 new", new NewQuasiString(""));
test("1 old", new OldQuasiString("a"));
test("1 new", new NewQuasiString("a"));
test("10 old", new OldQuasiString(chars10));
test("10 new", new NewQuasiString(chars10));
test("100 old", new OldQuasiString(chars100));
test("100 new", new NewQuasiString(chars100));
test("1000 old", new OldQuasiString(chars1000));
test("1000 new", new NewQuasiString(chars1000));
}
private static void test(String msg, QuasiString s) {
long t0 = System.currentTimeMillis();
for (int i = reps; --i >= 0; ) {
s.hashCode();
}
long t = System.currentTimeMillis() - t0;
System.out.println(msg + ": " + t + " (" + s.hashCode() + ")");
}
}
---------- END SOURCE ----------
(Incident Review ID: 245459)
======================================================================
A DESCRIPTION OF THE REQUEST :
Attached is a program that demonstrates a suggested small code change that speeds up the String.hashCode method by as much as 20% for some cases.
The program shows a few minor tweaks to the "String.hashCode" method that improve performance while retaining interface contract. The improvement is significant enough that you might want to consider replacing the current algorithm.
In a nutshell:
- For non-empty strings
- Hash calculation is 6% to 20% faster, depending on string length.
- Cached hash retrieval is about the same.
- For empty strings
- Hash calculation is 28% faster.
- Cached hash retrieval is 34% faster.
The main benefactors or this improvement are programs that create hash tables containing a large amount of non-literal string data, such as compilers and data-mining programs, which could see a 10% improvement in their table-building phases.
These test were run on a 450MHz Pentium 2 Dell computer running Win2000 using J2SDK 1.4.2_03. Under 1.5.0 beta 1, results had the same relationships but were a bit slower all-around. Also, results were about the same under Linux.
Performance test results:
Hash computation (milliseconds for 10M trials):
Trial #1:
empty old: 640 (0) New is 27.7% faster
empty new: 501 (0)
1-char old: 962 (97) New is 5.6% faster
1-char new: 911 (97)
10-char old: 3595 (-634317659) New is 13.2% faster
10-char new: 3175 (-634317659)
100-char old: 28040 (-1399078606) New is 15.6% faster
100-char new: 24255 (-1399078606)
Trial #2:
0 old: 641 (0) New is 28.2% faster
0 new: 500 (0)
1 old: 962 (97) New is 6.8% faster
1 new: 901 (97)
10 old: 3595 (-634317659) New is 13.6% faster
10 new: 3165 (-634317659)
100 old: 28050 (-1399078606) New is 18.8% faster
100 new: 23604 (-1399078606)
1000 old: 272481 (1365024500) New is 19.9% faster
1000 new: 227297 (1365024500)
Hash retrieval (milliseconds for 100M trials):
(Shows that empty strings retrieve hash faster, others about the same.)
empty old: 6590 (0) New is 34.2% faster
empty new: 4907 (0)
1-char old: 5838 (97) < 1% difference
1-char new: 5819 (97)
10-char old: 5808 (-634317659) < 1% difference
10-char new: 5818 (-634317659)
100-char old: 5819 (-1399078606) < 1% difference
100-char new: 5818 (-1399078606)
1000-char old: 5818 (1365024500) < 1% difference
1000-char new: 5809 (1365024500)
JUSTIFICATION :
Faster is better.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
A speedup in some programs.
---------- BEGIN SOURCE ----------
/*
StingHashAlgorithm.java
This program shows a few minor tweaks to the "String.hashCode" method that
improve performance while retaining interface contract. The improvement is
significant enough that you might want to consider replacing the current
algorithm.
In a nutshell:
- For non-empty strings
- Hash calculation is 6% to 20% faster, depending on string length.
- Cached hash retrieval is about the same.
- For empty strings
- Hash calculation is 28% faster.
- Cached hash retrieval is 34% faster.
The main benefactors or this improvement are programs that create hash
tables containing a large amount of non-literal string data, such as
compilers and data-mining programs, which could see a 10% improvement in
their table-building phases.
These test were run on a 450MHz Pentium 2 Dell computer running Win2000 using
J2SDK 1.4.2_03. Under 1.5.0 beta 1, results had the same relationships but
were a bit slower all-around. Also, results were about the same under Linux.
Performance test results:
Hash computation (milliseconds for 10M trials):
Trial #1:
empty old: 640 (0) New is 27.7% faster
empty new: 501 (0)
1-char old: 962 (97) New is 5.6% faster
1-char new: 911 (97)
10-char old: 3595 (-634317659) New is 13.2% faster
10-char new: 3175 (-634317659)
100-char old: 28040 (-1399078606) New is 15.6% faster
100-char new: 24255 (-1399078606)
Trial #2:
0 old: 641 (0) New is 28.2% faster
0 new: 500 (0)
1 old: 962 (97) New is 6.8% faster
1 new: 901 (97)
10 old: 3595 (-634317659) New is 13.6% faster
10 new: 3165 (-634317659)
100 old: 28050 (-1399078606) New is 18.8% faster
100 new: 23604 (-1399078606)
1000 old: 272481 (1365024500) New is 19.9% faster
1000 new: 227297 (1365024500)
Hash retrieval (milliseconds for 100M trials):
(Shows that empty strings retrieve hash faster, others about the same.)
empty old: 6590 (0) New is 34.2% faster
empty new: 4907 (0)
1-char old: 5838 (97) < 1% difference
1-char new: 5819 (97)
10-char old: 5808 (-634317659) < 1% difference
10-char new: 5818 (-634317659)
100-char old: 5819 (-1399078606) < 1% difference
100-char new: 5818 (-1399078606)
1000-char old: 5818 (1365024500) < 1% difference
1000-char new: 5809 (1365024500)
*/
public class StringHashAlgorithm {
//
// Test parameters: Set to true to test hash computation, false to
// test hash retrieval (cached). The "production" setting is false.
//
private static final boolean ALWAYS_COMPUTE = true; // Normal is false
private static final int reps = ALWAYS_COMPUTE ? 10000000 : 100000000;
private interface QuasiString {
int hashCode();
}
private static final class NewQuasiString implements QuasiString {
private int hash = 0;
private int count;
private int offset = 0;
private char[] value;
public NewQuasiString(String s) {
value = s.toCharArray();
count = value.length;
}
//
// Proposed new hashCode technique. Parameter for the "real", non-test
// version is ALWAYS_COMPUTE = false.
//
public int hashCode() {
int h = hash;
if (h != 0)
return h;
int len = count;
if (len == 0)
return 0;
int off = offset;
char val[] = value;
len += off;
while (off < len)
h = 31 * h + val[off++];
if (ALWAYS_COMPUTE) {
return h = h;
} else {
return hash = h;
}
}
}
private static final class OldQuasiString implements QuasiString {
private int hash = 0;
private int count;
private int offset = 0;
private char[] value;
public OldQuasiString(String s) {
value = s.toCharArray();
count = value.length;
}
//
// Current hashCode technique.
//
public int hashCode() {
int h = hash;
if (h == 0) {
int off = offset;
char val[] = value;
int len = count;
for (int i = 0; i < len; i++) {
h = 31*h + val[off++];
}
if (ALWAYS_COMPUTE)
h = h;
else
hash = h;
}
return h;
}
}
public static void main(String[] args) {
String chars10 = "abcdefghij";
String chars100 = chars10 + chars10 + chars10 + chars10 +
chars10 + chars10 + chars10 + chars10 + chars10 + chars10;
String chars1000 = chars100 + chars100 + chars100 + chars100 +
chars100 + chars100 + chars100 + chars100 + chars100 + chars100;
test("0 old", new OldQuasiString(""));
test("0 new", new NewQuasiString(""));
test("1 old", new OldQuasiString("a"));
test("1 new", new NewQuasiString("a"));
test("10 old", new OldQuasiString(chars10));
test("10 new", new NewQuasiString(chars10));
test("100 old", new OldQuasiString(chars100));
test("100 new", new NewQuasiString(chars100));
test("1000 old", new OldQuasiString(chars1000));
test("1000 new", new NewQuasiString(chars1000));
}
private static void test(String msg, QuasiString s) {
long t0 = System.currentTimeMillis();
for (int i = reps; --i >= 0; ) {
s.hashCode();
}
long t = System.currentTimeMillis() - t0;
System.out.println(msg + ": " + t + " (" + s.hashCode() + ")");
}
}
---------- END SOURCE ----------
(Incident Review ID: 245459)
======================================================================
- duplicates
-
JDK-6932858 (str) Tune hashCode() of String
- Open