Name: yyT116575 Date: 04/12/2001
java version "1.2.1"
Solaris VM (build Solaris_JDK_1.2.1_04, native threads, sunwjit)
The intern method of java.lang.String does not scale at all well.
The code below contains generates 1 000 000 Strings and interns them. Commented
out are versions which either don't intern the strings or perform an outline of
a naive interning method by storing them in a synchronized HashSet.
Surprisingly the naive implementation visibly outperformes the intern method.
class Test {
public static void main(String[] args) {
int version = Integer.parseInt(args[0]);
long startTime = System.currentTimeMillis();
java.util.Set internedStrings =
java.util.Collections.synchronizedSet(new java.util.HashSet(10000000));
String[] strings = new String[500000];
for (int offset=0; offset<500000; offset+=100000) {
for (int i=0; i<100000; i++) {
switch (version) {
case 1:
// Version 1: No interning
strings[offset+i] = String.valueOf(offset+i);
break;
case 2:
// Version 2: Interning
strings[offset+i] = String.valueOf(offset+i).intern();
break;
case 3:
// Version 3: Mocked up "interning"
internedStrings.add(String.valueOf(offset+i));
break;
}
}
System.out.println("At offset "+offset);
}
System.out.println("Took "+(System.currentTimeMillis()-startTime) + "ms");
}
}
A look in the source code for hotspot indicates what the problem is the data is
stored in a hashtable with a fixed number (20011 oddly enough) of buckets
(symbol_table_size = 20011 in symbolTable.hpp). This ultimately means that the
intern method scales only linearly as the number of interned strings increases.
I have tested it using 1.3.0_02 on windows and the same problem occurs. Here
is the output from java -version
java version "1.3.0_02"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.0_02)
Java HotSpot(TM) Client VM (build 1.3.0_02, mixed mode)
Run the program by: java -mx200M Test 2 for the problem case.
java -mx200M Test 1 and java -mx200M Test 3 show similar
programs to highlight the performance difference.
Version: Time taken:
1 3315 ms
2 58534 ms
3 6650 ms
I also found that if I upped the number of strings created to 1 million,
version 2 fails after a while with the unhelpful message ' Exception in
thread "main" '.
(Review ID: 120579)
======================================================================