Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-4443613

intern method in a string scales badly

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Won't Fix
    • Icon: P5 P5
    • 9
    • 1.3.0
    • hotspot
    • sparc
    • solaris_2.6



      Name: yyT116575 Date: 04/12/2001


      java version "1.2.1"
      Solaris VM (build Solaris_JDK_1.2.1_04, native threads, sunwjit)

      The intern method of java.lang.String does not scale at all well.

      The code below contains generates 1 000 000 Strings and interns them. Commented
      out are versions which either don't intern the strings or perform an outline of
      a naive interning method by storing them in a synchronized HashSet.
      Surprisingly the naive implementation visibly outperformes the intern method.

      class Test {
        public static void main(String[] args) {

          int version = Integer.parseInt(args[0]);

          long startTime = System.currentTimeMillis();
          
          java.util.Set internedStrings =
      java.util.Collections.synchronizedSet(new java.util.HashSet(10000000));
          String[] strings = new String[500000];
              
          for (int offset=0; offset<500000; offset+=100000) {
            for (int i=0; i<100000; i++) {

              switch (version) {
                case 1:
                  // Version 1: No interning
                  strings[offset+i] = String.valueOf(offset+i);
                  break;
                case 2:
                  // Version 2: Interning
                  strings[offset+i] = String.valueOf(offset+i).intern();
                  break;
                case 3:
                  // Version 3: Mocked up "interning"
                  internedStrings.add(String.valueOf(offset+i));
                  break;
              }
            }
            System.out.println("At offset "+offset);
          }

          System.out.println("Took "+(System.currentTimeMillis()-startTime) + "ms");
        }
      }

      A look in the source code for hotspot indicates what the problem is the data is
      stored in a hashtable with a fixed number (20011 oddly enough) of buckets
      (symbol_table_size = 20011 in symbolTable.hpp). This ultimately means that the
      intern method scales only linearly as the number of interned strings increases.

      I have tested it using 1.3.0_02 on windows and the same problem occurs. Here
      is the output from java -version

      java version "1.3.0_02"
      Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.0_02)
      Java HotSpot(TM) Client VM (build 1.3.0_02, mixed mode)

      Run the program by: java -mx200M Test 2 for the problem case.
      java -mx200M Test 1 and java -mx200M Test 3 show similar
      programs to highlight the performance difference.

      Version: Time taken:
      1 3315 ms
      2 58534 ms
      3 6650 ms

      I also found that if I upped the number of strings created to 1 million,
      version 2 fails after a while with the unhelpful message ' Exception in
      thread "main" '.

      (Review ID: 120579)
      ======================================================================

            gziemski Gerard Ziemski
            yyoungsunw Yung-ching Young (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

              Created:
              Updated:
              Resolved:
              Imported:
              Indexed: