Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-4513622

(str) keeping a substring of a field prevents GC for object

XMLWordPrintable

    • generic, x86
    • generic, windows_nt, windows_xp

      Name: bsT130419 Date: 10/11/2001


      Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.0-beta2-b77)
      Java HotSpot(TM) Client VM (build 1.4.0-beta2-b77, mixed mode)

      The following code produces an OutOfMemory error because
      objects don't get garbage collected if the caller stores a substring
      of a field in the object. Appears to occur in any JVM for WinNT.

      public class TestGC {
          private String largeString = new String(new byte[100000]);
          private String smallString = "foo";
          
          String getString() {
              // if caller stores this substring, this object will not be gc'ed
              return this.largeString.substring(0,2);
      // return new String(this.largeString.substring(0,2)); // no error here!
      // return smallString; // no error here!
          }
          
          public static void main(String[] args) {
              java.util.ArrayList list = new java.util.ArrayList();
              for (int i = 0; i < 1000000; i++) {
                  TestGC gc = new TestGC();
                  list.add(gc.getString());
              }
          }
      }
      (Review ID: 133550)
      ======================================================================
      Suggested fix provided by java.net community member leouser:

      A DESCRIPTION OF THE FIX :
      BUG 4513622
      JDK VERSION: jdk-6-rc-bin-b64-linux-i586-15_dec_2005.bin
       
      Attached is a conceivable solution to the problem that substring is not quite adequate for some people's needs and in fact can provide memory leak problems for it. My solution is not the most radical of the possible ones but I think it is more usuable than just having javadoc comments talking about substrings implementation.

      Additional Commentary( embeded in test as well ):
      /**
       * BUG ID 4513622 --> keeping a substring of a field prevents GC for object.
       *
       * Rationale:
       * String.substring returns a new String but shares the underlying
       * char array. This causes GC problems for clients that do not have
       * an understanding of this. They will learn of this problem by:
       * 1. Reading the source
       * 2. Reading about it someplace in an article or posting
       * 3. Profiling a memory leak in their application --> this may be hard to find
       * In many instances they may never be aware that there is a problem.
       *
       * I became aware of it via #2. But it appears that many others have
       * experienced it because of #3.
       *
       * This leads us to a conundrum:
       * 1. There are users who would prefer that substring returns a fresh string.
       * 2. There are users who would prefer that substring remains as it is.
       *
       * How can both needs be met?
       * SOLUTION 1: provide java doc that clarifies for the user how substring
       * creates a substring. Also provide in the javadoc a quick idiom for getting
       * a fresh string:
       * String s = x.substring( 1 );
       * s = new String( s );
       *
       * SOLUTION 2: provide a different implementation that meets the efficiency
       * requirements of substring but also allows for GC of the parent string/char array
       *
       * SOLUTION 3: provide javadoc clarifying how substring works and providing
       * mirror methods that return a new substring.
       *
       * Of these I have chosen #3 to begin work. My decision is currently based on:
       * a. Solution 2 may be too disruptive to existing String clients.
       * b. Solution 1 may not go far enough in providing utility to clients. Developers
       * will want a quick and easy way to create substrings. A 2 line idiom is too
       * burdensome for such a common need.
       * Of these 2, if Solution 3 is rejected, I would prefer #1.
       *
       * To implement solution 3 I have added to mirror methods:
       * newString( int beginIndex, int endIndex )
       * newString( int beginIndex )
       * and have clarified substring in the javadoc, pointing also to newString.
       *
       * I have choosen newString as the name because it is so distinct from substring.
       * Ideally substring shouldn't be called substring but something like:
       * subview
       * which would be more indicative of what it does.
       * newString has implementations that are essentially the same as substring( and
       * also the same javadoc ) but differ in how they produce their products.
       *
       * IMPACT:
       * At this juncture do to the final nature of String there should be no impact
       * upon existing clients, you cannot subclass String. Anyone who does is using
       * some extralinguistic mechanism to acheive the subclass.
       *
       * NEGATIVES:
       * 1. 2 new methods add burden to the comprehension of the API
       * 2. javadoc now mentions the implementation of substring forever cementing
       * its implementation.
       *
       * I think though the negative consequences of being ignorant of what substring
       * does is more damaging than either NEGATIVES 1 or 2. The developer will spend much longer fixing these problems in his code than the 2 - 5 minutes
       * it will take to learn these methods forever. I recently read an article where
       * an application removed some severe memory leaks because of the substring
       * implementation. If newString or even clear javadoc about its nature were
       * present the work involved in resolving these problems may never have had
       * to occur.
       *
       * Testing note:
       * This test tests against expected values and also if the equals method
       * returns true for a string derived from substring. Why? newString is a
       * mirror of substring, except it returns a new String. They should always
       * return true.
       *
       * String enhanced with java 6 version:
       * jdk-6-rc-bin-b64-linux-i586-15_dec_2005.bin
       *
       * test ran succesfully on a SUSE 7.3 Linux distribution
       *
       * Brian Harry
       * ###@###.###
       * jan 2, 2006
       */

      UNIFIED DIFF:
      --- /home/nstuff/java6/jdk1.6.0/java/lang/String.java Thu Dec 15 02:16:40 2005
      +++ /home/javarefs/strings/substring/java/lang/String.java Mon Jan 2 15:30:22 2006
      @@ -1733,6 +1733,12 @@
            * "emptiness".substring(9) returns "" (an empty string)
            * </pre></blockquote>
            *
      + * Important note: the substring returned by this method is
      + * a view of the internals of the instance. This will prevent
      + * garbage collection of the instance if the returned string is
      + * held onto. To get a substring that is not bound to the internals
      + * of the instance use newString( int beginIndex ) instead.
      + *
            * @param beginIndex the beginning index, inclusive.
            * @return the specified substring.
            * @exception IndexOutOfBoundsException if
      @@ -1754,6 +1760,13 @@
            * "hamburger".substring(4, 8) returns "urge"
            * "smiles".substring(1, 5) returns "mile"
            * </pre></blockquote>
      + *
      + * Important note: the substring returned by this method is
      + * a view of the internals of the instance. This will prevent
      + * garbage collection of the instance if the returned string is
      + * held onto. To get a substring that is not bound to the internals
      + * of the instance use newString( int beginIndex, int endIndex )
      + * instead.
            *
            * @param beginIndex the beginning index, inclusive.
            * @param endIndex the ending index, exclusive.
      @@ -1809,6 +1822,76 @@
            */
           public CharSequence subSequence(int beginIndex, int endIndex) {
               return this.substring(beginIndex, endIndex);
      + }
      +
      + /**
      + * Returns a new string that is a substring of this string. The
      + * substring begins at the specified <code>beginIndex</code> and
      + * extends to the character at index <code>endIndex - 1</code>.
      + * Thus the length of the substring is <code>endIndex-beginIndex</code>.
      + * <p>
      + * Examples:
      + * <blockquote><pre>
      + * "hamburger".newString(4, 8) returns "urge"
      + * "smiles".newString(1, 5) returns "mile"
      + * </pre></blockquote>
      + *
      + * IMPORTANT NOTE: newString needs to be distinguished from
      + * substring in that it returns an instance that is not connected
      + * to the instance from which is was derived. It will not block
      + * garbage collection of the base instance.
      + *
      + * @param beginIndex the beginning index, inclusive.
      + * @param endIndex the ending index, exclusive.
      + * @return the specified substring.
      + * @exception IndexOutOfBoundsException if the
      + * <code>beginIndex</code> is negative, or
      + * <code>endIndex</code> is larger than the length of
      + * this <code>String</code> object, or
      + * <code>beginIndex</code> is larger than
      + * <code>endIndex</code>.
      + */
      + public String newString( int beginIndex, int endIndex ){
      +
      + if (beginIndex < 0) {
      + throw new StringIndexOutOfBoundsException(beginIndex);
      + }
      + if (endIndex > count) {
      + throw new StringIndexOutOfBoundsException(endIndex);
      + }
      + if (beginIndex > endIndex) {
      + throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
      + }
      + return new String( value, beginIndex, endIndex - beginIndex );
      +
      + }
      +
      + /**
      + * Returns a new string that is a substring of this string. The
      + * substring begins with the character at the specified index and
      + * extends to the end of this string. <p>
      + * Examples:
      + * <blockquote><pre>
      + * "unhappy".newString(2) returns "happy"
      + * "Harbison".newString(3) returns "bison"
      + * "emptiness".newString(9) returns "" (an empty string)
      + * </pre></blockquote>
      + *
      + * IMPORTANT NOTE: newString needs to be distinguished from
      + * substring in that it returns an instance that is not connected
      + * to the instance from which is was derived. It will not block
      + * garbage collection of the base instance.
      + *
      + * @param beginIndex the beginning index, inclusive.
      + * @return the specified substring.
      + * @exception IndexOutOfBoundsException if
      + * <code>beginIndex</code> is negative or larger than the
      + * length of this <code>String</code> object.
      + */
      + public String newString( int beginIndex ){
      +
      + return newString( beginIndex, count );
      +
           }
       
           /**


      JUnit TESTCASE :
      import junit.framework.TestCase;
      import junit.textui.TestRunner;
      import static java.lang.System.out;
      import java.util.*;


      /**
       * BUG ID 4513622 --> keeping a substring of a field prevents GC for object.
       *
       * Rationale:
       * String.substring returns a new String but shares the underlying
       * char array. This causes GC problems for clients that do not have
       * an understanding of this. They will learn of this problem by:
       * 1. Reading the source
       * 2. Reading about it someplace in an article or posting
       * 3. Profiling a memory leak in their application --> this may be hard to find
       * In many instances they may never be aware that there is a problem.
       *
       * I became aware of it via #2. But it appears that many others have
       * experienced it because of #3.
       *
       * This leads us to a conundrum:
       * 1. There are users who would prefer that substring returns a fresh string.
       * 2. There are users who would prefer that substring remains as it is.
       *
       * How can both needs be met?
       * SOLUTION 1: provide java doc that clarifies for the user how substring
       * creates a substring. Also provide in the javadoc a quick idiom for getting
       * a fresh string:
       * String s = x.substring( 1 );
       * s = new String( s );
       *
       * SOLUTION 2: provide a different implementation that meets the efficiency
       * requirements of substring but also allows for GC of the parent string/char array
       *
       * SOLUTION 3: provide javadoc clarifying how substring works and providing
       * mirror methods that return a new substring.
       *
       * Of these I have chosen #3 to begin work. My decision is currently based on:
       * a. Solution 2 may be too disruptive to existing String clients.
       * b. Solution 1 may not go far enough in providing utility to clients. Developers
       * will want a quick and easy way to create substrings. A 2 line idiom is too
       * burdensome for such a common need.
       * Of these 2, if Solution 3 is rejected, I would prefer #1.
       *
       * To implement solution 3 I have added to mirror methods:
       * newString( int beginIndex, int endIndex )
       * newString( int beginIndex )
       * and have clarified substring in the javadoc, pointing also to newString.
       *
       * I have choosen newString as the name because it is so distinct from substring.
       * Ideally substring shouldn't be called substring but something like:
       * subview
       * which would be more indicative of what it does.
       * newString has implementations that are essentially the same as substring( and
       * also the same javadoc ) but differ in how they produce their products.
       *
       * IMPACT:
       * At this juncture do to the final nature of String there should be no impact
       * upon existing clients, you cannot subclass String. Anyone who does is using
       * some extralinguistic mechanism to acheive the subclass.
       *
       * NEGATIVES:
       * 1. 2 new methods add burden to the comprehension of the API
       * 2. javadoc now mentions the implementation of substring forever cementing
       * its implementation.
       *
       * I think though the negative consequences of being ignorant of what substring
       * does is more damaging than either NEGATIVES 1 or 2. The developer will spend much longer fixing these problems in his code than the 2 - 5 minutes
       * it will take to learn these methods forever. I recently read an article where
       * an application removed some severe memory leaks because of the substring
       * implementation. If newString or even clear javadoc about its nature were
       * present the work involved in resolving these problems may never have had
       * to occur.
       *
       * Testing note:
       * This test tests against expected values and also if the equals method
       * returns true for a string derived from substring. Why? newString is a
       * mirror of substring, except it returns a new String. They should always
       * return true.
       *
       * String enhanced with java 6 version:
       * jdk-6-rc-bin-b64-linux-i586-15_dec_2005.bin
       *
       * test ran succesfully on a SUSE 7.3 Linux distribution
       *
       * Brian Harry
       * ###@###.###
       * jan 2, 2006
       */
      public class StringNewString extends TestCase{


          public StringNewString( String test ){

      super( test );
          
          }

          public void testNewString(){

      out.println( "" );
      String s = "Java is cool";
      String test1 = "is cool";
      String test2 = "Java is";
      String test3 = "va is cool";
      String test4 = "is";

      out.println( "TEST 1" );
      String nstring1 = s.newString( 5 );
      out.println( nstring1 );
      assertEquals( nstring1, test1 );
      assertEquals( s.substring( 5 ), nstring1 );
      out.println( "TEST 2" );
      String nstring2 = s.newString( 0, 7 );
      out.println( nstring2 );
      assertEquals( nstring2, test2 );
      assertEquals( s.substring( 0, 7 ), nstring2 );
      out.println( "TEST 3" );
      String nstring3 = s.newString( 2 );
      out.println( nstring3 );
      assertEquals( nstring3, test3 );
      assertEquals( s.substring( 2 ), nstring3 );
      out.println( "TEST 4" );
      String nstring4 = s.newString( 5, 7 );
      out.println( nstring4 );
      assertEquals( nstring4, test4 );
      assertEquals( s.substring( 5,7), nstring4 );

      //these should fail
      try{

      out.println( "TEST 5 should fail" );
      String nstring5 = s.newString( -1, -1 );
      assertEquals( nstring5, s );

      }
      catch( Exception x ){

      out.println( "Failure 1 as expected" );

      }

      try{
      out.println( "TEST 6 should fail, report 1 error" );
      String nstring6 = s.newString( 7, 8 );
      assertEquals( nstring6, test4 );

      }
      catch( Exception x ){

      out.println( "Failure 2 as expected" );

      }

          }

          public static void main( String[] args ){

      out.println( "Testing with testNewString" );
      TestRunner.run( new StringNewString( "testNewString" ) );

          }


      }


      FIX FOR BUG NUMBER:
      4513622
      Given the history of this bug, the solution with the least impact on existing code, test, performance, etc. is to simply change the Javadoc to indicate that substring does not return a new String and to show the workaround as suggested.
      I retract my suggstions. It turns out that as of 6924259 this issue is closable as fixed for both 7u6 and 8. The char buffers are no longer shared unless the strings are identical.

            mduigou Mike Duigou
            bstrathesunw Bill Strathearn (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

              Created:
              Updated:
              Resolved:
              Imported:
              Indexed: