Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-4822370

java.net.URLDecoder has poor performance

XMLWordPrintable

    • Icon: Enhancement Enhancement
    • Resolution: Fixed
    • Icon: P4 P4
    • 5.0
    • 1.4.1
    • core-libs
    • tiger
    • x86
    • windows_2000



      Name: nt126004 Date: 02/21/2003


      FULL PRODUCT VERSION :
      java version "1.4.1_01"
      Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.1_01-b01)
      Java HotSpot(TM) Client VM (build 1.4.1_01-b01, mixed mode)

      FULL OPERATING SYSTEM VERSION : Microsoft Windows 2000
      [Version 5.00.2195]


      ADDITIONAL OPERATING SYSTEMS : All



      EXTRA RELEVANT SYSTEM CONFIGURATION :
      Problem applies for all versions of the JDK.

      A DESCRIPTION OF THE PROBLEM :
      The class, java.net.URLDecoder, performs extremly poorly -
      to the point of unusability. For example, it takes roughly
      14 seconds to decode a string of approximately 500K
      characters on a Pentium 4 workstation.

      The performance problem stems from URLDecoder's repeated
      allocation of an internal temporary buffer used for
      character encoding. By changing URLDecoder's implementation
      to allocate the buffer once, the encoding time dropped from
      14 seconds to approximately 200 milliseconds.

      As an aside, URLDecoder allocates the StringBuffer uses as
      output with a default buffer size. The StringBuffer should
      be allocated with a buffer that is at least as large as
      1/3rd the size of the input string, based on simple math.



      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      1. Execute java.net.URLDecoder.decode()



      EXPECTED VERSUS ACTUAL BEHAVIOR :
      Expected results: Approximately linear performance results
      with respect to the length of the input string and the
      number of encoded characters.

      Actual Results: Geometrically explosive decoding times,
      depending upon the length of the input string and the number
      of encoded characters.


      REPRODUCIBILITY :
      This bug can be reproduced always.

      ---------- BEGIN SOURCE ----------
      The following source code is a modified version of Sun's implementation of
      java.net.URLDecoder. It implements the two specified changes with the resulting
      performance improvement. Note: This is not a version of URLDecoder optimized to
      the Nth degree - It only contains 2 very simple changes which result in a 100x
      performance improvement over the existing URLDecoder.


      /*
       * A slightly modified version of Sun's implementation of java.net.URLDecoder.
       * The original copyright follows.
       *
       * @author Toby Reyelts
       *
       */

      /*
       * @(#)URLDecoder.java 1.20 01/12/03
       *
       * Copyright 2002 Sun Microsystems, Inc. All rights reserved.
       * SUN PROPRIETARY/CONFIDENTIAL. Use is subject to license terms.
       */

      package test;

      import java.io.*;

      /**
       * Utility class for HTML form decoding. This class contains static methods
       * for decoding a String from the <CODE>application/x-www-form-urlencoded</CODE>
       * MIME format.
       * <p>
       * To conversion process is the reverse of that used by the URLEncoder class. It
      is assumed
       * that all characters in the encoded string are one of the following:
       * &quot;<code>a</code>&quot; through &quot;<code>z</code>&quot;,
       * &quot;<code>A</code>&quot; through &quot;<code>Z</code>&quot;,
       * &quot;<code>0</code>&quot; through &quot;<code>9</code>&quot;, and
       * &quot;<code>-</code>&quot;, &quot;<code>_</code>&quot;,
       * &quot;<code>.</code>&quot;, and &quot;<code>*</code>&quot;. The
       * character &quot;<code>%</code>&quot; is allowed but is interpreted
       * as the start of a special escaped sequence.
       * <p>
       * The following rules are applied in the conversion:
       * <p>
       * <ul>
       * <li>The alphanumeric characters &quot;<code>a</code>&quot; through
       * &quot;<code>z</code>&quot;, &quot;<code>A</code>&quot; through
       * &quot;<code>Z</code>&quot; and &quot;<code>0</code>&quot;
       * through &quot;<code>9</code>&quot; remain the same.
       * <li>The special characters &quot;<code>.</code>&quot;,
       * &quot;<code>-</code>&quot;, &quot;<code>*</code>&quot;, and
       * &quot;<code>_</code>&quot; remain the same.
       * <li>The plus sign &quot;<code>+</code>&quot; is converted into a
       * space character &quot;<code>&nbsp;</code>&quot; .
       * <li>A sequence of the form "<code>%<i>xy</i></code>" will be
       * treated as representing a byte where <i>xy</i> is the two-digit
       * hexadecimal representation of the 8 bits. Then, all substrings
       * that contain one or more of these byte sequences consecutively
       * will be replaced by the character(s) whose encoding would result
       * in those consecutive bytes.
       * The encoding scheme used to decode these characters may be specified,
       * or if unspecified, the default encoding of the platform will be used.
       * </ul>
       * <p>
       * There are two possible ways in which this decoder could deal with
       * illegal strings. It could either leave illegal characters alone or
       * it could throw an <tt>{@link java.lang.IllegalArgumentException}</tt>.
       * Which approach the decoder takes is left to the
       * implementation.
       *
       * @author Mark Chamness
       * @author Michael McCloskey
       * @version 1.20, 12/03/01
       * @since 1.2
       */

      public class FastUrlDecoder {

        // The platform default encoding
        // Can't use this, since it's private to java.net. Of course,
        // this can be changed back for the real java.net.URLDecoder.
        // static String dfltEncName = java.net.URLEncoder.dfltEncName;
        static String dfltEncName = null;

        /**
         * Decodes a <code>x-www-form-urlencoded</code> string.
         * The platform's default encoding is used to determine what characters
         * are represented by any consecutive sequences of the form
         * "<code>%<i>xy</i></code>".
         * @param s the <code>String</code> to decode
         * @deprecated The resulting string may vary depending on the platform's
         * default encoding. Instead, use the decode(String,String) method
         * to specify the encoding.
         * @return the newly decoded <code>String</code>
         */
        public static String decode(String s) {

          String str = null;

          try {
            str = decode(s, dfltEncName);
          }
          catch ( UnsupportedEncodingException e ) {
            // The system should always have the platform default
          }

          return str;
        }

        /**
         * Decodes a <code>application/x-www-form-urlencoded</code> string using a
      specific
         * encoding scheme.
         * The supplied encoding is used to determine
         * what characters are represented by any consecutive sequences of the
         * form "<code>%<i>xy</i></code>".
         * <p>
         * <em><strong>Note:</strong> The <a href=
         * "http://www.w3.org/TR/html40/appendix/notes.html#non-ascii-chars">
         * World Wide Web Consortium Recommendation</a> states that
         * UTF-8 should be used. Not doing so may introduce
         * incompatibilites.</em>
         *
         * @param s the <code>String</code> to decode
         * @param enc The name of a supported
         * <a href="../lang/package-summary.html#charenc">character
         * encoding</a>.
         * @return the newly decoded <code>String</code>
         * @exception UnsupportedEncodingException
         * If the named encoding is not supported
         * @see URLEncoder#encode(java.lang.String, java.lang.String)
         */
        public static String decode( String s, String enc ) throws
      UnsupportedEncodingException {

          boolean needToChange = false;

          /* Modification 1 - The original URLDecoder allocates a StringBuffer with a
      default
           * buffer size. For the composition of large strings, constant reallocation of
           * the internal buffer in StringBuffer is expensive.
           *
           * In the case of url decoding, the smallest decoded string would require
           * at least 1/3rd the space of the original string. We can allocate
           * a buffer the length of <code>s</code> optimistically, or to be conservative,
           * we could allocate the buffer as something like numChars / 3 or numChars / 2.
           */
          int numChars = s.length();
          // StringBuffer sb = new StringBuffer();
          StringBuffer sb = new StringBuffer( numChars );

          int i = 0;

      /** Commented out temporarily, while this is not java.net.URLDecoder
          if ( enc.length() == 0 ) {
            throw new UnsupportedEncodingException ("URLDecoder: empty string enc
      parameter");
          }
      */

          /* Modification 2 - In the original URLDecoder, <code>bytes</code> was local
           * to the try block for <code>case '%'</code>. Every single time a '%' was
      encountered,
           * URLDecoder allocated an entirely new buffer 1/3rd the size of the rest
           * of the string to be decoded. That is an entirely wasteful allocation of
           * heap space. For example, given a string of length 500K with approximately
           * 1K encoded characters, the URLDecoder would allocate something on the
           * order of 100M of memory.
           *
           * The simple fix, used here, is to only allocate the buffer once during the
           * execution of the decode. The first time it is lazily allocated, it will
      be large
           * enough for all ensuing encounters with any encoded characters.
           *
           */
          byte[] bytes = null;

          while ( i < numChars ) {

            char c = s.charAt( i );

            switch ( c ) {

              case '+':
                sb.append(' ');
                i++;
                needToChange = true;
              break;

              case '%':

              /*
               * Starting with this instance of %, process all
               * consecutive substrings of the form %xy. Each
               * substring %xy will yield a byte. Convert all
               * consecutive bytes obtained this way to whatever
               * character(s) they represent in the provided
               * encoding.
               */

                try {

                  // (numChars-i)/3 is an upper bound for the number
                  // of remaining bytes
       
                  /* Lazily allocated - and only once in the new version of URLDecoder
                   */
                  if ( bytes == null ) {
                    bytes = new byte[ ( numChars - i ) / 3 ];
                  }

                  int pos = 0;

                  while ( ((i+2) < numChars) && (c=='%')) {

                    bytes[pos++] = (byte)Integer.parseInt(s.substring(i+1,i+3),16);
                    i+= 3;

                    if ( i < numChars ) {
      c = s.charAt(i);
                    }
                  }

                  // A trailing, incomplete byte encoding such as
                  // "%x" will cause an exception to be thrown
                  if ((i < numChars) && (c=='%')) {
                    throw new IllegalArgumentException( "URLDecoder: Incomplete
      trailing escape (%) pattern");
                  }

                  if ( enc == null ) {
                    sb.append(new String(bytes, 0, pos ));
                  }
                  else {
                    sb.append(new String(bytes, 0, pos, enc));
                  }
                }
                catch (NumberFormatException e) {
                  throw new IllegalArgumentException( "URLDecoder: Illegal hex
      characters in escape (%) pattern - " + e.getMessage());
                }

      needToChange = true;
              break;

              default:
                sb.append(c);
                i++;
              break;
            }
          }

          return (needToChange? sb.toString() : s);
        }
      }



      ---------- END SOURCE ----------

      CUSTOMER WORKAROUND :
      Rewrite a new url decoder from scratch.
      (Review ID: 181588)
      ======================================================================

            jccollet Jean-Christophe Collet (Inactive)
            nthompsosunw Nathanael Thompson (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

              Created:
              Updated:
              Resolved:
              Imported:
              Indexed: