JDK-8204530

Operation of URLEncoder not congruent with RFC 3986 - URI Generic Syntax


    • generic
    • generic

      ADDITIONAL SYSTEM INFORMATION :
      All systems across the entire universe and beyond.

      A DESCRIPTION OF THE PROBLEM :
      Per RFC 3986, characters are classified as follows:
          unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
          reserved = gen-delims / sub-delims
          gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
          sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

      "For consistency, percent-encoded octets in the ranges of ALPHA (%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), period (%2E), underscore (%5F), or tilde (%7E) should not be created by URI producers and, when found in a URI, should be decoded to their corresponding unreserved characters by URI normalizers."

      Per the URLEncoder documentation and observed behavior: The alphanumeric characters "a" through "z", "A" through "Z" and "0" through "9" remain the same. The special characters ".", "-", "*", and "_" remain the same.

      Per RFC 3986, ~ is an unreserved character, while * is a reserved character (a sub-delim).
      Thus URLEncoder encodes ~ when it should not, and leaves * unencoded when it should encode it.
      Under the current RFC, URLEncoder has no basis for treating * as a "special character" that is left unencoded.
      It has a much stronger basis for treating ~ as one.
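
The character classes quoted above can be checked mechanically. The following is only a minimal sketch: the class name `Rfc3986Chars` and its helper methods are hypothetical and not part of the JDK. The character sets are copied from the RFC 3986 grammar cited in this report.

```java
// Hypothetical helper, not a JDK class: classifies ASCII characters
// according to the RFC 3986 grammar quoted in this report.
final class Rfc3986Chars {
    // Punctuation that RFC 3986 lists as unreserved, alongside ALPHA and DIGIT.
    private static final String UNRESERVED_PUNCT = "-._~";
    // gen-delims and sub-delims, which together form the reserved set.
    private static final String GEN_DELIMS = ":/?#[]@";
    private static final String SUB_DELIMS = "!$&'()*+,;=";

    static boolean isUnreserved(char c) {
        return (c >= 'A' && c <= 'Z')
                || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9')
                || UNRESERVED_PUNCT.indexOf(c) >= 0;
    }

    static boolean isReserved(char c) {
        return GEN_DELIMS.indexOf(c) >= 0 || SUB_DELIMS.indexOf(c) >= 0;
    }

    public static void main(String[] args) {
        // Report the classification of the two characters at issue.
        for (char c : new char[] {'~', '*'}) {
            System.out.println(c + " unreserved=" + isUnreserved(c)
                    + " reserved=" + isReserved(c));
        }
    }
}
```

Under this classification, ~ falls in the unreserved set and * in the reserved set, which is the opposite of how URLEncoder treats them.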

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      System.out.println( URLEncoder.encode( "~", "UTF-8" ) ) ;
      System.out.println( URLEncoder.encode( "*", "UTF-8" ) ) ;

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      System.out.println( URLEncoder.encode( "~", "UTF-8" ) ) ; // expected result: ~
      System.out.println( URLEncoder.encode( "*", "UTF-8" ) ) ; // expected result: %2A
      ACTUAL -
      System.out.println( URLEncoder.encode( "~", "UTF-8" ) ) ; // actual result: %7E
      System.out.println( URLEncoder.encode( "*", "UTF-8" ) ) ; // actual result: *

      ---------- BEGIN SOURCE ----------
      System.out.println( URLEncoder.encode( "~", "UTF-8" ) ) ;
      System.out.println( URLEncoder.encode( "*", "UTF-8" ) ) ;
      ---------- END SOURCE ----------
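
The two statements above need a surrounding class and imports to compile. A self-contained version might look like the sketch below. The class name `UrlEncoderRepro` and the `encode` wrapper are hypothetical, added only so the checked exception from the two-argument `URLEncoder.encode` does not have to be declared by every caller.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical wrapper class around the two statements from the report.
final class UrlEncoderRepro {
    // Same call as in the report; the checked exception is rethrown unchecked.
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encode("~")); // the report observes %7E
        System.out.println(encode("*")); // the report observes *
    }
}
```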

      CUSTOMER SUBMITTED WORKAROUND :
      Use a different URL encoder.
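
One possible alternative to switching libraries is to post-process the output of URLEncoder. This is a sketch under stated assumptions, not an official fix: the class name `Rfc3986Encoder` is hypothetical, and rewriting "+" as "%20" is an extra choice that this report does not ask for. It is included because URLEncoder emits "+" for a space, and "+" is itself a reserved character under RFC 3986.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of the JDK: adjusts URLEncoder output
// so that it follows the RFC 3986 unreserved set.
final class Rfc3986Encoder {
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8")
                    // URLEncoder writes a space as "+"; a literal "+" is already
                    // "%2B", so every "+" in the output stands for a space.
                    .replace("+", "%20")
                    // "*" is left untouched by URLEncoder but is a sub-delim.
                    .replace("*", "%2A")
                    // "~" is unreserved, so undo its percent-encoding.
                    .replace("%7E", "~");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encode("~")); // prints ~
        System.out.println(encode("*")); // prints %2A
    }
}
```

The replacements are safe to chain because URLEncoder encodes a literal "%" as "%25", so the sequence "%7E" in its output can only come from an input "~".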

      FREQUENCY : always


            chegar Chris Hegarty
            webbuggrp Webbug Group