JDK / JDK-8318004

URLEncoder should specify that replacement bytes will be used in case of coding error

    • CSR
    • Resolution: Approved
    • P4
    • 22
    • core-libs
    • None
    • behavioral
    • minimal
    • This is just documenting longstanding behaviour. There is no change in the implementation.
    • Java API
    • SE

      Summary

      Updated the API documentation of URLEncoder.encode and URLDecoder.decode to reflect pre-existing behavior.

      Problem

      Currently, the descriptions of URLEncoder.encode and URLDecoder.decode do not specify that replacement bytes (when encoding) or the replacement character (when decoding) are used when a character or sequence of bytes cannot be handled. This is longstanding behavior, but it needs to be documented.

      Solution

      Added a paragraph to the URLEncoder.encode API documentation stating that the charset's replacement bytes are used when the input is malformed or cannot be mapped.
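
      As an illustration of the encoder behavior described above, here is a minimal sketch. The class and helper names (EncoderReplacementDemo, encodeLatin1) are hypothetical and exist only for this example; the only JDK calls used are URLEncoder.encode(String, Charset) and StandardCharsets.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the JDK or of this CSR.
class EncoderReplacementDemo {

    // Encodes with ISO-8859-1, which cannot represent the euro sign (U+20AC).
    static String encodeLatin1(String s) {
        return URLEncoder.encode(s, StandardCharsets.ISO_8859_1);
    }

    // Encodes with UTF-8, which cannot encode an unpaired surrogate.
    static String encodeUtf8(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Unmappable character: the charset's replacement byte '?' (0x3F)
        // is percent-encoded in its place.
        System.out.println(encodeLatin1("a\u20ACb")); // a%3Fb

        // Malformed input (lone high surrogate): also replaced, no exception.
        System.out.println(encodeUtf8("\uD800"));     // %3F
    }
}
```

      Note that the caller gets no signal that a replacement happened; the result is indistinguishable from encoding a literal '?'.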

      Also changed the URLDecoder.decode API documentation to document its use of the charset's replacement character, adjusted some wording, and used apiNote.
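
      The decoder side can be sketched in the same way. Again, the class and helper names (DecoderReplacementDemo, decodeUtf8) are hypothetical; the example assumes only URLDecoder.decode(String, Charset). The escape %FF is syntactically valid, but the byte 0xFF can never occur in well-formed UTF-8, so it is replaced rather than rejected.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the JDK or of this CSR.
class DecoderReplacementDemo {

    static String decodeUtf8(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String decoded = decodeUtf8("a%FFb");

        // The erroneous byte is replaced with U+FFFD, the replacement
        // character used when decoding UTF-8.
        System.out.println(decoded.equals("a\uFFFDb")); // true
    }
}
```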

      Updated the other decode methods in URLDecoder to reflect that they can throw IllegalArgumentException.
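
      For contrast with the replacement behavior, the following sketch shows the case that does throw: a malformed escape sequence. The class and helper names (MalformedEscapeDemo, throwsIae) are hypothetical and used only for illustration.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the JDK or of this CSR.
class MalformedEscapeDemo {

    // Returns true if decoding the input throws IllegalArgumentException.
    static boolean throwsIae(String s) {
        try {
            URLDecoder.decode(s, StandardCharsets.UTF_8);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsIae("%zz"));   // true: non-hex digits in escape
        System.out.println(throwsIae("abc%2")); // true: incomplete trailing escape
        System.out.println(throwsIae("a%20b")); // false: well-formed escape
    }
}
```

      In short, a syntactically broken %xy escape is an error, while a well-formed escape that yields bytes invalid for the charset is silently replaced.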

      Specification

      java.net.URLEncoder.encode

           /**
            * Translates a string into {@code application/x-www-form-urlencoded}
            * format using a specific {@linkplain Charset Charset}.
            * This method uses the supplied charset to obtain the bytes for unsafe
            * characters.
            * <p>
        -   * <em><strong>Note:</strong> The <a href=
        +   * If the input string is malformed, or if the input cannot be mapped
        +   * to a valid byte sequence in the given {@code Charset}, then the
        +   * erroneous input will be replaced with the {@code Charset}'s
        +   * {@linkplain CharsetEncoder##cae replacement values}.
        +   *
        +   * @apiNote The <a href=
            * "http://www.w3.org/TR/html40/appendix/notes.html#non-ascii-chars">
            * World Wide Web Consortium Recommendation</a> states that
        -   * UTF-8 should be used. Not doing so may introduce incompatibilities.</em>
        -   *
        +   * UTF-8 should be used. Not doing so may introduce incompatibilities.
            * @param   s   {@code String} to be translated.
            * @param charset the given charset
            * @return  the translated {@code String}.

      java.net.URLDecoder.decode

      @@ -98,6 +98,8 @@ private URLDecoder() {}
           *          default charset. Instead, use the decode(String,String) method
           *          to specify the encoding.
           * @return the newly decoded {@code String}
       +   * @throws IllegalArgumentException if the implementation encounters malformed
       +   * escape sequences
      
      
      @@ -113,9 +115,6 @@ public static String decode(String s) {
           * except that it will {@linkplain Charset#forName look up the charset}
           * using the given encoding name.
           *
       -   * @implNote This implementation will throw an {@link java.lang.IllegalArgumentException}
       -   * when illegal strings are encountered.
       -   *
      
      
      @@ -124,6 +123,8 @@ public static String decode(String s) {
           * @throws UnsupportedEncodingException
           *             If character encoding needs to be consulted, but
           *             named character encoding is not supported
       +   * @throws IllegalArgumentException if the implementation encounters malformed
       +   * escape sequences
      
      
      @@ -144,24 +145,23 @@ public static String decode(String s, String enc) throws UnsupportedEncodingExce
           * Decodes an {@code application/x-www-form-urlencoded} string using
           * a specific {@linkplain Charset Charset}.
           * The supplied charset is used to determine
       -   * what characters are represented by any consecutive sequences of the
       -   * form "<i>{@code %xy}</i>".
       +   * what characters are represented by any consecutive escape sequences of
       +   * the form "<i>{@code %xy}</i>". Erroneous bytes are replaced with the
       +   * supplied {@code Charset}'s {@linkplain java.nio.charset.CharsetDecoder##cae
       +   * replacement value}.
           * <p>
           * <em><strong>Note:</strong> The <a href=
           * "http://www.w3.org/TR/html40/appendix/notes.html#non-ascii-chars">
           * World Wide Web Consortium Recommendation</a> states that
           * UTF-8 should be used. Not doing so may introduce
           * incompatibilities.</em>
           *
       -   * @implNote This implementation will throw an {@link java.lang.IllegalArgumentException}
       -   * when illegal strings are encountered.
       -   *
           * @param s the {@code String} to decode
           * @param charset the given charset
           * @return the newly decoded {@code String}
           * @throws NullPointerException if {@code s} or {@code charset} is {@code null}
       -   * @throws IllegalArgumentException if the implementation encounters illegal
       -   * characters
       +   * @throws IllegalArgumentException if the implementation encounters malformed
       +   * escape sequences

            dclarke Darragh Clarke
            dfuchs Daniel Fuchs
            Alan Bateman, Daniel Fuchs