JDK-8369740

Locale.Builder should fail on duplicate extensions and U-extension keywords/attributes


    • Type: CSR
    • Resolution: Withdrawn
    • Priority: P4
    • None
    • core-libs
    • None
    • behavioral
    • low
      `Locale.Builder.setLanguageTag(String)` now throws for duplicate singleton extensions as well as duplicate U-extension keys and attributes.

      `Locale.Builder.setExtension(char, String)` now throws for duplicate U-extension keys and attributes.

      Applications that provide such duplicates will now fail. However, the risk is deemed low: such duplicates are ill-formed and were previously ignored by `Locale.Builder`, so applications are unlikely to depend on them. It is also unlikely that duplicates are passed to `Locale.Builder` rather than to the other `Locale`-constructing APIs, because the builder is documented as a strict API that throws on ill-formed input.
    • Java API
    • SE

      Summary

      Locale.Builder should throw IllformedLocaleException for duplicate extension singletons, U extension attributes, and U extension keys in BCP47 language tags.

      Problem

      Locale.Builder is a strict API when constructing Locale instances. Locale.Builder.setLanguageTag(String) is specified as,

      the language tag must be well-formed (see Locale) or an exception is thrown (unlike Locale.forLanguageTag, which just discards ill-formed and following portions of the tag).

      However, this API currently accepts and silently discards duplicate extension singletons, U extension attributes, and U extension keys.
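The strict/lenient contrast in the quoted specification can be sketched with a tag that is ill-formed regardless of how duplicates are treated. The class and method names below are hypothetical, and the sample tag is an illustration chosen here (its last subtag is nine letters, too long for any subtag type); it is not taken from the CSR.

```java
import java.util.IllformedLocaleException;
import java.util.Locale;

final class StrictVsLenientDemo {
    /** Returns true if Locale.Builder rejects the tag as ill-formed. */
    static boolean builderRejects(String tag) {
        try {
            new Locale.Builder().setLanguageTag(tag);
            return false;
        } catch (IllformedLocaleException e) {
            return true;
        }
    }

    /** Returns the portion of the tag that the lenient factory keeps. */
    static String lenientResult(String tag) {
        // forLanguageTag drops the first ill-formed subtag and everything after it.
        return Locale.forLanguageTag(tag).toLanguageTag();
    }

    public static void main(String[] args) {
        String tag = "en-US-abcdefghi"; // illustrative ill-formed tag
        System.out.println("forLanguageTag keeps: " + lenientResult(tag));
        System.out.println("Builder rejects: " + builderRejects(tag));
    }
}
```

Whether the builder also rejects a tag whose only defect is a duplicate depends on whether the JDK in use implements the behavior proposed here, so the sketch deliberately avoids asserting that case.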

      Regarding BCP 47 language tags, RFC 5646 states,

      Each singleton subtag MUST appear at most one time in each tag (other than as a private use subtag). That is, singleton subtags MUST NOT be repeated.
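As a rough illustration of the RFC 5646 rule, the following standalone check scans a tag for repeated singletons. It is a simplified sketch and not the JDK's parser: `DuplicateSingletonCheck` and `hasDuplicateSingleton` are hypothetical names, and the code assumes the input is otherwise a plausible hyphen-separated tag.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class DuplicateSingletonCheck {
    /** Returns true if a singleton subtag repeats before any private use section. */
    static boolean hasDuplicateSingleton(String tag) {
        String[] subtags = tag.toLowerCase(Locale.ROOT).split("-");
        if (subtags[0].equals("x")) {
            return false; // the whole tag is private use
        }
        Set<String> seen = new HashSet<>();
        // Skip index 0: the first subtag is the primary language
        // (or a one-letter grandfathered prefix such as "i").
        for (int i = 1; i < subtags.length; i++) {
            String s = subtags[i];
            if (s.length() != 1) {
                continue; // not a singleton
            }
            if (s.equals("x")) {
                return false; // private use begins; repeats after this are allowed
            }
            if (!seen.add(s)) {
                return true; // the same singleton appeared earlier
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasDuplicateSingleton("en-a-foo-a-bar"));
        System.out.println(hasDuplicateSingleton("en-x-a-a"));
    }
}
```

The comparison is case-insensitive because BCP 47 tags are case-insensitive.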

      Regarding the BCP 47 Extension U, RFC 6067 states,

      Only the first occurrence of an attribute or key conveys meaning in a language tag.
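A comparable sketch for the U extension checks the value string (the part after the `u` singleton) for repeated attributes or keys. It assumes the value is already syntactically valid, with attributes first, then two-character keys each followed by optional type subtags. The names `DuplicateUExtensionCheck` and `hasDuplicateKeyOrAttribute` are hypothetical and this is not how the JDK implements the check.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class DuplicateUExtensionCheck {
    /** Returns true if the U-extension value repeats an attribute or a key. */
    static boolean hasDuplicateKeyOrAttribute(String uValue) {
        Set<String> attributes = new HashSet<>();
        Set<String> keys = new HashSet<>();
        boolean seenKey = false;
        for (String s : uValue.toLowerCase(Locale.ROOT).split("-")) {
            if (s.length() == 2) {
                // Two-character subtags are keys.
                seenKey = true;
                if (!keys.add(s)) {
                    return true;
                }
            } else if (!seenKey) {
                // Longer subtags before the first key are attributes.
                if (!attributes.add(s)) {
                    return true;
                }
            }
            // Longer subtags after a key are type values and are not checked.
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasDuplicateKeyOrAttribute("ca-gregory-ca-japanese"));
        System.out.println(hasDuplicateKeyOrAttribute("ca-gregory-nu-latn"));
    }
}
```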

      Based on these specifications, throwing an exception is permissible in these cases, since such tags are considered ill-formed.

      Solution

      These duplicate occurrences should be defined in the specification as ill-formed. The APIs already specify how they behave for well-formed and ill-formed tags.

      As such, Locale.Builder.setLanguageTag(String) should throw IllformedLocaleException for duplicate extension singletons, U extension attributes, and U extension keys. Locale.Builder.setExtension(char, String) should also throw IllformedLocaleException for duplicate U extension attributes and U extension keys.
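For callers, the proposal means a duplicate passed to `setExtension` would surface as an `IllformedLocaleException` instead of being dropped. A minimal sketch of a caller that tolerates either outcome follows; `SetExtensionSketch` and `withUExtension` are hypothetical names, and the result for the duplicate value depends on whether the JDK in use implements the proposed behavior.

```java
import java.util.IllformedLocaleException;
import java.util.Locale;
import java.util.Optional;

final class SetExtensionSketch {
    /** Builds a locale with the given U-extension value, or empty if it is rejected. */
    static Optional<Locale> withUExtension(String language, String uValue) {
        try {
            return Optional.of(new Locale.Builder()
                    .setLanguage(language)
                    .setExtension('u', uValue)
                    .build());
        } catch (IllformedLocaleException e) {
            // Ill-formed input; under the proposal this would include duplicates.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Well-formed value: accepted with or without the proposed change.
        System.out.println(withUExtension("en", "ca-gregory"));
        // Duplicate key: silently reduced before the change, rejected under the proposal.
        System.out.println(withUExtension("en", "ca-gregory-ca-japanese"));
    }
}
```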

      Specification

      For Locale.java

      Under the BCP47 extension definition,

        *   <dt><a id="def_extensions"><b>extensions</b></a></dt>
        *
        *   <dd> A map from single character keys to string values, indicating
      - *   extensions apart from language identification.</dd>
      + *   extensions apart from language identification. Keys must not be repeated, unless
      + *   used as values for the key 'x'.</dd>
        *   <dd> <em> BCP 47 deviation:</em> The {@code
        *   extensions} in {@code Locale} implement the semantics and syntax of BCP 47
        *   extension subtags <em>and</em> private use subtags. The {@code extensions}

      Under the BCP47 U-extension definition,

        * can be empty, or a series of subtags 3-8 alphanums in length).  A
        * well-formed locale attribute has the form
        * {@code [0-9a-zA-Z]{3,8}} (it is a single subtag with the same
      - * form as a locale type subtag).
      + * form as a locale type subtag). When duplicate keys or attributes occur, they
      + * are considered ill-formed.
        *
        * <p>The Unicode locale extension specifies optional behavior in
        * locale-sensitive services.  Although the LDML specification defines

      Add RFC 6067 as an external specification,

        *      RFC 4647: Matching of Language Tags
        * @spec https://www.rfc-editor.org/info/rfc5646
        *      RFC 5646: Tags for Identifying Languages
      + * @spec https://www.rfc-editor.org/info/rfc6067
      + *      RFC 6067: BCP 47 Extension U

            jlu Justin Lu
            jlu Justin Lu
            Naoto Sato