JDK-8370255

Locale should mention the behavior for duplicate subtags

    • Type: CSR
    • Resolution: Approved
    • Priority: P4
    • 26
    • core-libs
    • None
    • behavioral
    • minimal
    • A specification update which makes the existing (and correct) behavior regarding duplicate BCP47 subtag components apparent.
    • Java API
    • SE

      Summary

      Describe the behavior for duplicate BCP 47 subtags and U extension keys and attributes in the Locale class description and relevant methods.

      Problem

      Locale implements IETF BCP 47, which is the specification for a language tag. "Variant" and "extension" are specific subtag field values within a BCP 47 tag.

      BCP 47 defines two levels of conformance: "well-formed" and "valid". The Locale APIs specify what happens when a tag is ill-formed, but not what happens when a tag contains duplicate subtags, which are well-formed but invalid. Because Locale does not enforce the "valid" level of conformance, duplicates can reach these APIs, and their handling is currently unspecified.

      Solution

      Duplicate variants are accepted and included. Duplicate extension singleton keys (and their associated value) are accepted, but ignored. The same applies to duplicate keys and attributes within a U extension.

      Make the distinction between the two levels of conformance apparent in the class specification, and describe the behavior for duplicates in the relevant sections and methods. There are no plans to enforce the "valid" level of conformance, so a specification update that documents the existing behavior is appropriate.
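
      The duplicate handling described above can be observed through Locale.forLanguageTag. The following is a minimal sketch, not part of the proposed specification; the class name DuplicateSubtagDemo, its helper methods, and the sample tags are illustrative assumptions. The values in the comments are what the behavior described in this section implies.

```java
import java.util.Locale;

// Illustrative demo class (hypothetical name, not part of the JDK).
class DuplicateSubtagDemo {

    // Variant field of the Locale parsed from the given language tag.
    static String variantOf(String tag) {
        return Locale.forLanguageTag(tag).getVariant();
    }

    // Value stored for an extension singleton, or null if it is absent.
    static String extensionOf(String tag, char singleton) {
        return Locale.forLanguageTag(tag).getExtension(singleton);
    }

    // Type stored for a U extension key, or null if the key is absent.
    static String unicodeTypeOf(String tag, String key) {
        return Locale.forLanguageTag(tag).getUnicodeLocaleType(key);
    }

    public static void main(String[] args) {
        // Duplicate variants are accepted and included: "1996_1996".
        System.out.println(variantOf("de-1996-1996"));

        // The duplicate singleton "a" and its value "bar" are ignored: "foo".
        System.out.println(extensionOf("en-a-foo-a-bar-b-baz", 'a'));

        // Subtags after the duplicate are still processed: "baz".
        System.out.println(extensionOf("en-a-foo-a-bar-b-baz", 'b'));

        // The duplicate U extension key "nu" and its type "latn" are ignored: "thai".
        System.out.println(unicodeTypeOf("en-u-nu-thai-nu-latn", "nu"));
    }
}
```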

      Specification

      Update the wording under the "well-formed" section for BCP 47 tags to describe duplicate variants and extensions, and introduce the two levels of conformance:

      - * <b>BCP 47 deviation:</b> Although BCP 47 requires field values to be registered
      - * in the IANA Language Subtag Registry, the {@code Locale} class
      - * does not validate this requirement. For example, the variant code <em>"foobar"</em>
      - * is well-formed since it is composed of 5 to 8 alphanumerics, but is not defined
      - * the IANA Language Subtag Registry. The {@link Builder}
      - * only checks if an individual field satisfies the syntactic
      - * requirement (is well-formed), but does not validate the value
      - * itself. Conversely, {@link #of(String, String, String) Locale::of} and its
      - * overloads do not make any syntactic checks on the input.
      + * <b>BCP 47 deviation:</b> BCP47 defines the following two levels of
      + * <a href="https://datatracker.ietf.org/doc/html/rfc5646#section-2.2.9">conformance</a>,
      + * "valid" and "well-formed". A valid tag requires that it is well-formed, its
      + * subtag values are registered in the IANA Language Subtag Registry, and it does not
      + * contain duplicate variant or extension singleton subtags. The {@code Locale}
      + * class does not enforce that subtags are registered in the Subtag Registry.
      + * {@link Builder} only checks if an individual field satisfies the syntactic
      + * requirement (is well-formed). When passed duplicate variants, {@code Builder}
      + * accepts and includes them. When passed duplicate extension singletons, {@code
      + * Builder} accepts but ignores the duplicate key and its associated value.
      + * Conversely, {@link #of(String, String, String) Locale::of} and its
      + * overloads do not check if the input is well-formed at all.
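
      The difference between the syntactic check made by Builder and the absence of any check in Locale::of can be sketched as follows. The class name ConformanceDemo and the helper builderAccepts are illustrative assumptions; the variant "foobar" is the example from the class description, and "foo" is assumed here as an ill-formed variant because it is shorter than a variant subtag may be. Locale.of requires JDK 19 or later.

```java
import java.util.IllformedLocaleException;
import java.util.Locale;

// Illustrative demo class (hypothetical name, not part of the JDK).
class ConformanceDemo {

    // Returns true if Locale.Builder accepts the variant as well-formed.
    static boolean builderAccepts(String variant) {
        try {
            new Locale.Builder().setLanguage("en").setVariant(variant);
            return true;
        } catch (IllformedLocaleException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "foobar" is well-formed but not registered; Builder accepts it.
        System.out.println(builderAccepts("foobar"));

        // "foo" is too short to be a variant subtag; Builder rejects it.
        System.out.println(builderAccepts("foo"));

        // Locale.of makes no syntactic check and keeps the value as given.
        System.out.println(Locale.of("en", "US", "foo").getVariant());
    }
}
```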

      Update the wording under the "well-formed" section for the U extension to describe duplicate keys and attributes:

      - * form as a locale type subtag).
      + * form as a locale type subtag). Duplicate locale attributes as well
      + * as locale keys do not convey meaning. For methods in {@code Locale} and
      + * {@code Locale.Builder} that accept extensions, occurrences of duplicate
      + * locale attributes as well as locale keys and their associated type are accepted
      + * but ignored.

      Include RFC 6067 as an external specification:

      + * @spec https://www.rfc-editor.org/info/rfc6067
      + *      RFC 6067: BCP 47 Extension U

      Update Locale.forLanguageTag(String):

      +     * <p>Duplicate variants are accepted and included by the builder.
      +     * However, duplicate extension singleton keys and their associated type
      +     * are accepted but ignored. The same behavior applies to duplicate locale
      +     * keys and attributes within a U extension. Note that subsequent subtags after
      +     * the occurrence of a duplicate are not ignored.

      Update Locale.Builder.setLanguageTag(String):

      +         * <p>Duplicate variants are accepted and included by the builder.
      +         * However, duplicate extension singleton keys and their associated type
      +         * are accepted but ignored. The same behavior applies to duplicate locale
      +         * keys and attributes within a U extension. Note that subsequent subtags after
      +         * the occurrence of a duplicate are not ignored.

      Update Locale.Builder.setVariant(String):

      -         * subtags, or an exception is thrown.
      +         * subtags, or an exception is thrown. Duplicate variants are
      +         * accepted and included by the builder.

      Update Locale.Builder.setExtension(char, String):

      -         * pairs with those defined in the extension.
      +         * pairs with those defined in the extension. Duplicate locale attributes
      +         * as well as locale keys and their associated type are accepted but ignored.
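
      The Builder methods updated above can be exercised in the same way. This is a minimal sketch under the assumption that the documented behavior holds as described; the class name BuilderDuplicateDemo, its helpers, and the sample values (including the attribute "attr1") are illustrative and not part of the specification change.

```java
import java.util.Locale;

// Illustrative demo class (hypothetical name, not part of the JDK).
class BuilderDuplicateDemo {

    // Builds a Locale with language "de" and the given variant string.
    static Locale withVariant(String variant) {
        return new Locale.Builder().setLanguage("de").setVariant(variant).build();
    }

    // Builds a Locale with language "en" and the given extension.
    static Locale withExtension(char key, String value) {
        return new Locale.Builder().setLanguage("en").setExtension(key, value).build();
    }

    public static void main(String[] args) {
        // setVariant keeps both occurrences of the duplicate variant.
        System.out.println(withVariant("1996-1996").getVariant());

        // setExtension ignores the second "nu" key and its type "latn".
        System.out.println(
            withExtension('u', "nu-thai-nu-latn").getUnicodeLocaleType("nu"));

        // The duplicate attribute "attr1" is recorded only once.
        System.out.println(
            withExtension('u', "attr1-attr1-nu-thai").getUnicodeLocaleAttributes());
    }
}
```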

            jlu Justin Lu
            jlu Justin Lu
            Naoto Sato