JDK-8305623

Introduce a method in Locale class to return the language tags as per RFC 5646 convention


    • Type: CSR
    • Resolution: Approved
    • Priority: P4
    • 21
    • core-libs
    • None
    • source
    • minimal
    • This CSR proposes introducing a new method, which should not affect any existing methods.
    • Java API
    • SE

      Summary

      Add the method Locale.caseFoldLanguageTag(String languageTag) which formats the language tag to adhere to section 2.1.1. Formatting of Language Tags of RFC 5646.

      Problem

      Currently the JDK does not provide any method that formats a language tag to the recommended case convention of section 2.1.1. Formatting of Language Tags of RFC 5646. Although not mandatory, RFC 5646 highly recommends that users follow this format, as it generally corresponds to the conventions of various ISO standards such as ISO639-1, ISO15924, and ISO3166-1.

      As a workaround, users have been calling Locale.forLanguageTag("MN-cYRL-mn").toLanguageTag() to replicate the formatting. However, chaining these methods provides no guarantee that the output tag differs from the input tag only in case. For example, Locale.forLanguageTag("ja-JP-x-lvariant-JP").toLanguageTag() returns "ja-JP-u-ca-japanese-x-lvariant-JP". It would be ideal to have a method whose sole purpose is to apply the recommended case convention, without any potential alteration of the language tag itself.
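
      The workaround and its side effect can be reproduced with the existing API alone. The following is a minimal sketch; the class name RoundTripDemo and the helper roundTrip are hypothetical and exist only for illustration.

```java
import java.util.Locale;

final class RoundTripDemo {
    // Hypothetical helper: the parse-then-serialize workaround described above.
    static String roundTrip(String tag) {
        return Locale.forLanguageTag(tag).toLanguageTag();
    }

    public static void main(String[] args) {
        // Case is normalized as a side effect of parsing and re-serializing.
        System.out.println(roundTrip("MN-cYRL-mn"));          // mn-Cyrl-MN

        // The tag itself is altered here: a "u-ca-japanese" extension is added.
        System.out.println(roundTrip("ja-JP-x-lvariant-JP")); // ja-JP-u-ca-japanese-x-lvariant-JP
    }
}
```

      The second call shows why the round trip is not a pure case conversion: the result contains subtags that the input did not.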

      Solution

      Introduce a method that simply converts a well-formed tag to the recommended case convention of RFC5646 2.1.1. This is specified as: "An implementation can reproduce this format without accessing the registry as follows. All subtags, including extension and private use subtags, use lowercase letters with two exceptions: two-letter and four-letter subtags that neither appear at the start of the tag nor occur after singletons. Such two-letter subtags are all uppercase (as in the tags "en-CA-x-ca" or "sgn-BE-FR") and four-letter subtags are titlecase (as in the tag "az-Latn-x-latn")." As with Locale.forLanguageTag(String tag), variant subtags will not be case normalized.
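
      The quoted rule can be sketched in a few lines. This is only an illustration of the RFC 5646 rule, not the JDK implementation: the class CaseFoldSketch and its caseFold method are hypothetical, the sketch assumes a well-formed, non-grandfathered tag, it performs no validation, and unlike the proposed method it also lowercases variant subtags and lvariant private use subtags.

```java
import java.util.Locale;

final class CaseFoldSketch {
    // Hypothetical helper illustrating the RFC 5646 section 2.1.1 rule only.
    // Assumes a well-formed, non-grandfathered tag; does not preserve variant case.
    static String caseFold(String tag) {
        String[] subtags = tag.split("-");
        StringBuilder out = new StringBuilder(tag.length());
        boolean afterSingleton = false; // true once a one-character subtag was seen

        for (int i = 0; i < subtags.length; i++) {
            // Default: every subtag is lowercase.
            String s = subtags[i].toLowerCase(Locale.ROOT);

            // Exceptions apply only when not at the start and not after a singleton.
            if (i > 0 && !afterSingleton) {
                if (s.length() == 2) {
                    s = s.toUpperCase(Locale.ROOT);                          // e.g. "CA"
                } else if (s.length() == 4) {
                    s = Character.toUpperCase(s.charAt(0)) + s.substring(1); // e.g. "Latn"
                }
            }
            if (s.length() == 1) {
                afterSingleton = true;
            }
            if (i > 0) {
                out.append('-');
            }
            out.append(s);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(caseFold("EN-ca-X-CA"));     // en-CA-x-ca
        System.out.println(caseFold("SGN-be-fr"));      // sgn-BE-FR
        System.out.println(caseFold("AZ-LATN-X-LATN")); // az-Latn-x-latn
    }
}
```

      Tracking a single flag is enough because, once a singleton has been seen, every later subtag belongs to an extension or private use sequence and stays lowercase.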

      Specification

      /**
       * {@return a case folded IETF BCP 47 language tag}
       *
       * <p>This method formats a language tag into one with case convention
       * that adheres to section 2.1.1. Formatting of Language Tags of RFC5646.
       * This format is defined as: <i>All subtags, including extension and private
       * use subtags, use lowercase letters with two exceptions: two-letter
       * and four-letter subtags that neither appear at the start of the tag
       * nor occur after singletons. Such two-letter subtags are all
       * uppercase (as in the tags "en-CA-x-ca" or "sgn-BE-FR") and four-
       * letter subtags are titlecase (as in the tag "az-Latn-x-latn").</i> As
       * grandfathered tags are not always well-formed, this method
       * will simply case fold a grandfathered tag to match the exact case convention
       * for the particular tag specified in the respective
       * {@link ##legacy_tags_replacement Legacy tags} table.
       *
       * <p><b>Special Exceptions</b>
       * <p>To maintain consistency with {@link ##def_variant variant}
       * which is case-sensitive, this method will neither case fold variant
       * subtags nor case fold private use subtags prefixed by {@code lvariant}.
       *
       * <p>For example,
       * {@snippet lang=java :
       * String tag = "ja-kana-jp-x-lvariant-Oracle-JDK-Standard-Edition";
       * Locale.caseFoldLanguageTag(tag); // returns "ja-Kana-JP-x-lvariant-Oracle-JDK-Standard-Edition"
       * String tag2 = "ja-kana-jp-x-Oracle-JDK-Standard-Edition";
       * Locale.caseFoldLanguageTag(tag2); // returns "ja-Kana-JP-x-oracle-jdk-standard-edition"
       * }
       *
       * <p>Excluding case folding, this method makes no modifications to the tag itself.
       * Case convention of language tags does not carry meaning, and is simply
       * recommended as it corresponds with various ISO standards, including:
       * ISO639-1, ISO15924, and ISO3166-1.
       *
       * <p>As the formatting of the case convention is dependent on the
       * positioning of certain subtags, callers of this method should ensure
       * that the language tag is well-formed (conforming to section 2.1. Syntax
       * of RFC5646).
       *
       * @param languageTag the IETF BCP 47 language tag.
       * @throws IllformedLocaleException if {@code languageTag} is not well-formed
       * @throws NullPointerException if {@code languageTag} is {@code null}
       * @spec https://www.rfc-editor.org/rfc/rfc5646.html#section-2.1
       *       RFC5646 2.1. Syntax
       * @spec https://www.rfc-editor.org/rfc/rfc5646#section-2.1.1
       *       RFC5646 2.1.1. Formatting of Language Tags
       * @since 21
       */
      public static String caseFoldLanguageTag(String languageTag) {
          return LanguageTag.caseFoldTag(languageTag);
      }

            jlu Justin Lu
            nishjain Nishit Jain
            Naoto Sato