- CSR
- Resolution: Unresolved
- P4
- None
- source
- minimal
- Addition of new functionality.
- Java API
- SE
Summary
Support for BCP 47 Extension T - Transformed Content in the JDK.
Problem
The java.util.Locale class implements IETF BCP 47, which is composed of RFC 4647 "Matching of Language Tags" and RFC 5646 "Tags for Identifying Languages". Additionally, BCP 47 defines two extensions. One is the Unicode Locale extension, which the JDK has supported since JDK 7; the other is Extension T - Transformed Content. This extension provides subtags for specifying the source language or script of transformed content. Transformed content is content that has been transformed, including text that has been transliterated, transcribed, translated, or in some other way influenced by the source locale.
As of JDK 19, support for the T extension is not provided. Without it, users have to parse the extension themselves.
Solution
The extension defines the semantics of the transformed content, i.e., the source language tag and the fields that carry information on the transformation, such as the "transformation mechanism."
This CSR introduces the means to create a Locale object that contains the extension and to access the transformed content information held in that Locale object, along with other basic support such as the extension syntax check, the equality check, and display names.
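As a rough illustration of the access path described above, the sketch below creates Locale objects from language tags and reads the T extension back through the existing getExtension(char) method. The class name TExtensionDemo and the local constant T_KEY are hypothetical; T_KEY stands in for the proposed Locale.TRANSFORMED_CONTENT_EXTENSION field, and only the generic extension handling already present in java.util.Locale is assumed.

```java
import java.util.Locale;

class TExtensionDemo {
    // Stand-in for the proposed Locale.TRANSFORMED_CONTENT_EXTENSION ('t').
    static final char T_KEY = 't';

    // Returns the string form of the T extension, or null if the tag has none.
    static String transformedContent(String languageTag) {
        return Locale.forLanguageTag(languageTag).getExtension(T_KEY);
    }

    public static void main(String[] args) {
        // Japanese content transformed from Italian.
        System.out.println(transformedContent("ja-t-it"));
        // Cyrillic content transformed from Latin per a UNGEGN 2007 specification.
        System.out.println(transformedContent("und-Cyrl-t-und-latn-m0-ungegn-2007"));
        // No T extension present.
        System.out.println(transformedContent("ja"));
    }
}
```

The returned value is a single string, so splitting it into the source language tag and the individual fields is still left to the caller.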
Specification
In the java.util.Locale class, add a new section to the class description, just after "Unicode locale/language extension", as follows:
+ * <h2><a id="t_extension">Transformed Content - T extension</a></h2>
+ * <a href="https://datatracker.ietf.org/doc/html/rfc6497">RFC 6497</a>
+ * specifies an extension to BCP 47 that provides subtags for specifying
+ * the source language or script of transformed content. Transformed content
+ * is content that has been transformed, including text that has been
+ * transliterated, transcribed, translated, or in some other way
+ * influenced by the source locale. For example,
+ * <table class="striped">
+ * <caption style="display:none">Transformed Content extension examples</caption>
+ * <thead>
+ * <tr><th scope="col">Language Tag</th>
+ * <th scope="col">Description</th></tr>
+ * </thead>
+ * <tbody>
+ * <tr><th scope="row" style="text-align:left">ja-t-it</th>
+ * <td>The content is Japanese, transformed from Italian.</td></tr>
+ * <tr><th scope="row" style="text-align:left">ja-Kana-t-it</th>
+ * <td>The content is Japanese Katakana transformed from Italian.</td></tr>
+ * <tr><th scope="row" style="text-align:left">und-Latn-t-und-cyrl</th>
+ * <td>The content is in the Latin script, transformed from the Cyrillic script.</td></tr>
+ * <tr><th scope="row" style="text-align:left">und-Cyrl-t-und-latn-m0-ungegn-2007</th>
+ * <td>The content is in Cyrillic, transformed from Latin, according to a UNGEGN specification dated 2007.</td></tr>
+ * </tbody>
+ * </table>
+ * <p>The transformed content extension contains an optional well-formed BCP47 {@code source}
+ * language tag followed by zero or more {@code field}s. At least the {@code source} language tag
+ * or one {@code field} must be included. See the example above. Each {@code field} consists of a
+ * field separator (one alpha + one digit), followed by one or more subtags of the length 3 to 8,
+ * each delimited by a hyphen.
+ * <p>The transformed content information, namely {@code source} language tag and {@code fields}
+ * are returned from a {@code Locale} via {@link #getExtension(char)} with
+ * {@link #TRANSFORMED_CONTENT_EXTENSION} which returns the string
+ * representation of the transformed content.
+ * <p>To create a locale object that contains the transformed content extension, either use
+ * the factory method {@link #forLanguageTag(String)} or use
+ * {@link Locale.Builder#setExtension(char, String)} with
+ * {@link #TRANSFORMED_CONTENT_EXTENSION}. Although the Unicode Consortium maintains the valid field
+ * separators and their valid subtags in <a href="http://www.unicode.org/reports/tr35/#BCP47_T_Extension">
+ * 3.7 Unicode BCP 47 T Extension</a>, these methods do not check the validity,
+ * only the well-formed check is done on creating a locale object with the transformed content extension.
+ * <p>For more detail about the transformed content extension, refer to
+ * <a href="https://datatracker.ietf.org/doc/html/rfc6497">
+ * BCP 47 Extension T - Transformed Content</a>
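The structure described in the text above (an optional source language tag followed by fields, each introduced by a one-letter, one-digit separator) can be sketched as a small splitter over the string returned by getExtension('t'). This is only an illustrative sketch, not part of the proposed API: the class TExtensionSplitter and its methods are hypothetical, and it assumes the input is already well-formed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TExtensionSplitter {
    // A field separator is exactly one ASCII letter followed by one ASCII digit.
    static boolean isFieldSeparator(String s) {
        if (s.length() != 2) return false;
        char a = Character.toLowerCase(s.charAt(0));
        char d = s.charAt(1);
        return a >= 'a' && a <= 'z' && d >= '0' && d <= '9';
    }

    // Everything before the first field separator is the source language tag.
    static String sourceOf(String value) {
        List<String> source = new ArrayList<>();
        for (String subtag : value.split("-")) {
            if (isFieldSeparator(subtag)) break;
            source.add(subtag);
        }
        return String.join("-", source);
    }

    // Maps each field separator to its hyphen-joined subtags, in order.
    static Map<String, String> fieldsOf(String value) {
        Map<String, String> fields = new LinkedHashMap<>();
        String key = null;
        List<String> subtags = new ArrayList<>();
        for (String subtag : value.split("-")) {
            if (isFieldSeparator(subtag)) {
                if (key != null) fields.put(key, String.join("-", subtags));
                key = subtag;
                subtags.clear();
            } else if (key != null) {
                subtags.add(subtag);
            }
        }
        if (key != null) fields.put(key, String.join("-", subtags));
        return fields;
    }
}
```

For the value "und-latn-m0-ungegn-2007", sourceOf yields "und-latn" and fieldsOf yields a single entry mapping "m0" to "ungegn-2007". The split is unambiguous because no subtag of a BCP 47 language tag has the letter-then-digit form.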
Define a new field for the T extension as follows:
+ /**
+ * The key for the transformed content extension ('t').
+ *
+ * @see #getExtension(char)
+ * @see Builder#setExtension(char, String)
+ * @since 20
+ */
+ public static final char TRANSFORMED_CONTENT_EXTENSION = 't';
+
Insert the @see tag for the extension in the getExtension() method:
* @throws IllegalArgumentException if key is not well-formed
* @see #PRIVATE_USE_EXTENSION
+ * @see #TRANSFORMED_CONTENT_EXTENSION
* @see #UNICODE_LOCALE_EXTENSION
* @since 1.7
*/
public String getExtension(char key)
In the Locale.Builder class, modify the method description of setExtension() as follows:
- * <p><b>Note:</b> The key {@link Locale#UNICODE_LOCALE_EXTENSION
+ * @implNote
+ * The key {@link #UNICODE_LOCALE_EXTENSION
* UNICODE_LOCALE_EXTENSION} ('u') is used for the Unicode locale extension.
* Setting a value for this key replaces any existing Unicode locale key/type
* pairs with those defined in the extension.
- *
- * <p><b>Note:</b> The key {@link Locale#PRIVATE_USE_EXTENSION
+ * <p>
+ * The key {@link #PRIVATE_USE_EXTENSION
* PRIVATE_USE_EXTENSION} ('x') is used for the private use code. To be
* well-formed, the value for this key needs only to have subtags of one to
* eight alphanumeric characters, not two to eight as in the general case.
+ * <p>
+ * The key {@link #TRANSFORMED_CONTENT_EXTENSION
+ * TRANSFORMED_CONTENT_EXTENSION} ('t') is used for the transformed content.
+ * The transformed content extension contains an optional well-formed BCP47 {@code source}
+ * language tag followed by zero or more {@code field}s. At least the {@code source} language tag
+ * or one {@code field} must be included. Each {@code field} consists of a
+ * field separator (one alpha + one digit), followed by one or more subtags of
+ * the length 3 to 8, each delimited by a hyphen. For the detailed
+ * specification for the well-formed transformed content extension, refer to
+ * <a href="https://datatracker.ietf.org/doc/html/rfc6497">RFC 6497: BCP 47 Extension
+ * T - Transformed Content</a>.
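A minimal sketch of the builder path follows, assuming only the existing Locale.Builder API. The class TExtensionBuilderDemo and the constant T_KEY are hypothetical, with T_KEY standing in for the proposed TRANSFORMED_CONTENT_EXTENSION field. The rejected value below is ill-formed under the general extension rule (a subtag longer than eight characters); the stricter T-specific rules quoted above are what this CSR proposes and are not assumed here.

```java
import java.util.IllformedLocaleException;
import java.util.Locale;

class TExtensionBuilderDemo {
    // Stand-in for the proposed Locale.TRANSFORMED_CONTENT_EXTENSION ('t').
    static final char T_KEY = 't';

    // Builds a locale for the given language carrying the given T extension value.
    static Locale withTransformedContent(String language, String tValue) {
        return new Locale.Builder()
                .setLanguage(language)
                .setExtension(T_KEY, tValue)
                .build();
    }

    // Reports whether the builder accepts the value as well-formed.
    static boolean isAccepted(String tValue) {
        try {
            new Locale.Builder().setExtension(T_KEY, tValue);
            return true;
        } catch (IllformedLocaleException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(withTransformedContent("ja", "it").toLanguageTag());
        System.out.println(isAccepted("und-latn-m0-ungegn-2007"));
        System.out.println(isAccepted("subtagwaytoolong"));
    }
}
```

A locale built this way compares equal to the one returned by Locale.forLanguageTag for the same tag, so either construction route can be used interchangeably.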
- csr of: JDK-8289227 Support for BCP 47 Extension T - Transformed Content (Open)