Type: CSR
Resolution: Approved
Priority: P3
Fix Version/s: 26
Component/s: core-libs
Labels: None
Subcomponent: java.lang
Compatibility Kind: source
Compatibility Risk: minimal
Compatibility Risk Description: Adding new public methods, with minimal compatibility risk.
Interface Kind: Java API
Scope: SE

Summary

Add Unicode-compliant caseless comparison methods to the String class, enabling reliable and efficient Unicode-aware case-insensitive matching.

Supports Unicode-compliant full case folding.
Provides straightforward, stable and efficient caseless matching without workarounds.
Brings Java's string comparison handling in line with other programming languages and libraries.

Problem

Case folding is a key operation for case-insensitive matching (e.g., string equality, or regex matching), where the goal is to eliminate case distinctions without applying locale or language specific conversions.

The JDK does not currently expose a direct API for Unicode-compliant case folding. Developers instead rely on methods such as:

String.equalsIgnoreCase(String)

Unicode-aware, locale-independent.
Implementation uses Character.toLowerCase(Character.toUpperCase(int)) per code point.
Limited: does not support the 1:M mappings defined by Unicode case folding.

Character.toLowerCase(int) / Character.toUpperCase(int)

Locale-independent, single code point only.
No support for 1:M mappings.

String.toLowerCase(Locale.ROOT) / String.toUpperCase(Locale.ROOT)

Based on Unicode SpecialCasing.txt, supports 1:M mappings.
Intended primarily for presentation/display, not structural case-insensitive matching.
Requires converting both strings in full before they can be compared, which is less efficient than comparing them in place.

Example of 1:M mappings:

"ß".toUpperCase(Locale.ROOT) → "SS"
Case folding of "ß" produces "ss", matching Unicode caseless comparison rules.

jshell> "\u00df".equalsIgnoreCase("ss")
$22 ==> false

jshell> "\u00df".toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT).equals("ss")
$24 ==> true
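
As a hedged illustration, the sketch below reproduces this gap using only APIs available in current JDK releases. The class name `SharpSDemo` and its two helpers are made up for this example and are not part of the proposal. `simpleFold` mirrors the per-code-point mapping described above for `equalsIgnoreCase`, and `equalsViaConversion` is the full-conversion workaround.

```java
import java.util.Locale;

final class SharpSDemo {
    // Per-code-point mapping in the style described for equalsIgnoreCase:
    // one code point in, one code point out, so no 1:M expansion is possible.
    static int simpleFold(int codePoint) {
        return Character.toLowerCase(Character.toUpperCase(codePoint));
    }

    // Workaround: convert both strings in full, then compare the results.
    static boolean equalsViaConversion(String a, String b) {
        String x = a.toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT);
        String y = b.toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT);
        return x.equals(y);
    }

    public static void main(String[] args) {
        String sharpS = "\u00df";
        // U+00DF has no single-code-point uppercase mapping, so it maps to itself.
        System.out.println(simpleFold(0x00DF) == 0x00DF);      // true
        // The String-level conversion applies the 1:M mapping.
        System.out.println(sharpS.toUpperCase(Locale.ROOT));   // SS
        System.out.println(sharpS.equalsIgnoreCase("ss"));     // false
        System.out.println(equalsViaConversion(sharpS, "ss")); // true
    }
}
```

The workaround gives the expected answer for this pair, but it builds intermediate strings for both operands on every comparison.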

Solution

This PR introduces the following comparison APIs:

boolean equalsFoldCase(String anotherString)
int compareToFoldCase(String anotherString)
Comparator<String> UNICODE_CASEFOLD_ORDER

These methods are intended to be the preferred choice when Unicode-compliant case-less matching is required.
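
Because the proposed methods are targeted at release 26, the following sketch only approximates how they might be used, and it runs on earlier JDKs. Everything named here (`FoldCaseSketch`, `approxFold`, `approxEqualsFoldCase`, `APPROX_CASEFOLD_ORDER`) is a hypothetical stand-in built on `toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT)`. It is not the proposed implementation and is not guaranteed to agree with CaseFolding.txt for every code point.

```java
import java.util.Comparator;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

final class FoldCaseSketch {
    // Hypothetical stand-in for case folding (an assumption for illustration):
    // upper-case then lower-case with Locale.ROOT. It only approximates folding.
    static String approxFold(String s) {
        return s.toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT);
    }

    // Stand-in for the proposed a.equalsFoldCase(b); false for a null argument.
    static boolean approxEqualsFoldCase(String a, String b) {
        return b != null && approxFold(a).equals(approxFold(b));
    }

    // Stand-in for the proposed String.UNICODE_CASEFOLD_ORDER comparator.
    static final Comparator<String> APPROX_CASEFOLD_ORDER =
            Comparator.comparing(FoldCaseSketch::approxFold);

    public static void main(String[] args) {
        // With the proposed API this would read: "Fu\u00df".equalsFoldCase("FUSS")
        System.out.println(approxEqualsFoldCase("Fu\u00df", "FUSS")); // true

        // A sorted set keyed on the caseless order treats all three as one entry.
        Set<String> names = new TreeSet<>(APPROX_CASEFOLD_ORDER);
        names.add("Fu\u00df");
        names.add("FUSS");
        names.add("fuss");
        System.out.println(names.size()); // 1
    }
}
```

The stand-in converts whole strings on each call. By the Problem section's account, the proposed methods exist to avoid that cost, so this sketch illustrates the intended semantics only.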

Specification

Also See: String.java.diff

 *
 *  <p><b>String comparison and case-insensitive matching</b>
 *
 * <p>There are several related ways to compare {@code String} values; choose
 * the one whose semantics fit your purpose:
 *
 * <ul>
 *   <li><b>Exact content equality</b> — {@link #equals(Object)} checks that two
 *       strings contain the identical char sequence of UTF-16 code units. This is
 *       a strict, case-sensitive comparison suitable for exact matching, hashing
 *       and any situation that requires bit-for-bit stability.</li>
 *
 *   <li><b>Simple case-insensitive equality</b> — {@link #equalsIgnoreCase(String)}
 *       (and the corresponding {@link #compareToIgnoreCase(String)} and {@link #CASE_INSENSITIVE_ORDER})
 *       performs a per-code-point, locale-independent comparison using
 *       {@link Character#toUpperCase(int)} and {@link Character#toLowerCase(int)}.
 *       It is convenient for many common case-insensitive checks.</li>
 *
 *   <li><b>Unicode case-folded equivalence</b> — {@link #equalsFoldCase(String)}
 *       (and the corresponding {@link #compareToFoldCase(String)} and {@link #UNICODE_CASEFOLD_ORDER})
 *       implement the Unicode <em>{@index "full case folding"}</em> rules defined in
 *       <a href="https://www.unicode.org/Public/UCD/latest/ucd/CaseFolding.txt">Unicode CaseFolding.txt</a>.
 *       Case folding is locale-independent and language-neutral and may map a single code
 *       point to multiple code points (1:M mappings). For example, the German sharp
 *       s ({@code U+00DF}) is folded to the sequence {@code "ss"}.
 *       Use these methods when you need Unicode-compliant
 *       <a href="https://www.unicode.org/versions/latest/core-spec/chapter-5/#G21790">
 *       caseless matching</a>, searching, or ordering.</li>
 * </ul>
 *
 * <p>Unless otherwise noted, methods for comparing Strings do not take locale into
 * account. The {@link java.text.Collator} class provides methods for finer-grain,
 * locale-sensitive String comparison.
 *

/**
 * A Comparator that orders {@code String} objects as by
 * {@link #compareToFoldCase(String) compareToFoldCase()}.
 *
 * @see     #compareToFoldCase(String)
 * @since   26
 */
public static final Comparator<String> UNICODE_CASEFOLD_ORDER
        = new FoldCaseComparator();


/**
 * Compares this {@code String} to another {@code String} for equality,
 * using <em>{@index "Unicode case folding"}</em>. Two strings are considered equal
 * by this method if their case-folded forms are identical.
 * <p>
 * Case folding is defined by the Unicode Standard in
 * <a href="https://www.unicode.org/Public/UCD/latest/ucd/CaseFolding.txt">CaseFolding.txt</a>,
 * including 1:M mappings. For example, {@code "Fuß".equalsFoldCase("FUSS")}
 * returns {@code true}, since the character {@code U+00DF} (sharp s) folds
 * to {@code "ss"}.
 * <p>
 * Case folding is locale-independent and language-neutral, unlike
 * locale-sensitive transformations such as {@link #toLowerCase()} or
 * {@link #toUpperCase()}. It is intended for caseless matching,
 * searching, and indexing.
 *
 * @apiNote
 * This method is the Unicode-compliant alternative to
 * {@link #equalsIgnoreCase(String)}. It implements full case folding as
 * defined by the Unicode Standard, which may differ from the simpler
 * per-character mapping performed by {@code equalsIgnoreCase}.
 * For example:
 * <pre>{@snippet lang=java :
 * String a = "Fuß";
 * String b = "FUSS";
 * boolean equalsFoldCase = a.equalsFoldCase(b);       // returns true
 * boolean equalsIgnoreCase = a.equalsIgnoreCase(b);   // returns false
 * }</pre>
 *
 * @param  anotherString
 *         The {@code String} to compare this {@code String} against
 *
 * @return  {@code true} if the given object is not {@code null} and represents
 *          the same sequence of characters as this string under Unicode case
 *          folding; {@code false} otherwise.
 *
 * @spec    https://www.unicode.org/versions/latest/core-spec/chapter-5/#G21790 Unicode Caseless Matching
 * @see     #compareToFoldCase(String)
 * @see     #equalsIgnoreCase(String)
 * @since   26
 */
public boolean equalsFoldCase(String anotherString)

/**
 * Compares two strings lexicographically using <em>{@index "Unicode case folding"}</em>.
 * This method returns an integer whose sign is that of calling {@code compareTo}
 * on the Unicode case-folded versions of the strings. Unicode case folding
 * eliminates differences in case according to the Unicode Standard, using the
 * mappings defined in
 * <a href="https://www.unicode.org/Public/UCD/latest/ucd/CaseFolding.txt">CaseFolding.txt</a>,
 * including 1:M mappings, such as {@code "ß"} → {@code "ss"}.
 * <p>
 * Case folding is a locale-independent, language-neutral form of case mapping,
 * primarily intended for caseless matching. Unlike {@link #compareToIgnoreCase(String)},
 * which applies a simpler locale-insensitive uppercase mapping, this method
 * follows the Unicode <em>{@index "full"}</em> case folding rules, providing stable and
 * consistent results across all environments.
 * <p>
 * Note that this method does <em>not</em> take locale into account, and may
 * produce results that differ from locale-sensitive ordering. Use
 * {@link java.text.Collator} for locale-sensitive comparison.
 *
 * @apiNote
 * This method is the Unicode-compliant alternative to
 * {@link #compareToIgnoreCase(String)}. It implements
 * <em>{@index "full case folding"}</em> as defined by the Unicode Standard, which
 * may differ from the simpler per-character mapping performed by
 * {@code compareToIgnoreCase}.
 * For example:
 * <pre>{@snippet lang=java :
 * String a = "Fuß";
 * String b = "FUSS";
 * int cmpFoldCase = a.compareToFoldCase(b);     // returns 0
 * int cmpIgnoreCase = a.compareToIgnoreCase(b); // returns > 0
 * }</pre>
 *
 * @param   str   the {@code String} to be compared.
 * @return  a negative integer, zero, or a positive integer as the specified
 *          String is greater than, equal to, or less than this String,
 *          ignoring case considerations by case folding.
 *
 * @spec    https://www.unicode.org/versions/latest/core-spec/chapter-5/#G21790 Unicode Caseless Matching
 * @see     java.text.Collator
 * @see     #compareToIgnoreCase(String)
 * @see     #equalsFoldCase(String)
 * @since   26
 */
public int compareToFoldCase(String str) 

Refs

csr of

JDK-8365675 Add String Unicode Case-Folding Support

Resolved
