Summary
Add Unicode Standard-compliant caseless comparison methods to the String class, enabling reliable and efficient Unicode-aware case-insensitive matching.
- Supports Unicode-compliant full case folding, including 1:M mappings.
- Provides straightforward, stable and efficient caseless matching without workarounds.
- Brings Java's string comparison handling in line with other programming languages and libraries.
Problem
Case folding is a key operation for case-insensitive matching (e.g., string equality or regex matching), where the goal is to eliminate case distinctions without applying locale- or language-specific conversions.
Currently, the JDK does not expose a direct API for Unicode-compliant case folding. Developers therefore rely on existing methods, each of which has limitations:
String.equalsIgnoreCase(String)
- Unicode-aware, locale-independent.
- Implementation uses Character.toLowerCase(Character.toUpperCase(int)) per code point.
- Limited: does not support the 1:M mappings defined in Unicode case folding.
Character.toLowerCase(int) / Character.toUpperCase(int)
- Locale-independent, single code point only.
- No support for 1:M mappings.
String.toLowerCase(Locale.ROOT) / String.toUpperCase(Locale.ROOT)
- Based on Unicode SpecialCasing.txt; supports 1:M mappings.
- Intended primarily for presentation/display, not for structural case-insensitive matching.
- Requires converting both strings in full before comparing them, which allocates new strings and is less efficient.
Example of 1:M mappings:
- "ß".toUpperCase(Locale.ROOT) → "SS"
- Case folding produces "ss", matching Unicode caseless comparison rules.
jshell> "\u00df".equalsIgnoreCase("ss")
$22 ==> false
jshell> "\u00df".toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT).equals("ss")
$24 ==> true
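Until a dedicated API exists, code that needs 1:M-aware matching typically wraps the round trip shown above in a helper. The following is a minimal sketch, assuming the `Locale.ROOT` round trip is an acceptable approximation for the caller; the class and method names (`RoundTripMatch`, `equalsViaRoundTrip`, `normalize`) are hypothetical. Note that both strings are converted, and copied, in full before any comparison happens, which is the inefficiency described above.

```java
import java.util.Locale;

public class RoundTripMatch {

    // Hypothetical helper: approximates caseless matching by upper-casing and then
    // lower-casing both strings with Locale.ROOT. Every call allocates new strings.
    static boolean equalsViaRoundTrip(String a, String b) {
        return normalize(a).equals(normalize(b));
    }

    private static String normalize(String s) {
        // toUpperCase applies the SpecialCasing 1:M mappings, e.g. U+00DF -> "SS"
        return s.toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println("\u00df".equalsIgnoreCase("ss"));          // false: no 1:M mapping
        System.out.println(equalsViaRoundTrip("\u00df", "ss"));       // true
        System.out.println(equalsViaRoundTrip("Fu\u00df", "FUSS"));   // true
    }
}
```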
Solution
This PR introduces the following comparison APIs:
- boolean equalsFoldCase(String anotherString)
- int compareToFoldCase(String anotherString)
- static final Comparator<String> UNICODE_CASEFOLD_ORDER
These methods are intended to be the preferred choice when Unicode-compliant case-less matching is required.
Specification
See also: String.java.diff
*
* <p><b>String comparison and case-insensitive matching</b>
*
* <p>There are several related ways to compare {@code String} values; choose
* the one whose semantics fit your purpose:
*
* <ul>
* <li><b>Exact content equality</b> — {@link #equals(Object)} checks that two
* strings contain the identical char sequence of UTF-16 code units. This is
* a strict, case-sensitive comparison suitable for exact matching, hashing
* and any situation that requires bit-for-bit stability.</li>
*
* <li><b>Simple case-insensitive equality</b> — {@link #equalsIgnoreCase(String)}
* (and the corresponding {@link #compareToIgnoreCase(String)} and {@link #CASE_INSENSITIVE_ORDER})
* performs a per-code-point, locale-independent comparison using
* {@link Character#toUpperCase(int)} and {@link Character#toLowerCase(int)}.
* It is convenient for many common case-insensitive checks.</li>
*
* <li><b>Unicode case-folded equivalence</b> — {@link #equalsFoldCase(String)}
* (and the corresponding {@link #compareToFoldCase(String)} and {@link #UNICODE_CASEFOLD_ORDER})
* implement the Unicode <em>{@index "full case folding"}</em> rules defined in
* <a href="https://www.unicode.org/Public/UCD/latest/ucd/CaseFolding.txt">Unicode CaseFolding.txt</a>.
* Case folding is locale-independent and language-neutral and may map a single code
* point to multiple code points (1:M mappings). For example, the German sharp
* s ({@code U+00DF}) is folded to the sequence {@code "ss"}.
* Use these methods when you need Unicode-compliant caseless matching,
* searching, or ordering.</li>
* </ul>
*
* <p>Unless otherwise noted, methods for comparing Strings do not take locale into
* account. The {@link java.text.Collator} class provides methods for finer-grain,
* locale-sensitive String comparison.
*
/**
* A Comparator that orders {@code String} objects as by
* {@link #compareToFoldCase(String) compareToFoldCase()}.
*
* @see #compareToFoldCase(String)
* @since 26
*/
public static final Comparator<String> UNICODE_CASEFOLD_ORDER
= new FoldCaseComparator();
/**
* Compares this {@code String} to another {@code String} for equality,
* using <em>{@index "Unicode case folding"}</em>. Two strings are considered equal
* by this method if their case-folded forms are identical.
* <p>
* Case folding is defined by the Unicode Standard in
* <a href="https://www.unicode.org/Public/UCD/latest/ucd/CaseFolding.txt">CaseFolding.txt</a>,
* including 1:M mappings. For example, {@code "Fuß".equalsFoldCase("FUSS")}
* returns {@code true}, since the character {@code U+00DF} (sharp s) folds
* to {@code "ss"}.
* <p>
* Case folding is locale-independent and language-neutral, unlike
* locale-sensitive transformations such as {@link #toLowerCase()} or
* {@link #toUpperCase()}. It is intended for caseless matching,
* searching, and indexing.
*
* @apiNote
* This method is the Unicode-compliant alternative to
* {@link #equalsIgnoreCase(String)}. It implements full case folding as
* defined by the Unicode Standard, which may differ from the simpler
* per-character mapping performed by {@code equalsIgnoreCase}.
* For example:
* {@snippet lang=java :
* String a = "Fuß";
* String b = "FUSS";
* boolean equalsFoldCase = a.equalsFoldCase(b); // returns true
* boolean equalsIgnoreCase = a.equalsIgnoreCase(b); // returns false
* }
*
* @param anotherString
* The {@code String} to compare this {@code String} against
*
* @return {@code true} if the argument is not {@code null} and represents
* the same sequence of characters as this string under Unicode case
* folding; {@code false} otherwise.
*
* @see #compareToFoldCase(String)
* @see #equalsIgnoreCase(String)
* @since 26
*/
public boolean equalsFoldCase(String anotherString)
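The specification does not prescribe an implementation, but the efficiency goal in the summary suggests comparing folded code points on the fly instead of materializing folded copies of both strings. The sketch below illustrates that idea under stated assumptions: `FoldCaseSketch`, `Folder` and the five-entry `FULL_FOLDS` table are hypothetical; the table is a small hand-copied excerpt of the full (`F`) entries in CaseFolding.txt; and every other code point uses `Character.toLowerCase(Character.toUpperCase(int))` as a stand-in for the common (`C`) mappings, which is not identical to CaseFolding.txt. It is not the JDK implementation.

```java
import java.util.Map;

public class FoldCaseSketch {

    // Hypothetical excerpt of the full ("F") entries in CaseFolding.txt.
    // A real implementation would be generated from the complete data file.
    private static final Map<Integer, int[]> FULL_FOLDS = Map.of(
            0x00DF, new int[] {0x0073, 0x0073},   // U+00DF sharp s -> "ss"
            0x1E9E, new int[] {0x0073, 0x0073},   // U+1E9E capital sharp s -> "ss"
            0x0130, new int[] {0x0069, 0x0307},   // U+0130 -> U+0069 U+0307
            0xFB00, new int[] {0x0066, 0x0066},   // U+FB00 ligature ff -> "ff"
            0xFB01, new int[] {0x0066, 0x0069});  // U+FB01 ligature fi -> "fi"

    // Folds one code point; anything outside the table uses a simple
    // per-code-point approximation (assumption, not the real C mappings).
    static int[] fold(int cp) {
        int[] full = FULL_FOLDS.get(cp);
        if (full != null) {
            return full;
        }
        return new int[] {Character.toLowerCase(Character.toUpperCase(cp))};
    }

    // Produces folded code points lazily, so no folded copy of the string is built.
    private static final class Folder {
        private final String s;
        private int index;           // next UTF-16 index to read in s
        private int[] pending = {};  // output of the current (possibly 1:M) mapping
        private int pendingPos;      // next unread position in pending

        Folder(String s) {
            this.s = s;
        }

        // Returns the next folded code point, or -1 once the input is exhausted.
        int next() {
            if (pendingPos < pending.length) {
                return pending[pendingPos++];
            }
            if (index >= s.length()) {
                return -1;
            }
            int cp = s.codePointAt(index);
            index += Character.charCount(cp);
            pending = fold(cp);
            pendingPos = 1;
            return pending[0];
        }
    }

    static boolean equalsFoldCase(String a, String b) {
        Folder fa = new Folder(a);
        Folder fb = new Folder(b);
        while (true) {
            int ca = fa.next();
            int cb = fb.next();
            if (ca != cb) {
                return false;  // first difference, or one side ended early
            }
            if (ca == -1) {
                return true;   // both sides ended together
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(equalsFoldCase("Fu\u00df", "FUSS"));   // true
        System.out.println(equalsFoldCase("\uFB01le", "FILE"));   // true
        System.out.println(equalsFoldCase("Fu\u00df", "FUS"));    // false
    }
}
```

Because folding proceeds one code point at a time, the comparison stops at the first mismatch and never allocates a folded copy of either string; only the fallback path allocates a one-element array per code point, which a real implementation would avoid.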
/**
* Compares two strings lexicographically using <em>{@index "Unicode case folding"}</em>.
* This method returns an integer whose sign is that of calling {@code compareTo}
* on the Unicode case-folded versions of the strings. Unicode case folding
* eliminates differences in case according to the Unicode Standard, using the
* mappings defined in
* <a href="https://www.unicode.org/Public/UCD/latest/ucd/CaseFolding.txt">CaseFolding.txt</a>,
* including 1:M mappings, such as {@code "ß"} → {@code "ss"}.
* <p>
* Case folding is a locale-independent, language-neutral form of case mapping,
* primarily intended for caseless matching. Unlike {@link #compareToIgnoreCase(String)},
* which applies a simpler per-code-point, locale-insensitive mapping, this method
* follows Unicode <em>{@index "full"}</em> case folding, providing stable and
* consistent results across all environments.
* <p>
* Note that this method does <em>not</em> take locale into account, and may
* produce results that differ from locale-sensitive ordering. Use
* {@link java.text.Collator} for locale-sensitive comparison.
*
* @apiNote
* This method is the Unicode-compliant alternative to
* {@link #compareToIgnoreCase(String)}. It implements
* <em>{@index "full case folding"}</em> as defined by the Unicode Standard, which
* may differ from the simpler per-character mapping performed by
* {@code compareToIgnoreCase}.
* For example:
* {@snippet lang=java :
* String a = "Fuß";
* String b = "FUSS";
* int cmpFoldCase = a.compareToFoldCase(b); // returns 0
* int cmpIgnoreCase = a.compareToIgnoreCase(b); // returns > 0
* }
*
* @param str the {@code String} to be compared.
* @return a negative integer, zero, or a positive integer as the specified
* String is greater than, equal to, or less than this String,
* ignoring case considerations by case folding.
* @see java.text.Collator
* @see #compareToIgnoreCase(String)
* @see #equalsFoldCase(String)
* @since 26
*/
public int compareToFoldCase(String str)
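The definition above (the sign of {@code compareTo} on the case-folded strings) can be illustrated directly, together with a comparator in the spirit of `UNICODE_CASEFOLD_ORDER`. This is a minimal sketch under assumptions: `FoldOrderSketch`, `fold` and `FOLD_ORDER` are hypothetical names; only U+00DF and U+1E9E receive their full `"ss"` mapping; and every other code point uses a simple per-code-point approximation rather than the complete CaseFolding.txt data.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class FoldOrderSketch {

    // Hypothetical fold: only U+00DF and U+1E9E get their full "ss" mapping;
    // all other code points use a simple per-code-point approximation.
    static String fold(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> {
            if (cp == 0x00DF || cp == 0x1E9E) {
                sb.append("ss");
            } else {
                sb.appendCodePoint(Character.toLowerCase(Character.toUpperCase(cp)));
            }
        });
        return sb.toString();
    }

    // Same sign as compareTo on the folded strings.
    static int compareToFoldCase(String a, String b) {
        return fold(a).compareTo(fold(b));
    }

    // Hypothetical stand-in for the proposed UNICODE_CASEFOLD_ORDER.
    static final Comparator<String> FOLD_ORDER = FoldOrderSketch::compareToFoldCase;

    public static void main(String[] args) {
        System.out.println(compareToFoldCase("Fu\u00df", "FUSS"));        // 0
        System.out.println("Fu\u00df".compareToIgnoreCase("FUSS") > 0);   // true

        // List.sort is stable, so strings that fold equal keep their input order.
        List<String> words = new ArrayList<>(List.of("Fut", "Fu\u00df", "fur", "FUSS"));
        words.sort(FOLD_ORDER);
        System.out.println(words);
    }
}
```

Building folded copies first keeps the sketch short; a production implementation would more likely fold incrementally and stop at the first differing code point.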
Refs
- CSR of: JDK-8365675 Add String Unicode Case-Folding Support
- Status: In Progress