JDK-8369017

Add String Unicode Case-Folding Support


    • Type: CSR
    • Resolution: Unresolved
    • Priority: P3
    • 26
    • core-libs
    • None
    • minimal
    • Java API

      Summary

      Add Unicode-compliant caseless comparison methods to the String class, enabling reliable and efficient Unicode-aware case-insensitive matching.

      • Supports Unicode-compliant full case folding.
      • Provides straightforward, stable and efficient caseless matching without workarounds.
      • Brings Java's string comparison in line with other languages and libraries that expose case folding, such as Python's str.casefold().

      Problem

      Case folding is a key operation for case-insensitive matching (e.g., string equality or regex matching): it eliminates case distinctions without applying locale- or language-specific conversions.

      Currently, the JDK does not expose a direct API for Unicode-compliant case folding. Developers instead rely on methods such as:

      String.equalsIgnoreCase(String)

      • Unicode-aware, locale-independent.
      • Implementation uses Character.toLowerCase(Character.toUpperCase(int)) per code point.
      • Limited: does not support the 1:M mappings defined by Unicode case folding.
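      The per-code-point rule can be observed on a current JDK. In this minimal sketch (the class name IgnoreCaseDemo is illustrative), Greek final sigma matches ordinary sigma because its uppercase round trip lands on U+03C3, while "ß" can never match "ss" because that mapping would change the string length.

```java
public class IgnoreCaseDemo {
    public static void main(String[] args) {
        // U+03C2 (final sigma) uppercases to U+03A3, the same uppercase as
        // U+03C3 (sigma), so the per-code-point comparison treats them as equal.
        boolean sigma = "\u03C2".equalsIgnoreCase("\u03C3");

        // U+00DF (sharp s) is one char and "ss" is two; the lengths differ,
        // so a per-code-point comparison can never report a match.
        boolean sharpS = "\u00DF".equalsIgnoreCase("ss");

        System.out.println(sigma);   // true
        System.out.println(sharpS);  // false
    }
}
```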

      Character.toLowerCase(int) / Character.toUpperCase(int)

      • Locale-independent, single code point only.
      • No support for 1:M mappings.
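      A short sketch of the single-code-point limit (the class name is illustrative): Character.toUpperCase(int) must return exactly one code point, so it leaves U+00DF unchanged, whereas String.toUpperCase(Locale) applies SpecialCasing.txt and can lengthen the string.

```java
import java.util.Locale;

public class SingleCodePointDemo {
    public static void main(String[] args) {
        int sharpS = 0x00DF; // LATIN SMALL LETTER SHARP S

        // The int -> int signature cannot express "SS", so U+00DF is returned as-is.
        int upper = Character.toUpperCase(sharpS);
        System.out.println(upper == sharpS);                   // true

        // The String-level API applies the 1:M mapping from SpecialCasing.txt.
        System.out.println("\u00DF".toUpperCase(Locale.ROOT));  // SS
    }
}
```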

      String.toLowerCase(Locale.ROOT) / String.toUpperCase(Locale.ROOT)

      • Based on Unicode SpecialCasing.txt, supports 1:M mappings.
      • Intended primarily for presentation/display, not for structural case-insensitive matching.
      • Requires converting each string in full before comparison, which allocates new strings and is less efficient.
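      The usual workaround is sketched below, assuming an upper-then-lower conversion in Locale.ROOT is an acceptable approximation of case folding (approxFold and RootLocaleWorkaround are hypothetical names). It illustrates the cost noted above: a comparator built this way re-converts both operands on every comparison.

```java
import java.util.Comparator;
import java.util.Locale;

public class RootLocaleWorkaround {
    // Approximates case folding with a full upper-then-lower conversion.
    // Each call may allocate up to two new strings.
    static String approxFold(String s) {
        return s.toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT);
    }

    // Every compare() call converts both arguments again, so sorting or
    // tree-map lookups repeat the conversions many times per string.
    static final Comparator<String> APPROX_FOLD_ORDER =
            Comparator.comparing(RootLocaleWorkaround::approxFold);

    public static void main(String[] args) {
        String a = "Stra\u00DFe"; // "Straße"
        String b = "STRASSE";
        System.out.println(approxFold(a).equals(approxFold(b))); // true
        System.out.println(APPROX_FOLD_ORDER.compare(a, b));      // 0
    }
}
```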

      Example of a 1:M mapping:

      • "ß".toUpperCase(Locale.ROOT) → "SS"
      • Case folding maps "ß" to "ss", as Unicode caseless matching requires. With current APIs:
        jshell> "\u00df".equalsIgnoreCase("ss")
        $22 ==> false
        
        jshell> "\u00df".toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT).equals("ss")
        $24 ==> true

      Solution

      This CSR proposes the following comparison APIs for java.lang.String:

      • boolean equalsFoldCase(String anotherString)
      • int compareToFoldCase(String str)
      • static final Comparator<String> UNICODE_CASEFOLD_ORDER

      These methods are intended to be the preferred choice when Unicode-compliant caseless matching is required.
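      To make the relationship among the three proposed members concrete, here is a minimal sketch under loud assumptions: FoldCaseSketch and fold are hypothetical names, it is not the proposed JDK implementation, only four full mappings from CaseFolding.txt are included, and every other code point falls back to a per-code-point toLowerCase(toUpperCase(cp)) rule that merely approximates simple folding. It shows that equality under folding is the same as compareToFoldCase returning zero, and that the comparator simply delegates to compareToFoldCase.

```java
import java.util.Comparator;
import java.util.Map;

// Hypothetical stand-in for the proposed String methods; NOT the JDK implementation.
public final class FoldCaseSketch {
    // Tiny illustrative subset of the status-F (full, 1:M) mappings in CaseFolding.txt.
    private static final Map<Integer, String> FULL = Map.of(
            0x00DF, "ss",        // LATIN SMALL LETTER SHARP S
            0x1E9E, "ss",        // LATIN CAPITAL LETTER SHARP S
            0xFB00, "ff",        // LATIN SMALL LIGATURE FF
            0x0130, "i\u0307");  // LATIN CAPITAL LETTER I WITH DOT ABOVE

    // Folds code point by code point; known full mappings may lengthen the result.
    static String fold(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> {
            String full = FULL.get(cp);
            if (full != null) {
                sb.append(full);
            } else {
                // Approximation of simple folding for all other code points.
                sb.appendCodePoint(Character.toLowerCase(Character.toUpperCase(cp)));
            }
        });
        return sb.toString();
    }

    static boolean equalsFoldCase(String a, String b) {
        return fold(a).equals(fold(b));
    }

    // Sign matches compareTo on the folded forms, as the specification describes.
    static int compareToFoldCase(String a, String b) {
        return fold(a).compareTo(fold(b));
    }

    static final Comparator<String> UNICODE_CASEFOLD_ORDER = FoldCaseSketch::compareToFoldCase;
}
```

      With this sketch, FoldCaseSketch.equalsFoldCase("Maße", "MASSE") is true, and a TreeSet built with its UNICODE_CASEFOLD_ORDER keeps only one of the two spellings. A faithful implementation would need the complete CaseFolding.txt data rather than this fallback.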

      Specification

      /**
       * Compares this {@code String} to another {@code String} for equality,
       * using <em>Unicode case folding</em>. Two strings are considered equal
       * by this method if their case-folded forms are identical.
       * <p>
       * Case folding is defined by the Unicode Standard in
       * <a href="https://www.unicode.org/Public/UCD/latest/ucd/CaseFolding.txt">CaseFolding.txt</a>,
       * including 1:M mappings. For example, {@code "Maße".equalsFoldCase("MASSE")}
       * returns {@code true}, since the character {@code U+00DF} (sharp s) folds
       * to {@code "ss"}.
       * <p>
       * Case folding is locale-independent and language-neutral, unlike
       * locale-sensitive transformations such as {@link #toLowerCase()} or
       * {@link #toUpperCase()}. It is intended for caseless matching,
       * searching, and indexing.
       *
       * @apiNote
       * This method is the Unicode-compliant alternative to
       * {@link #equalsIgnoreCase(String)}. It implements full case folding as
       * defined by the Unicode Standard, which may differ from the simpler
       * per-character mapping performed by {@code equalsIgnoreCase}.
       * For example:
       * {@snippet lang=java :
       * String a = "Maße";
       * String b = "MASSE";
       * boolean equalsFoldCase = a.equalsFoldCase(b);       // returns true
       * boolean equalsIgnoreCase = a.equalsIgnoreCase(b);   // returns false
       * }
       *
       * @param  anotherString
       *         The {@code String} to compare this {@code String} against
       *
       * @return  {@code true} if the given string is not {@code null} and represents
       *          the same sequence of characters as this string under Unicode case
       *          folding; {@code false} otherwise.
       *
       * @see     #compareToFoldCase(String)
       * @see     #equalsIgnoreCase(String)
       * @since   26
       */
      public boolean equalsFoldCase(String anotherString)
      
      /**
       * Compares two strings lexicographically using <em>Unicode case folding</em>.
       * This method returns an integer whose sign is that of calling {@code compareTo}
       * on the Unicode case-folded versions of the strings. Unicode case folding
       * eliminates differences in case according to the Unicode Standard, using the
       * mappings defined in
       * <a href="https://www.unicode.org/Public/UCD/latest/ucd/CaseFolding.txt">CaseFolding.txt</a>,
       * including 1:M mappings, such as {@code "ß"} → {@code "ss"}.
       * <p>
       * Case folding is a locale-independent, language-neutral form of case mapping,
       * primarily intended for caseless matching. Unlike {@link #compareToIgnoreCase(String)},
       * which applies a simpler, locale-insensitive per-code-point case mapping, this method
       * follows Unicode <em>full</em> case folding, providing stable and
       * consistent results across all environments.
       * <p>
       * Note that this method does <em>not</em> take locale into account, and may
       * produce results that differ from locale-sensitive ordering. Use
       * {@link java.text.Collator} for locale-sensitive comparison.
       *
       * @apiNote
       * This method is the Unicode-compliant alternative to
       * {@link #compareToIgnoreCase(String)}. It implements the <em>full</em> case folding
       * as defined by the Unicode Standard, which may differ from the simpler
       * per-character mapping performed by {@code compareToIgnoreCase}.
       * For example:
       * {@snippet lang=java :
       * String a = "Maße";
       * String b = "MASSE";
       * int cmpFoldCase = a.compareToFoldCase(b);     // returns 0
       * int cmpIgnoreCase = a.compareToIgnoreCase(b); // returns > 0
       * }
       *
       * @param   str   the {@code String} to be compared.
       * @return  a negative integer, zero, or a positive integer as the specified
       *          String is greater than, equal to, or less than this String,
       *          ignoring case differences under Unicode case folding.
       * @see     #equalsFoldCase(String)
       * @see     #compareToIgnoreCase(String)
       * @see     java.text.Collator
       * @since   26
       */
      public int compareToFoldCase(String str) 
      
      /**
       * A Comparator that orders {@code String} objects as by
       * {@link #compareToFoldCase(String) compareToFoldCase()}.
       *
       * @see     #compareToFoldCase(String)
       * @since   26
       */
      public static final Comparator<String> UNICODE_CASEFOLD_ORDER;
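      The @apiNote results for the existing methods can be checked on a current JDK. This sketch (class name illustrative) confirms that equalsIgnoreCase and compareToIgnoreCase do not treat "Maße" and "MASSE" as equal, so String.CASE_INSENSITIVE_ORDER keeps both as distinct keys in a sorted set, whereas an ordering as by compareToFoldCase would treat them as one.

```java
import java.util.List;
import java.util.TreeSet;

public class IgnoreCaseBaseline {
    public static void main(String[] args) {
        String a = "Ma\u00DFe"; // "Maße"
        String b = "MASSE";

        // The existing per-code-point methods contrasted in the @apiNote.
        System.out.println(a.equalsIgnoreCase(b));        // false
        // U+00DF (223) compares greater than 'S' / 's', hence a positive result.
        System.out.println(a.compareToIgnoreCase(b) > 0); // true

        // The existing comparator therefore keeps both spellings as separate keys.
        TreeSet<String> set = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        set.addAll(List.of(a, b));
        System.out.println(set.size());                   // 2
    }
}
```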

      Refs

            sherman Xueming Shen