  1. JDK
  2. JDK-8291660 Grapheme support in BreakIterator
  3. JDK-8292992

Release Note: Grapheme Support in BreakIterator


      Character boundary analysis in `java.text.BreakIterator` now conforms to the Extended Grapheme Cluster breaks defined in the <a href="https://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries">Unicode Consortium's Standard Annex #29</a>. This is an intentional behavioral change: the old implementation simply breaks at the code point boundaries for the vast majority of characters. For example, the following String contains the US flag and a grapheme for a four-member family:
      ```
      "🇺🇸👨‍👩‍👧‍👦"
      ```
      This String will be broken into two graphemes with the new implementation:
      ```
      "🇺🇸", "👨‍👩‍👧‍👦"
      ```
      whereas the old implementation simply breaks at the code point boundaries:
      ```
      "🇺", "🇸", "👨", "(zwj)", "👩", "(zwj)", "👧", "(zwj)", "👦"
      ```
      where (zwj) denotes ZERO WIDTH JOINER (U+200D).
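
One way to observe the difference is to walk the character boundaries of the example String and print each segment. The following is a minimal sketch rather than part of the original note; the class name `GraphemeDemo` and the helper `graphemes` are hypothetical, and the String is written with Unicode escapes so that the source file encoding does not matter.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;

public class GraphemeDemo {

    // Hypothetical helper: splits text at the character boundaries
    // reported by BreakIterator.getCharacterInstance().
    public static List<String> graphemes(String text) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(text);
        List<String> parts = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            parts.add(text.substring(start, end));
        }
        return parts;
    }

    public static void main(String[] args) {
        // US flag: REGIONAL INDICATOR SYMBOL LETTER U + LETTER S
        String flag = "\uD83C\uDDFA\uD83C\uDDF8";
        // Family: MAN, ZWJ, WOMAN, ZWJ, GIRL, ZWJ, BOY
        String family = "\uD83D\uDC68\u200D\uD83D\uDC69\u200D\uD83D\uDC67\u200D\uD83D\uDC66";

        List<String> parts = graphemes(flag + family);
        // The segment count depends on which implementation the running JDK has.
        System.out.println(parts.size() + " segment(s)");
        for (String part : parts) {
            System.out.println(part.codePointCount(0, part.length()) + " code point(s): " + part);
        }
    }
}
```

Going by the description above, a JDK with the new implementation should report two segments for this String, while a JDK with the old implementation should report one segment per code point.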

            naoto Naoto Sato