The JDK's text segmentation API, BreakIterator, uses its own segmentation rules (sun.text.resources.BreakIteratorRules.java), which have not been updated. Meanwhile, Unicode's UAX #14 and UAX #29 have been updated continually. The JDK's implementation should catch up with them.
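For reference, a minimal sketch of how the character-level segmentation in question can be observed through the public API. The class name `BreakIteratorSketch` and the helper `segments` are hypothetical, introduced only for illustration. The number of segments reported for the emoji flag sample depends on which rule set the running JDK implements, so no expected value is given for it.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, not part of the JDK.
class BreakIteratorSketch {

    // Splits text at the boundaries reported by the character-instance BreakIterator.
    static List<String> segments(String text) {
        BreakIterator it = BreakIterator.getCharacterInstance(Locale.ROOT);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            out.add(text.substring(start, end));
        }
        return out;
    }

    public static void main(String[] args) {
        // Two regional indicator symbols (J + P), which UAX #29 treats as one grapheme cluster.
        String flag = "\uD83C\uDDEF\uD83C\uDDF5";
        System.out.println("abc  -> " + segments("abc").size() + " segment(s)");
        // The result below varies with the JDK's segmentation rules.
        System.out.println("flag -> " + segments(flag).size() + " segment(s)");
    }
}
```

Running this on JDK builds from before and after a rules update is one way to see whether the implementation follows the current UAX #29 behaviour for a given sample.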
relates to
- JDK-8309565 [Text] Enhance support for user-perceived characters (grapheme clusters) (Open)
- JDK-8291660 Grapheme support in BreakIterator (Resolved)