Type: CSR
Resolution: Approved
Priority: P4
Fix Version/s: 20
Component/s: core-libs
Labels:
None

Subcomponent:
java.text
Compatibility Kind:

behavioral
Compatibility Risk:
low
Compatibility Risk Description:
Character breaks now behave differently. However, the differences result from the evolution of the Unicode specification and should not be treated as bugs. See the `Solution` section for more detail.
Interface Kind:

Java API
Scope:
SE

Summary

Enhance the existing java.text.BreakIterator#getCharacterInstance() to support Graphemes

Problem

BreakIterator was designed before the Unicode Consortium introduced the concept of Grapheme Clusters. The class provides the getCharacterInstance() method for breaking text into "characters" (from the user's perspective), but it cannot handle the breaks defined in the Grapheme specification.

Solution

Enhance getCharacterInstance() to support Grapheme Clusters. This introduces intentional behavioral changes, because the old implementation simply breaks at code point boundaries for the vast majority of characters. For example, the following String contains the US flag and a grapheme for a four-member family:

"🇺🇸👨‍👩‍👧‍👦"

This String will be broken into two graphemes with the new implementation:

"🇺🇸", "👨‍👩‍👧‍👦"

whereas the old implementation simply breaks at the code point boundaries:

"🇺", "🇸", "👨", "(zwj)", "👩", "(zwj)", "👧", "(zwj)", "👦"

where (zwj) denotes ZERO WIDTH JOINER (U+200D).
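
The difference can be observed with a short program. The following is a minimal sketch and not part of the CSR: the class name `GraphemeDemo` and the helper `segments` are hypothetical names chosen for illustration, and the example string is written with Unicode escapes so that it does not depend on the source file encoding. It uses only `BreakIterator.getCharacterInstance()`, `setText()`, `first()`, and `next()`.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;

public class GraphemeDemo {
    // US flag (two regional indicator symbols) followed by the
    // four-member family (four emoji joined by U+200D ZERO WIDTH JOINER).
    static final String TEXT =
        "\uD83C\uDDFA\uD83C\uDDF8"
        + "\uD83D\uDC68\u200D\uD83D\uDC69\u200D\uD83D\uDC67\u200D\uD83D\uDC66";

    // Splits the text at every boundary reported by the character iterator.
    static List<String> segments(String text) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(text);
        List<String> result = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            result.add(text.substring(start, end));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> segs = segments(TEXT);
        // The segment count depends on the JDK release running the program.
        System.out.println("segments:    " + segs.size());
        System.out.println("code points: " + TEXT.codePointCount(0, TEXT.length()));
    }
}
```

The string holds nine code points. Going by the description above, a release containing this change (Fix Version 20) should report two segments, while the old implementation should report one segment per code point. The sketch deliberately prints the count rather than asserting a specific value, since the result depends on the runtime.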

Specification

Insert the following @implSpec after the character boundary analysis paragraph in the class description of the BreakIterator class:

+ * @implSpec The default implementation of the character boundary analysis
+ * conforms to the Unicode Consortium's Extended Grapheme Cluster breaks.
+ * For more detail, refer to
+ * <a href="https://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries">
+ * Grapheme Cluster Boundaries</a> section in the Unicode Standard Annex #29.

csr of

JDK-8291660 Grapheme support in BreakIterator

Resolved

relates to

JDK-8294008 Grapheme implementation of setText() throws IndexOutOfBoundsException

Closed

Assignee: Naoto Sato

Reporter: Naoto Sato

Reviewed By: Stuart Marks

Votes: 0

Watchers: 2

Created: 2022-08-15 13:54

Updated: 2022-09-19 05:57

Resolved: 2022-08-31 13:55
