Name: clC74495 Date: 02/24/99
BreakIterator returns wrong word boundaries for Japanese words
that include any of the following characters:
U+309D HIRAGANA ITERATION MARK
U+309E HIRAGANA VOICED ITERATION MARK
U+30FD KATAKANA ITERATION MARK
U+30FE KATAKANA VOICED ITERATION MARK
U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK
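A minimal sketch for checking the behavior described above. The class
name, the helper method, and the sample words are illustrative
assumptions, not taken from the report. It prints the boundary offsets
that BreakIterator reports for words containing each mark. The report
implies each word should come back as a single segment; the actual
output depends on the JDK version in use.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, written in modern Java for brevity.
class IterationMarkBreaks {

    // Collects every word-boundary offset BreakIterator reports for the text,
    // including the start (0) and the end (text.length()).
    static List<Integer> boundaries(String text) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.JAPAN);
        it.setText(text);
        List<Integer> result = new ArrayList<>();
        for (int b = it.first(); b != BreakIterator.DONE; b = it.next()) {
            result.add(b);
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative sample words, one per character listed in the report.
        String[] samples = {
            "\u3053\u309D\u308D",        // contains U+309D HIRAGANA ITERATION MARK
            "\u3059\u309E\u3081",        // contains U+309E HIRAGANA VOICED ITERATION MARK
            "\u30B5\u30FD\u30AD",        // contains U+30FD KATAKANA ITERATION MARK
            "\u30B9\u30FE\u30E1",        // contains U+30FE KATAKANA VOICED ITERATION MARK
            "\u30B3\u30FC\u30D2\u30FC"   // contains U+30FC PROLONGED SOUND MARK
        };
        for (String s : samples) {
            System.out.println(s + " -> " + boundaries(s));
        }
    }
}
```

A boundary reported at an offset strictly between 0 and the word length
indicates that the word was split at or next to one of the marks.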
This problem occurs because java.text.WordBreakData lacks
mappings for these characters. Applying the following patch
adds them and corrects the word boundaries.
*** WordBreakData.java.old Mon Feb 1 20:06:33 1999
--- WordBreakData.java Mon Feb 1 20:11:31 1999
***************
*** 314,321 ****
--- 314,327 ----
  new SpecialMapping(HIRAGANA_LETTER_SMALL_A, HIRAGANA_LETTER_VU, hira),
  new SpecialMapping(COMBINING_KATAKANA_HIRAGANA_VOICED_SOUND_MARK,
      HIRAGANA_SEMIVOICED_SOUND_MARK, diacrit),
+ new SpecialMapping(HIRAGANA_ITERATION_MARK,
+     HIRAGANA_VOICED_ITERATION_MARK, hira),
  new SpecialMapping(KATAKANA_LETTER_SMALL_A,
      KATAKANA_LETTER_SMALL_KE, kata),
+ new SpecialMapping(KATAKANA_HIRAGANA_PROLONGED_SOUND_MARK,
+     diacrit),
+ new SpecialMapping(KATAKANA_ITERATION_MARK,
+     KATAKANA_VOICED_ITERATION_MARK, kata),
  new SpecialMapping(UNICODE_LOW_BOUND_HAN,
      UNICODE_HIGH_BOUND_HAN, kanji),
  new SpecialMapping(HANGUL_SYL_LOW, HANGUL_SYL_HIGH, letter),
(Review ID: 54644)
======================================================================