Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-6868503

RuleBasedBreakIterator is inefficient

XMLWordPrintable

    • b02
    • generic
    • generic
    • Verified

        LineBreakMeasurer essentially uses following logic to split text into the lines:
           a) find what is the last char that fits into the line based on the metrics
           b) find last work break before this char

        For step b BreakIterator is used (RuleBasedBreakIterator in my case) and i observe that preceeding() and following() methods are called very often for attached benchmark.

        The reason is RuleBasedBreakIterator looks for previous work break by scanning forward from last known
        "\n" occurence. For long text strings without explicit line breaks and which has to be broken into short lines this is very inefficient. Performance improves dramatically if we can cache and use word break found on previous iteration. See suggested change in the attachment (you may want to refine it, e.g. reset cache if text was replaced, etc.)

        With this fix benchmark time drops from 500ms to 20ms on my PC.
        It also saves about 30% time of LineBreakMeasurer time (for original FX testcase).

        Similar problems may be applicable to other iterator implementations (and may show up in different locales).

              peytoia Yuka Kamiya (Inactive)
              igor Igor Nekrestyanov (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

                Created:
                Updated:
                Resolved:
                Imported:
                Indexed: