JDK-4113835

Some of BreakIterator's rules are not correct in JDK1.1.6G.


    • 1.1.6
    • generic
    • generic
    • Not verified



        Name: paC48320 Date: 02/20/98


        JDK1.1.6D through G use new BreakIterator rules.
        They are better than JDK1.1.5's,
        but they include some regressions and
        a few data mistakes (which are easy to fix).

        1. Sentence break

           a) iterator breaks at newline.
              In this respect, JDK1.1.5's iterator is better than JDK1.1.6G's.
              Please test with the following command.
                  % appletviewer jdk1.1.6/demo/i18n/TextBound/example1.html
              In JDK1.1.5, even when a newline appeared,
              the iterator didn't break there (of course).

              JDK1.1.6G's iterator should not break a sentence at a newline.

              I don't understand why ASCII_LINEFEED is 'paragraphBreak'
              and ASCII_CARRIAGE_RETURN is 'sent_cr' in SentenceBreakData.java.
              This is the reason why the iterator always breaks a sentence at a newline.
              I think both of them should be 'space'.

           b) Japanese full stop problem.

              In JDK1.1.6G, the Japanese full stop (U+3002
              PUNCTUATION_IDEOGRAPHIC_FULL_STOP) breaks a sentence.
              That is great.
              But U+FF0E FULLWIDTH FULL STOP, U+FF01 FULLWIDTH EXCLAMATION MARK,
              and U+FF1F FULLWIDTH QUESTION MARK are also often used
              as sentence terminators in Japanese technical papers.
              So please add the following code to TextBoundaryData.java.
                 protected static final char FULLWIDTH_FULL_STOP
                     = '\uFF0E';
                 protected static final char FULLWIDTH_EXCLAMATION_MARK
                     = '\uFF01';
                 protected static final char FULLWIDTH_QUESTION_MARK
                     = '\uFF1F';
              And add the following code to SentenceBreakData.java.
                 new SpecialMapping(FULLWIDTH_FULL_STOP, ambiguosTerm),
                 new SpecialMapping(FULLWIDTH_EXCLAMATION_MARK, terminator),
                 new SpecialMapping(FULLWIDTH_QUESTION_MARK, terminator),
            
           c) Mixed Japanese and digit problem.

              In JDK1.1.6G, if the character following a digit is a CJK character,
              the iterator breaks the sentence before the CJK character.
              This is not correct.
              The iterator must not break the sentence there.

              Please modify kSentenceForwardData (SentenceBreakData.java) as follows.
                // 9
                (byte)(SI+1), (byte)(SI+1), (byte)(SI+2), (byte)(SI+9),
                (byte)(SI+1), (byte)(SI+5), (byte)(SI+1), (byte)(SI+4),
                                              ^^^^^^^^^^^^
           d) Opening character problem.

              In JDK1.1.6G, if the character following a full stop is an opening character,
              the iterator doesn't break the sentence there.
              I think the iterator should break the sentence after the full stop character.

              Please modify kSentenceForwardData (SentenceBreakData.java) as follows.
                // 2
                SI_STOP, (byte)(SI+3), (byte)(SI+2), (byte)(SI+5),
                SI_STOP, (byte)(SI+2), SI_STOP, (byte)(SI+4),
                ^^^^^^^
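
The four sentence-break cases above can be checked from outside the data tables with a small probe. This is a minimal sketch, not the I18N team's test: it uses only the public java.text.BreakIterator API and is written for a current JDK (generics and java.util collections are not available in JDK 1.1). The class name SentenceBreakProbe, the helper sentenceBoundaries, and the sample strings are assumptions made for illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class SentenceBreakProbe {
    /** Collects every boundary offset the sentence iterator reports for the text. */
    static List<Integer> sentenceBoundaries(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        List<Integer> offsets = new ArrayList<>();
        for (int pos = it.first(); pos != BreakIterator.DONE; pos = it.next()) {
            offsets.add(pos);
        }
        return offsets;
    }

    public static void main(String[] args) {
        String[] samples = {
            // 1.a) a newline in the middle of a sentence
            "The iterator should not stop\nhere. Second sentence.",
            // 1.b) fullwidth '.', '!' and '?' used as sentence terminators
            "\u6587\u3067\u3059\uFF0E\u6B21\u3067\u3059\uFF01\u6587\u3067\u3059\uFF1F",
            // 1.c) a digit directly followed by a CJK character
            "1998\u5E74\u3067\u3059\u3002",
            // 1.d) an ideographic full stop followed by an opening bracket
            "\u6587\u3067\u3059\u3002\u300C\u6B21\u300D\u3067\u3059\u3002"
        };
        for (String sample : samples) {
            // The printed offsets depend on the break rules of the JDK in use.
            System.out.println(sentenceBoundaries(sample, Locale.JAPAN));
        }
    }
}
```

An offset that appears right after the newline in the first sample, or between the digit and the CJK character in the third, would indicate the behavior reported in 1.a) and 1.c).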

        2. Line Break

           a) Line break is much better than in JDK1.1.5.
              Almost all Japanese writing rules are included.

              I hope information for FULLWIDTH '.', ',', '!', and '?' will be added.

              Please add the following code to LineBreakData.java, after making the change in 1.b) above.
                 new SpecialMapping(FULLWIDTH_FULL_STOP, postJwrd),
                 new SpecialMapping(FULLWIDTH_EXCLAMATION_MARK, postJwrd),
                 new SpecialMapping(FULLWIDTH_QUESTION_MARK, postJwrd),
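
Whether a line iterator treats the fullwidth marks as characters that must not start a line can be probed as follows. This is a rough sketch using the public BreakIterator.isBoundary call on a current JDK; the class name LineBreakProbe, the helper canBreakBefore, and the sample text are illustrative assumptions, and the result depends on the rules shipped with the JDK in use.

```java
import java.text.BreakIterator;
import java.util.Locale;

class LineBreakProbe {
    /** Returns true if the line iterator allows a break immediately before text.charAt(index). */
    static boolean canBreakBefore(String text, int index, Locale locale) {
        BreakIterator it = BreakIterator.getLineInstance(locale);
        it.setText(text);
        return it.isBoundary(index);
    }

    public static void main(String[] args) {
        // Japanese text containing U+FF0E, U+FF01 and U+FF1F.
        String text = "\u6587\u3067\u3059\uFF0E\u6B21\u3067\u3059\uFF01\u6587\u3067\u3059\uFF1F";
        for (int i = 1; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\uFF0E' || c == '\uFF01' || c == '\uFF1F') {
                // With the requested mapping, no break should be allowed here.
                System.out.printf("break allowed before U+%04X at offset %d: %b%n",
                        (int) c, i, canBreakBefore(text, i, Locale.JAPAN));
            }
        }
    }
}
```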

        3. Word Break
           a) Word break is also much better than in JDK1.1.5.
              Almost all Japanese writing rules are included.

              I want to add one piece of information.

              U+3005 IDEOGRAPHIC_ITERATION_MARK is a member of kanji.

              So please add the following code to WordBreakData.java.
                 new SpecialMapping(IDEOGRAPHIC_ITERATION_MARK, kanji),
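
A quick way to see whether the iteration mark is kept together with the preceding kanji is to list the word boundaries of a word that contains it. This is a minimal sketch on a current JDK using the public API only; the class name WordBreakProbe, the helper wordBoundaries, and the two sample words are assumptions for illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class WordBreakProbe {
    /** Collects every boundary offset the word iterator reports for the text. */
    static List<Integer> wordBoundaries(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<Integer> offsets = new ArrayList<>();
        for (int pos = it.first(); pos != BreakIterator.DONE; pos = it.next()) {
            offsets.add(pos);
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Two-character words: a kanji followed by U+3005 IDEOGRAPHIC ITERATION MARK.
        String[] samples = { "\u4EBA\u3005", "\u6642\u3005" };
        for (String sample : samples) {
            List<Integer> offsets = wordBoundaries(sample, Locale.JAPAN);
            // A boundary at offset 1 means the mark was split from its kanji.
            System.out.println(offsets + " split before U+3005: " + offsets.contains(1));
        }
    }
}
```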
            
        4. Char Break

           There is no problem.



        I hope these changes will be added before JDK 1.1.6 FCS.
        If so, we (Japanese) can use java.text.BreakIterator for real business applications.
        (Review ID: 25408)
        ======================================================================

        mircea.oancea@canada 1998-05-08 from incident reported by OKI

        BreakIterator(Sentence) of JDK1.2beta4D still has JDK1.1.6G's newline problem.
        BreakIterator of JDK1.2beta4D has improved for Japanese because of my bug reports (4113835, 4117554).
        Thanks, I18N team.
        But a bug from JDK1.1.6G remains in SentenceBreakData.java of JDK1.2beta4D.
        I think the reason is that the bug report (4113835) doesn't have section 1 a).
        That section describes `iterator breaks at newline.'
        The bug is fixed in JDK1.1.6 FCS.
        But the bug is NOT fixed in JDK1.2beta4D.
        Please fix the bug in JDK1.2beta4, too.
        This is a regression.

        The section is as follows:
        1. Sentence break

           a) iterator breaks at newline.
              In this respect, JDK1.1.5's iterator is better than JDK1.1.6G's.
              Please test with the following command.
                  % appletviewer jdk1.1.6/demo/i18n/TextBound/example1.html
              In JDK1.1.5, even when a newline appeared,
              the iterator didn't break there (of course).

              JDK1.1.6G's iterator should not break a sentence at a newline.

              I don't understand why ASCII_LINEFEED is 'paragraphBreak'
              and ASCII_CARRIAGE_RETURN is 'sent_cr' in SentenceBreakData.java.
              This is the reason why the iterator always breaks a sentence at a newline.
              I think both of them should be 'space'.

              rgillamsunw Richard Gillam (Inactive)
              pallenba Peter Allenbach (Inactive)