JDK-4095322

CJK line-breaking not completely correct


    • 1.1.6
    • x86
    • windows_95
    • Verified



        Name: bb33257 Date: 11/25/97


        In addition to the problems reported in bug #4068133, there are
        quite a few characters that the BreakIterator returned by
        BreakIterator.getLineInstance() doesn't treat correctly in the
        presence of CJK ideographs. To see the problem, run the
        following code:

        public void TestJapaneseLineBreak()
        {
            StringBuffer testString = new StringBuffer("\u4e00x\u4e8c");
            String precedingChars = "([{«$¥£¤\u2018\u201a\u201c\u201e\u201b\u201f";
            String followingChars = ")]}»!%,.\u3001\u3002\u3063\u3083\u3085\u3087\u30c3\u30e3\u30e5\u30e7\u30fc:;\u309b\u309c\u3005\u309d\u309e\u30fd\u30fe\u2019\u201d\u00b0\u2032\u2033\u2034\u2030\u2031\u2103\u2109\u00a2\u0300\u0301\u0302";
            BreakIterator iter = BreakIterator.getLineInstance(Locale.JAPAN);

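            // A break is expected before each "preceding" character but not
            // after it, so the boundaries should be 0, 1, 3.
            for (int i = 0; i < precedingChars.length(); i++) {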
                testString.setCharAt(1, precedingChars.charAt(i));
                iter.setText(testString.toString());
                int j = iter.first();
                if (j != 0)
                    errln("ja line break failure: failed to start at 0");
                j = iter.next();
                if (j != 1)
                    errln("ja line break failure: failed to stop before '" + precedingChars.charAt(i)
                                + "' (" + ((int)(precedingChars.charAt(i))) + ")");
                j = iter.next();
                if (j != 3)
                    errln("ja line break failure: failed to skip position after '" + precedingChars.charAt(i)
                                + "' (" + ((int)(precedingChars.charAt(i))) + ")");
            }

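            // No break is expected before each "following" character, only
            // after it, so the boundaries should be 0, 2, 3.
            for (int i = 0; i < followingChars.length(); i++) {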
                testString.setCharAt(1, followingChars.charAt(i));
                iter.setText(testString.toString());
                int j = iter.first();
                if (j != 0)
                    errln("ja line break failure: failed to start at 0");
                j = iter.next();
                if (j != 2)
                    errln("ja line break failure: failed to skip position before '" + followingChars.charAt(i)
                                + "' (" + ((int)(followingChars.charAt(i))) + ")");
                j = iter.next();
                if (j != 3)
                    errln("ja line break failure: failed to stop after '" + followingChars.charAt(i)
                                + "' (" + ((int)(followingChars.charAt(i))) + ")");
            }
        }

        The following "preceding" characters don't get treated correctly:

        \u0024 The ASCII dollar sign
        \u00a3 The British pound sign
        \u00a4 The generic currency symbol
        \u00a5 The yen sign

        The following "following" characters don't get treated correctly:

        \u3063, \u3083, \u3085, \u3087, \u30c3, \u30e3, \u30e5, \u30e7 The small Kana characters
        \u30fc The Katakana long-vowel mark
        \u309b, \u309c The Kana voiced/semi-voiced sound marks
        \u3005, \u309d, \u309e, \u30fd, \u30fe The CJK iteration marks
        \u00b0 The degree sign
        \u2032, \u2033, \u2034 Prime marks
        \u2103, \u2109 Degrees Celsius and degrees Fahrenheit
        \u00a2 The cent sign
        \u0300, \u0301, \u0302 All non-spacing marks
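
        To check individual characters from the lists above without the
        test harness (errln is not defined in the snippet), a minimal
        standalone sketch along these lines may help. The class name
        LineBreakProbe and the helper boundaries() are hypothetical and
        not part of the original report; the probe characters are a small
        sample taken from the lists above. The boundaries printed depend
        on the JDK version in use, so no expected output is shown.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, not part of the original bug report.
class LineBreakProbe {

    // Collects every boundary the line-break iterator reports for the text,
    // starting with first() and stopping when next() returns DONE.
    static List<Integer> boundaries(String text, Locale locale) {
        BreakIterator iter = BreakIterator.getLineInstance(locale);
        iter.setText(text);
        List<Integer> result = new ArrayList<>();
        for (int pos = iter.first(); pos != BreakIterator.DONE; pos = iter.next()) {
            result.add(pos);
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample of characters from the report: dollar sign, yen sign,
        // small hiragana tsu, katakana long-vowel mark, degree Fahrenheit.
        String probes = "$\u00a5\u3063\u30fc\u2109";
        for (int i = 0; i < probes.length(); i++) {
            char c = probes.charAt(i);
            // Same shape as the report's test string: ideograph, probe, ideograph.
            String text = "\u4e00" + c + "\u4e8c";
            System.out.printf("U+%04X -> %s%n", (int) c, boundaries(text, Locale.JAPAN));
        }
    }
}
```

        Under the expectations encoded in the test above, a "preceding"
        character should yield the boundaries [0, 1, 3] and a "following"
        character [0, 2, 3]; any other list points to a character that is
        still being misclassified.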
        ======================================================================

              busersunw Btplusnull User (Inactive)
              bcbeck Brian Beck (Inactive)