JDK-4777313

Unicode 3.2 - based line-wraps in Swing


    • Type: Enhancement
    • Resolution: Duplicate
    • Priority: P4
    • None
    • 1.4.1
    • client-libs



      Name: gm110360 Date: 11/11/2002


      FULL PRODUCT VERSION :
      java version "1.4.1"
      Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.1-b21)
      Java HotSpot(TM) Client VM (build 1.4.1-b21, mixed mode)

      A DESCRIPTION OF THE PROBLEM :
      For internationalization, and to allow writing more
      locale-independent Swing GUIs that can easily be
      translated from a single source code base and a simple
      set of resource bundles, we need something that correctly
      handles three required features:
      - Word wrap
      - Line wrap
      - Directionality

      Unicode 3.2 publishes a set of character properties related
      to:
      - character width (half-width/full-width)
      - reorientability of half-width characters in a vertical
      layout, or their conversion from half-width to full width
      - Unicode canonicalization rules (related to combining
      marks), and management of presentation forms (contextual
      forms for characters, styling combined with some
      characters, ...)
      - line wrap attributes for characters
      - technical reports with sample code snippets to handle
      these new character properties

      The most common problem with internationalized
      applications, after directionality in Hebrew and Arabic, is
      the management of line wraps: this directly affects Asian
      texts, which don't use spaces to allow simple line-wrap
      or word-wrap when creating a layout to display the text.

      A "simple" solution would be to expect that Asian text will
      contain spaces. This is true if the set of resources to
      be displayed is fixed and managed in static resources;
      however, it is not correct according to the standard layout
      of these languages. To solve this problem, a program should
      be able to detect some characters that can help perform
      line wraps correctly:
      - full-width punctuation used in Chinese or Japanese is
      mostly equivalent to half-width punctuation and a space
      - Chinese Hanzi and Japanese Kanji characters are
      considered as words alone, that can be wrapped individually
      - Japanese Katakana and Hiragana have some rules to
      delimit syllables or terms that can be wrapped
      individually
      - Korean Hangul characters are composed in syllables that
      can be computed algorithmically (the L,V*,T algorithm):
      line-wrap can occur between syllables but not in the middle
      of an LVT syllable sequence.
      - There's generally no need to support a vertical layout
      for Asian languages, as they also accept the horizontal
      layout (the biggest layout problem comes from Semitic
      languages)
      - Latin-, Greek- or Cyrillic-based scripts usually have
      short enough words to allow a simple wrap algorithm based
      on word-wrap without needing hyphenation (and
      dictionaries) if the GUI is correctly designed with a
      sufficient display width, and they use the usual
      punctuations and spaces to delimit words
      - Generally, a change of script delimits a line-wrap
      opportunity (for example between Latin and Hiragana, or
      between Hiragana and Katakana, or between Hira/kata and
      Hanzi/Kanji...)
      - Unicode provides an efficient algorithm to handle the
      line-wrap opportunities based on pairs of character classes
      that will work very well with simple scripts
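
      As an illustration of the "change of script" rule in the
      list above, here is a minimal sketch. It assumes a JDK that
      has Character.UnicodeScript (added in Java 7, so not
      available in 1.4.1). The class name ScriptBoundaries and
      its find method are hypothetical names chosen for this
      example, and skipping COMMON/INHERITED characters is a
      simplification, not a full line-breaking algorithm.

```java
import java.lang.Character.UnicodeScript;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports the char indices at which the script changes.
final class ScriptBoundaries {

    static List<Integer> find(String text) {
        List<Integer> boundaries = new ArrayList<>();
        UnicodeScript previous = null;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            UnicodeScript script = UnicodeScript.of(cp);
            // Spaces, digits, punctuation (COMMON) and combining marks
            // (INHERITED) do not start a new script run in this sketch.
            if (script != UnicodeScript.COMMON && script != UnicodeScript.INHERITED) {
                if (previous != null && script != previous) {
                    boundaries.add(i); // candidate wrap position before index i
                }
                previous = script;
            }
            i += Character.charCount(cp);
        }
        return boundaries;
    }

    public static void main(String[] args) {
        // "Java" + four Hiragana + four Katakana + two Kanji
        String sample = "Java\u3072\u3089\u304C\u306A\u30AB\u30BF\u30AB\u30CA\u6F22\u5B57";
        System.out.println(find(sample)); // prints [4, 8, 12]
    }
}
```

      Each reported index is only a candidate: a real layout
      would still have to combine it with the other rules in
      the list (punctuation, Hangul syllables, and so on).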

      Is it possible to add new classes in the
      java.lang.Character family to handle the newly
      standardized character properties:
      - East Asian width
      - derived normalization
      - line-wrap opportunity classes
      - case folding
      - special casing
      in a way similar to what is now implemented with the
      java.lang.Character.UnicodeBlock class?
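
      For comparison, the sketch below shows the kind of
      per-character lookup being asked for, using only calls
      that exist in later JDKs: Character.UnicodeBlock,
      Character.UnicodeScript (Java 7 and later),
      java.text.Normalizer (Java 6 and later) and
      String.toUpperCase. None of these is the requested
      East Asian width or line-wrap class API. The class
      CharProps and its method names are hypothetical.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical wrapper around existing JDK character-property lookups.
final class CharProps {

    // Block and script of a code point, in the style of Character.UnicodeBlock.
    static String describe(int codePoint) {
        return String.format("U+%04X block=%s script=%s",
                codePoint,
                Character.UnicodeBlock.of(codePoint),
                Character.UnicodeScript.of(codePoint));
    }

    // Canonical decomposition (NFD): splits precomposed characters
    // into a base character followed by combining marks.
    static String decompose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    // Locale-neutral uppercasing; applies special-casing rules such as
    // sharp s expanding to two letters.
    static String upper(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(describe(0x3042));            // Hiragana letter A
        System.out.println(decompose("\u00E9").length()); // prints 2
        System.out.println(upper("stra\u00DFe"));         // prints STRASSE
    }
}
```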

      And then to provide new APIs for Swing that would use
      these properties to parse a string into wrappable tokens,
      or to provide common transformations of strings to comply
      with a text layout manager?
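
      To make "wrappable tokens" concrete, here is a minimal
      sketch built on java.text.BreakIterator.getLineInstance,
      which is part of the JDK. The class WrapTokens and its
      split method are hypothetical names. Where the iterator
      places boundaries in Asian text depends on the rules
      shipped with the JDK in use, so this shows the shape of
      such an API rather than guaranteed results.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: cuts a string at the line-break opportunities
// reported by the JDK's line BreakIterator.
final class WrapTokens {

    static List<String> split(String text, Locale locale) {
        BreakIterator lines = BreakIterator.getLineInstance(locale);
        lines.setText(text);
        List<String> tokens = new ArrayList<>();
        int start = lines.first();
        // Each boundary marks a position where a line may be wrapped.
        for (int end = lines.next(); end != BreakIterator.DONE; end = lines.next()) {
            tokens.add(text.substring(start, end));
            start = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : split("Swing wraps long lines.", Locale.ENGLISH)) {
            System.out.println("[" + token + "]");
        }
    }
}
```

      A layout manager could then place tokens on a line until
      the next one no longer fits, without requiring the
      translator to insert spaces by hand.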

      The most important changes will be in the way text is
      handled in HTML renderers and in JTextArea.

      Designing an interface that complies with these rules
      should be the first goal, and there should be simple
      implementations that will work on all important scripts
      supported now by Java: Latin, Cyrillic, Greek, Hebrew,
      Hiragana, Katakana, Hanzi/Kanji, Hangul, Thai.

      There also should be support now for Vietnamese, which is
      not really a complex script (VISCII does not fully comply
      with ISO-8859 rules as it uses some ASCII control bytes to
      represent a few accented Latin characters but it still
      works as a common single-byte encoding; alternatives use
      combining marks and the most commonly used character set is
      windows-1258 using those combining marks and extending an
      ISO registered character set with some characters commonly
      found on all Windows ANSI character sets).


      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      Take a typical Java application that uses simple
      MessagesBundles to internationalize its GUI. Give these
      bundles to native-speaking translators to translate.
      Then use the translations and look at the poor layout, or
      at the inaccessible buttons or parts of the text in the
      GUI. This is caused by the lack of support for Asian text
      in Java...
      The developer must manually check the translation and
      insert a few spaces to help manage the multiline layout.
      There's no support in Java to help the developer do this
      in a better way...
      So correct internationalization from European to Asian
      languages causes a lot of unsolved issues that can make an
      application unusable in some cases with Asian text.

      REPRODUCIBILITY :
      This bug can be reproduced always.
      (Review ID: 165231)
      ======================================================================

            peterz Peter Zhelezniakov
            gmanwanisunw Girish Manwani (Inactive)