JDK-8279542

Upgrade Unicode Data Files to 14.0.0


    • Type: CSR
    • Resolution: Approved
    • Priority: P3
    • Fix Version/s: 19
    • Component/s: core-libs
    • Labels: None
    • Compatibility Kind: source
    • Compatibility Risk: low
    • Compatibility Risk Description: Unicode maintains backward compatibility, so adopting the new version in the JDK is not expected to cause any backward compatibility issues.
    • Interface Kind: Java API
    • Scope: SE

      Summary

      Support the Unicode Standard version 14.0.0 in the JDK.

      Problem

      Keeping up with the latest Unicode Standard is imperative; otherwise, interoperability with other platforms that already support the newer version becomes problematic.

      Solution

      Incorporate Unicode 14.0, which adds 838 characters, 12 new blocks, and 5 new scripts relative to Unicode 13.0. The detailed changes are described on the Unicode Consortium's Unicode 14.0 website.
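      As a rough illustration of what "incorporating" the new characters means at run time, the sketch below asks the running JDK whether its Character data recognizes a few code points that Unicode 14.0 assigned. The class name `Unicode14Probe` and the sample code points (U+1FAE0 MELTING FACE and the first code points of the Vithkuqi and Toto blocks) are illustrative choices, not part of this CSR; on releases before Java SE 19 the samples are expected to report as undefined.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch (hypothetical class): reports whether the running JDK's Character
 * data recognizes selected code points. Results depend on the JDK release.
 */
public class Unicode14Probe {

    // Sample code points first assigned in Unicode 14.0 (illustrative choice).
    static final int[] SAMPLES = {
        0x1FAE0, // MELTING FACE (emoji)
        0x10570, // first code point of the Vithkuqi block
        0x1E290  // first code point of the Toto block
    };

    /** Maps each code point (as U+XXXX) to its character name, or "<undefined>". */
    static Map<String, String> describe(int... codePoints) {
        Map<String, String> result = new LinkedHashMap<>();
        for (int cp : codePoints) {
            String label = String.format("U+%04X", cp);
            // Character.getName returns null when the code point is unassigned
            // in the Unicode version that the running JDK implements.
            String name = Character.isDefined(cp) ? Character.getName(cp) : null;
            result.put(label, name != null ? name : "<undefined>");
        }
        return result;
    }

    public static void main(String[] args) {
        describe(SAMPLES).forEach((cp, name) -> System.out.println(cp + " -> " + name));
    }
}
```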

      The java.text.Bidi and java.text.Normalizer classes will be upgraded to the 14.0 level of Unicode Standard Annex #9 and Annex #15, respectively.
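      A minimal sketch of the two java.text APIs whose data is upgraded, assuming only their public methods (`Normalizer.normalize`, `Bidi.requiresBidi`, `Bidi.getRunCount`). The class and helper names are hypothetical, and the inputs use long-established characters, so the results shown by the checks do not depend on the Unicode 14.0 upgrade itself.

```java
import java.text.Bidi;
import java.text.Normalizer;

/**
 * Sketch (hypothetical class) exercising Normalizer (UAX #15) and Bidi (UAX #9).
 */
public class TextAlgorithmsDemo {

    /** Composes to NFC: "e" followed by U+0301 becomes the single char U+00E9. */
    static String toNfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    /** Decomposes to NFD: U+00E9 becomes "e" followed by U+0301. */
    static String toNfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    /** True if the text contains right-to-left content that needs bidi analysis. */
    static boolean needsBidi(String s) {
        char[] chars = s.toCharArray();
        return Bidi.requiresBidi(chars, 0, chars.length);
    }

    /** Number of directional runs found with a left-to-right base direction. */
    static int runCount(String s) {
        Bidi bidi = new Bidi(s, Bidi.DIRECTION_LEFT_TO_RIGHT);
        return bidi.getRunCount();
    }

    public static void main(String[] args) {
        String decomposed = "e\u0301";      // "e" + U+0301 COMBINING ACUTE ACCENT
        String mixed = "abc \u05D0\u05D1";  // Latin text followed by two Hebrew letters
        System.out.println("NFC length: " + toNfc(decomposed).length());
        System.out.println("needs bidi: " + needsBidi(mixed) + ", runs: " + runCount(mixed));
    }
}
```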

      Support for Unicode extended grapheme clusters in java.util.regex.Pattern will be upgraded to the 14.0 level of Unicode Standard Annex #29, "Unicode Text Segmentation."
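      The `\X` construct in `java.util.regex.Pattern` matches one extended grapheme cluster, so the boundaries it finds are the ones governed by UAX #29. A minimal sketch with a hypothetical `GraphemeSplitter` helper, using a combining-mark sequence whose clustering is long established:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch (hypothetical class): splits text into extended grapheme clusters. */
public class GraphemeSplitter {

    // \X matches one extended grapheme cluster (supported since Java 9).
    private static final Pattern CLUSTER = Pattern.compile("\\X");

    static List<String> clusters(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = CLUSTER.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // "e" followed by U+0301 COMBINING ACUTE ACCENT is one user-perceived character.
        String text = "e\u0301a";
        System.out.println(text.length() + " chars, " + clusters(text).size() + " clusters");
    }
}
```

Counting `\X` matches rather than `String.length()` is what keeps user-visible character counts correct when Unicode adds new combining or emoji sequences.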

      For more detailed delta charts, refer to the delta page on Unicode.org.

      Specification

      Change the class description of the java.lang.Character class as follows:

      @@ -61,11 +61,11 @@
        * This file specifies properties including name and category for every
        * assigned Unicode code point or character range. The file is available
        * from the Unicode Consortium at
        * <a href="http://www.unicode.org">http://www.unicode.org</a>.
        * <p>
      - * Character information is based on the Unicode Standard, version 13.0.
      + * Character information is based on the Unicode Standard, version 14.0.
        * <p>
        * The Java platform has supported different versions of the Unicode
        * Standard over time. Upgrades to newer versions of the Unicode Standard
        * occurred in the following Java releases, each indicating the new version:
        * <table class="striped">
      @@ -73,10 +73,12 @@
        * <thead>
        * <tr><th scope="col">Java release</th>
        *     <th scope="col">Unicode version</th></tr>
        * </thead>
        * <tbody>
      + * <tr><th scope="row" style="text-align:left">Java SE 19</th>
      + *     <td>Unicode 14.0</td></tr>
        * <tr><th scope="row" style="text-align:left">Java SE 15</th>
        *     <td>Unicode 13.0</td></tr>
        * <tr><th scope="row" style="text-align:left">Java SE 13</th>
        *     <td>Unicode 12.1</td></tr>
        * <tr><th scope="row" style="text-align:left">Java SE 12</th>
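      The release table in the Character class description lends itself to a floor lookup: a release supports the Unicode version of the most recent upgrade at or before it. The sketch below encodes only the rows visible in this excerpt (Java SE 13, 15, and 19), so for later releases its answer is a lower bound; the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Sketch (hypothetical class) of a floor lookup over the excerpted table rows. */
public class UnicodeVersionTable {

    // Java SE feature release -> Unicode version adopted in that release.
    // Only the rows shown in this CSR excerpt; earlier rows are omitted.
    private static final TreeMap<Integer, String> UPGRADES = new TreeMap<>();
    static {
        UPGRADES.put(13, "12.1");
        UPGRADES.put(15, "13.0");
        UPGRADES.put(19, "14.0");
    }

    /** Most recent listed upgrade at or before the release; empty if no row applies. */
    static Optional<String> minimumUnicodeVersion(int featureRelease) {
        Map.Entry<Integer, String> e = UPGRADES.floorEntry(featureRelease);
        return e == null ? Optional.empty() : Optional.of(e.getValue());
    }

    public static void main(String[] args) {
        int feature = Runtime.version().feature(); // requires Java 10 or later
        System.out.println("Java SE " + feature + ": "
            + minimumUnicodeVersion(feature).map(v -> "at least Unicode " + v)
                                            .orElse("not covered by the excerpted rows"));
    }
}
```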

      In the java.lang.Character.UnicodeBlock class, add the following new fields:

       /**
        * Constant for the "Arabic Extended-B" Unicode
        * character block.
        * @since 19
        */
       public static final UnicodeBlock ARABIC_EXTENDED_B
      
       /**
        * Constant for the "Vithkuqi" Unicode
        * character block.
        * @since 19
        */
       public static final UnicodeBlock VITHKUQI
      
       /**
        * Constant for the "Latin Extended-F" Unicode
        * character block.
        * @since 19
        */
       public static final UnicodeBlock LATIN_EXTENDED_F
      
       /**
        * Constant for the "Old Uyghur" Unicode
        * character block.
        * @since 19
        */
       public static final UnicodeBlock OLD_UYGHUR
      
       /**
        * Constant for the "Unified Canadian Aboriginal Syllabics Extended-A" Unicode
        * character block.
        * @since 19
        */
       public static final UnicodeBlock UNIFIED_CANADIAN_ABORIGINAL_SYLLABICS_EXTENDED_A
      
       /**
        * Constant for the "Cypro-Minoan" Unicode
        * character block.
        * @since 19
        */
       public static final UnicodeBlock CYPRO_MINOAN
      
       /**
        * Constant for the "Tangsa" Unicode
        * character block.
        * @since 19
        */
       public static final UnicodeBlock TANGSA
      
       /**
        * Constant for the "Kana Extended-B" Unicode
        * character block.
        * @since 19
        */
       public static final UnicodeBlock KANA_EXTENDED_B
      
       /**
        * Constant for the "Znamenny Musical Notation" Unicode
        * character block.
        * @since 19
        */
       public static final UnicodeBlock ZNAMENNY_MUSICAL_NOTATION
      
       /**
        * Constant for the "Latin Extended-G" Unicode
        * character block.
        * @since 19
        */
       public static final UnicodeBlock LATIN_EXTENDED_G
      
       /**
        * Constant for the "Toto" Unicode
        * character block.
        * @since 19
        */
       public static final UnicodeBlock TOTO
      
       /**
        * Constant for the "Ethiopic Extended-B" Unicode
        * character block.
        * @since 19
        */
       public static final UnicodeBlock ETHIOPIC_EXTENDED_B
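      Because these constants do not exist before Java SE 19, code that must also run on older releases can look them up by name instead of referencing them directly. A sketch, assuming the documented behavior of `Character.UnicodeBlock.forName` (it accepts constant names, ignores case, and throws `IllegalArgumentException` for unknown names); `NewBlockCheck` is a hypothetical name.

```java
import java.util.Optional;

/** Sketch (hypothetical class): checks which new UnicodeBlock names the runtime knows. */
public class NewBlockCheck {

    // Constant names from the specification of the new UnicodeBlock fields.
    static final String[] NEW_BLOCKS = {
        "ARABIC_EXTENDED_B", "VITHKUQI", "LATIN_EXTENDED_F", "OLD_UYGHUR",
        "UNIFIED_CANADIAN_ABORIGINAL_SYLLABICS_EXTENDED_A", "CYPRO_MINOAN",
        "TANGSA", "KANA_EXTENDED_B", "ZNAMENNY_MUSICAL_NOTATION",
        "LATIN_EXTENDED_G", "TOTO", "ETHIOPIC_EXTENDED_B"
    };

    /** Looks up a block by name, returning empty instead of throwing for unknown names. */
    static Optional<Character.UnicodeBlock> lookup(String name) {
        try {
            return Optional.of(Character.UnicodeBlock.forName(name));
        } catch (IllegalArgumentException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        for (String name : NEW_BLOCKS) {
            System.out.println(name + ": " + (lookup(name).isPresent() ? "present" : "missing"));
        }
        // With Unicode 14.0 data, U+10570 should fall in VITHKUQI; older
        // runtimes may return null because the range had no block before.
        System.out.println("U+10570 -> " + Character.UnicodeBlock.of(0x10570));
    }
}
```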

      In the java.lang.Character.UnicodeScript enum, add the following new enum constants:

       /**
        * Unicode script "Vithkuqi".
        * @since 19
        */
       VITHKUQI,
      
       /**
        * Unicode script "Old Uyghur".
        * @since 19
        */
       OLD_UYGHUR,
      
       /**
        * Unicode script "Cypro Minoan".
        * @since 19
        */
       CYPRO_MINOAN,
      
       /**
        * Unicode script "Tangsa".
        * @since 19
        */
       TANGSA,
      
       /**
        * Unicode script "Toto".
        * @since 19
        */
       TOTO
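      `Character.UnicodeScript.of` classifies a code point by script and returns `UNKNOWN` for code points unassigned in the runtime's Unicode data. The hypothetical helper below tallies scripts in a string; on a runtime with Unicode 14.0 data, characters of the new scripts (here a Vithkuqi sample) are expected to appear under the new constants rather than `UNKNOWN`.

```java
import java.util.EnumMap;
import java.util.Map;

/** Sketch (hypothetical class): counts code points per Unicode script. */
public class ScriptHistogram {

    static Map<Character.UnicodeScript, Integer> histogram(String text) {
        Map<Character.UnicodeScript, Integer> counts =
            new EnumMap<>(Character.UnicodeScript.class);
        // Iterate by code point so supplementary characters are counted once.
        text.codePoints().forEach(cp ->
            counts.merge(Character.UnicodeScript.of(cp), 1, Integer::sum));
        return counts;
    }

    public static void main(String[] args) {
        // 'A' (Latin), U+03B1 GREEK SMALL LETTER ALPHA, '1' (Common), and
        // U+10570 from the Vithkuqi range added in Unicode 14.0.
        String sample = "A" + "\u03B1" + "1" + new String(Character.toChars(0x10570));
        System.out.println(histogram(sample));
    }
}
```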

            Assignee: Naoto Sato
            Reporter: Naoto Sato
            Reviewed By: Joe Wang