JDK / JDK-8215943

Allowing additional currency code points from later Unicode updates (11uX Backport)


    • Type: CSR
    • Resolution: Withdrawn
    • Priority: P2
    • 11-pool
    • core-libs
    • None
    • minimal
      Since this change only relaxes the spec, no compatibility risk is expected. The risk from the behavioral change in Character.isJavaIdentifierStart/Part() is also minimal, for the following reasons:
      - The Currency Symbols range is very limited (U+20A0-U+20CF).
      - The change allows additional code points rather than disallowing any, so existing identifiers are guaranteed to keep working.
      - Apart from this CSR, this kind of behavior change is common whenever a Unicode upgrade is done.
    • Java API
    • SE
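
      The compatibility note above refers to Character.isJavaIdentifierStart/Part(). The following is a minimal sketch of how those two methods decide whether a string is identifier-shaped, which is where a newly assigned currency symbol would become acceptable. The class and method names are hypothetical, and the check deliberately ignores reserved words and literals, so it is not a full Java Language Specification identifier test.

```java
// Hypothetical helper, not part of the JDK: applies the two Character
// predicates named in the compatibility note to every code point of a string.
final class IdentifierShapeCheck {

    /** True when s starts with an identifier-start code point followed only by identifier-part code points. */
    static boolean isIdentifierShaped(String s) {
        if (s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        // Step by code point so supplementary characters are handled correctly.
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        // U+20AC (EURO SIGN) is a currency symbol in every supported Unicode version.
        System.out.println(isIdentifierShaped("\u20ACprice"));
        // Whether U+20BF is accepted depends on the Unicode data of the running JDK.
        System.out.println(isIdentifierShaped("\u20BFwallet"));
    }
}
```

      Because the predicates only ever gain code points when a currency symbol is added, a string accepted before the change is still accepted afterwards.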

      Summary

      Relax the java.lang.Character specification to allow code point additions for newly defined currency symbols.

      Problem

      Currency symbols are assigned in the Currency Symbols Unicode block (U+20A0..U+20CF), but the set of code points actually defined in that block varies across Unicode releases. For example, the Bitcoin sign (U+20BF) is available only in Unicode 10 and later, which means it is usable only in JDK 11 and later.
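
      As a rough illustration of the version dependence described above, the sketch below asks the running JDK whether a code point is known to it as a currency symbol. The class and method names are hypothetical; only Character.isDefined(int) and Character.getType(int) are JDK APIs. The result for U+20BF depends on which Unicode version the runtime's Character data is based on, so no output is shown.

```java
// Hypothetical helper, not part of the JDK.
final class CurrencyAvailability {

    /** True when the running JDK's Unicode data classifies the code point as a currency symbol. */
    static boolean isKnownCurrencySymbol(int codePoint) {
        return Character.isDefined(codePoint)
                && Character.getType(codePoint) == Character.CURRENCY_SYMBOL;
    }

    public static void main(String[] args) {
        int euro = 0x20AC;    // EURO SIGN, assigned long before Unicode 10
        int bitcoin = 0x20BF; // BITCOIN SIGN, assigned in Unicode 10
        System.out.printf("java.version=%s%n", System.getProperty("java.version"));
        System.out.printf("U+%04X known: %b%n", euro, isKnownCurrencySymbol(euro));
        System.out.printf("U+%04X known: %b%n", bitcoin, isKnownCurrencySymbol(bitcoin));
    }
}
```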

      Currently, the Java SE specification for java.lang.Character in JDK 11u is based on the Unicode Standard, version 10.0.0. As a result, new currency symbols defined in Unicode versions later than 10.0.0 will not be available in JDK 11u.

      To help industry and end users, the expected character property values for the code points in the Currency Symbols block need to be relaxed so that later code point additions will not affect Java SE conformance.

      Solution

      Relax the expected character property values for the code points in the Currency Symbols block so that later code point additions will not affect Java SE conformance. For the code points currently undefined in the range U+20A0..U+20CF, the return values from the following methods may change in the future. Specifically, for such a code point:

      Character.isDefined(char/int) returns 'true'

      Character.getType(char/int) returns CURRENCY_SYMBOL

      Character.getName(char/int) returns the name of the currency

      Character.getDirectionality(char/int) returns a value other than DIRECTIONALITY_UNDEFINED

      Character.UnicodeScript.of(char/int) returns COMMON
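
      The five expectations listed above can be combined into one probe, sketched below. The class and method names are hypothetical. The int overloads are used throughout, and Character.getName(int) returns null for an unassigned code point, which is why a non-null check stands in for "returns the name of the currency".

```java
// Hypothetical probe, not part of the JDK: checks the five property values
// the CSR lists for a code point in the Currency Symbols block.
final class CurrencyBlockProbe {

    /** True when all five listed properties match an assigned currency symbol. */
    static boolean hasCurrencySymbolProperties(int cp) {
        return Character.isDefined(cp)
                && Character.getType(cp) == Character.CURRENCY_SYMBOL
                && Character.getName(cp) != null
                && Character.getDirectionality(cp) != Character.DIRECTIONALITY_UNDEFINED
                && Character.UnicodeScript.of(cp) == Character.UnicodeScript.COMMON;
    }

    public static void main(String[] args) {
        // Walk the whole block; results for the upper part of the range
        // depend on the Unicode data shipped with the running JDK.
        for (int cp = 0x20A0; cp <= 0x20CF; cp++) {
            System.out.printf("U+%04X %-5b %s%n",
                    cp, hasCurrencySymbolProperties(cp), Character.getName(cp));
        }
    }
}
```

      Code points for which the probe reports false are the ones whose return values the relaxed spec allows to change in a later update.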

      Specification

      Since the solution involves a normative spec change, it requires a maintenance release (MR) to update the Java SE 11 specification. The following OpenJDK mailing list post discusses the MR for the 11u release:

      http://mail.openjdk.java.net/pipermail/jdk-updates-dev/2018-December/000308.html

      Below are the normative spec changes to the java.lang.Character class.

      Change-1:

      Add the following paragraph to the java.lang.Character class description, just before the "Unicode Character Representations" section.

      Below is an extract from the suggested patch:

       + * <p>
       + * @implSpec The code points in {@link Character.UnicodeBlock#CURRENCY_SYMBOLS
       + * Currency Symbols} {@code UnicodeBlock} that are unassigned as of the
       + * <a href="#UnicodeVer">Unicode version noted above</a>,
       + * may be defined for currency symbols assigned by the Unicode
       + * Consortium from later updates. The definition of additionally assigned
       + * code points is implementation specific.
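
      To see which code points the paragraph above would apply to on a given runtime, the sketch below lists the members of the Currency Symbols block that the running JDK reports as unassigned. The class and method names are hypothetical. It relies on Character.UnicodeBlock.of(int) reporting block membership by range, independent of whether the code point is assigned.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the JDK.
final class UnassignedCurrencyCodePoints {
    static final int BLOCK_START = 0x20A0;
    static final int BLOCK_END = 0x20CF;

    /** Code points in the Currency Symbols block that the running JDK treats as undefined. */
    static List<Integer> unassigned() {
        List<Integer> result = new ArrayList<>();
        for (int cp = BLOCK_START; cp <= BLOCK_END; cp++) {
            boolean inBlock =
                    Character.UnicodeBlock.of(cp) == Character.UnicodeBlock.CURRENCY_SYMBOLS;
            if (inBlock && !Character.isDefined(cp)) {
                result.add(cp);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (int cp : unassigned()) {
            System.out.printf("U+%04X is unassigned in this runtime%n", cp);
        }
    }
}
```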

      Change-2:

      Wrap the sentence that refers to the Unicode version in an HTML anchor element, and add further explanation:

      Below is an extract from the suggested patch:

       - * Character information is based on the Unicode Standard, version 10.0.0. 
       + * Character information is based on <a id="UnicodeVer">the Unicode Standard,
       + * version 10.0.0</a>. Additional currency symbols (and Japanese Era Square
       + * character) defined subsequent to that Unicode version may be present.

        1. diff.html
          0.2 kB
        2. diff-stat.html
          35 kB
        3. diff-report.html
          282 kB

            dkejriwal Deepak Kejriwal (Inactive)
            naoto Naoto Sato
            Sean Coffey
