Type: CSR
Resolution: Approved
Priority: P4
Fix Version/s: 18
Component/s: core-libs
Labels:
None

Subcomponent:
java.lang
Compatibility Risk:
minimal
Compatibility Risk Description:
This is a doc-only change.
Interface Kind:

Java API
Scope:
SE

Summary

Clarify the spec of j.l.Character#getName(int) and j.l.Character#codePointOf(String) in terms of Unicode Standard conformance.

Problem

These methods use the JDK's own scheme to derive and parse names for characters that have no explicit name in the UnicodeData.txt file. That scheme deviates from the one defined in the "Unicode Name Property" section of the Unicode Standard.
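To make the deviation concrete, here is a minimal sketch using U+4E00, a CJK ideograph that has no individual name entry in UnicodeData.txt. The class name `NameSchemeDemo` and the helper `fallbackName` are hypothetical, introduced only for illustration; the helper rebuilds the expression quoted in the method descriptions under Specification.

```java
import java.util.Locale;

class NameSchemeDemo {
    // Hypothetical helper: the fallback expression quoted in the getName(int) Javadoc.
    static String fallbackName(int codePoint) {
        return Character.UnicodeBlock.of(codePoint).toString().replace('_', ' ')
                + " "
                + Integer.toHexString(codePoint).toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        int cp = 0x4E00; // no individual name entry in UnicodeData.txt

        // Both lines print the block-derived name: CJK UNIFIED IDEOGRAPHS 4E00
        System.out.println(Character.getName(cp));
        System.out.println(fallbackName(cp));
    }
}
```

For comparison, the Unicode Standard derives the name of this code point as CJK UNIFIED IDEOGRAPH-4E00 (singular, with a hyphen before the hex value).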

Solution

State the deviation explicitly in the method descriptions. The bug submitter suggested aligning the scheme with Unicode, but that is not possible because it would introduce a compatibility issue: names generated by prior JDKs could no longer be passed to a codePointOf(String) that used the new scheme.
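The compatibility concern is the round trip between the two methods. The sketch below illustrates it under the behavior described in this CSR; `RoundTripDemo` and `isRecognized` are hypothetical names used only for this example.

```java
class RoundTripDemo {
    // Hypothetical helper: reports whether codePointOf(String) accepts a name.
    static boolean isRecognized(String name) {
        try {
            Character.codePointOf(name);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        int cp = 0x4E00;

        // A name produced by getName(int) is expected to map back to the same code point.
        String jdkName = Character.getName(cp);
        System.out.println(Character.codePointOf(jdkName) == cp);

        // The Unicode-derived spelling is not produced by getName(int); with the
        // JDK scheme left unchanged, codePointOf(String) is expected to reject it.
        System.out.println(isRecognized("CJK UNIFIED IDEOGRAPH-4E00"));
    }
}
```

Switching getName(int) to the Unicode spelling would break the first check for any name that an application had stored using an earlier JDK.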

Specification

Change the method descriptions of j.l.Character#getName(int) and j.l.Character#codePointOf(String) as follows:

getName(int):

     /**
-     * Returns the Unicode name of the specified character
+     * Returns the name of the specified character
      * {@code codePoint}, or null if the code point is
      * {@link #UNASSIGNED unassigned}.
      * <p>
-     * Note: if the specified character is not assigned a name by
+     * If the specified character is not assigned a name by
      * the <i>UnicodeData</i> file (part of the Unicode Character
      * Database maintained by the Unicode Consortium), the returned
-     * name is the same as the result of expression:
+     * name is the same as the result of the expression:
      *
      * <blockquote>{@code
@@ -11310,13 +11310,17 @@
      *     + " "
      *     + Integer.toHexString(codePoint).toUpperCase(Locale.ROOT);
      *
      * }</blockquote>
      *
+     * For the {@code codePoint}s in the <i>UnicodeData</i> file, the name
+     * returned by this method follows the naming scheme in the
+     * "Unicode Name Property" section of the Unicode Standard. For other
+     * code points, such as Hangul/Ideographs, The name generation rule above
+     * differs from the one defined in the Unicode Standard.
+     *
      * @param  codePoint the character (Unicode code point)
      *
-     * @return the Unicode name of the specified character, or null if
+     * @return the name of the specified character, or null if
      *         the code point is unassigned.
      *
      * @throws IllegalArgumentException if the specified
      *            {@code codePoint} is not a valid Unicode
      *            code point.

codePointOf(String):

     /**
      * Returns the code point value of the Unicode character specified by
-     * the given Unicode character name.
+     * the given character name.
      * <p>
-     * Note: if a character is not assigned a name by the <i>UnicodeData</i>
+     * If a character is not assigned a name by the <i>UnicodeData</i>
      * file (part of the Unicode Character Database maintained by the Unicode
-     * Consortium), its name is defined as the result of expression:
+     * Consortium), its name is defined as the result of the expression:
      *
      * <blockquote>{@code
      *     Character.UnicodeBlock.of(codePoint).toString().replace('_', ' ')
@@ -11357,16 +11361,20 @@
      * }</blockquote>
      * <p>
      * The {@code name} matching is case insensitive, with any leading and
      * trailing whitespace character removed.
      *
-     * @param  name the Unicode character name
+     * For the code points in the <i>UnicodeData</i> file, this method
+     * recognizes the name which conforms to the name defined in the
+     * "Unicode Name Property" section in the Unicode Standard. For other
+     * code points, this method recognizes the name generated with
+     * {@link #getName(int)} method.
+     * @param  name the character name
      *
      * @return the code point value of the character specified by its name.
      *
      * @throws IllegalArgumentException if the specified {@code name}
-     *         is not a valid Unicode character name.
+     *         is not a valid character name.
      * @throws NullPointerException if {@code name} is {@code null}

csr of

JDK-8273259 Character.getName doesn't follow Unicode spec for ideographs

Resolved

Assignee: Naoto Sato

Reporter: Webbug Group

Reviewed By: Brian Burkhalter, Iris Clark, Lance Andersen

Votes: 0

Watchers: 4

Created: 2021-09-02 11:06

Updated: 2021-09-10 14:44

Resolved: 2021-09-10 14:44
