JDK-4404977

Incomplete or inaccurate Character spec


    • Type: Bug
    • Resolution: Fixed
    • Priority: P3
    • 1.4.0
    • 1.4.0
    • core-libs
    • beta
    • generic, x86
    • generic, windows_nt
    • Verified

      Tracking bug for the following spec bugs:

      1. (4402549) The jdk1.4.0beta-b45 API specification for the method Character.isMirrored(ch) doesn't specify the return value for undefined chars.
      The Unicode 3.0 standard doesn't specify the mirrored property for undefined chars.
      The jdk1.4.0beta-b45 API implementation returns false for all undefined chars.
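
      A check analogous to the test in item 2 could confirm the isMirrored behavior described here. This is a minimal sketch, not part of the original report: the class name MirroredUndefinedCheck and the helper countUndefinedWhere are hypothetical, and the code uses java.util.function.IntPredicate, so it assumes Java 8 or later rather than 1.4.

```java
import java.util.function.IntPredicate;

final class MirroredUndefinedCheck {
    // Counts the undefined BMP chars (per Character.isDefined) that satisfy the predicate.
    static int countUndefinedWhere(IntPredicate predicate) {
        int count = 0;
        for (int i = 0; i <= 0xFFFF; ++i) {
            if (!Character.isDefined((char) i) && predicate.test(i)) {
                ++count;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int undefined = countUndefinedWhere(c -> true);
        int mirrored = countUndefinedWhere(c -> Character.isMirrored((char) c));
        System.out.println("undefined chars: " + undefined);
        System.out.println("undefined chars reported as mirrored: " + mirrored);
    }
}
```

      If the implementation behaves as this item describes, the second count would be 0. The exact number of undefined chars depends on the Unicode version the runtime supports.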

      2. (4402548) The jdk1.4.0beta-b45 API specification for the method Character.getDirectionality(ch) doesn't specify the return value for undefined chars.

      According to the Unicode standard, the directional type of unassigned
      code values is not defined, but the jdk1.4.0beta-b45 API implementation returns
      DIRECTIONALITY_LEFT_TO_RIGHT (i.e. 0) for all undefined chars.
      The following simple test shows this:

      public class test {
          public static void main(String[] args) {
              // Collect the directionality reported for every undefined BMP char.
              StringBuilder str = new StringBuilder();
              for (int i = 0; i <= 65535; ++i) {
                  if (!Character.isDefined((char) i)) {
                      str.append(' ').append(Character.getDirectionality((char) i));
                  }
              }
              System.out.println(str);
          }
      }

      3. (4402127) Character.getNumericValue(ch) method returns incorrect values for the following
      chars:
      0x41 0x42 0x43 0x44 0x45 0x46 0x47 0x48 0x49 0x4a 0x4b 0x4c 0x4d 0x4e 0x4f 0x50
      0x51 0x52 0x53 0x54 0x55 0x56 0x57 0x58 0x59 0x5a 0x61 0x62 0x63 0x64 0x65 0x66
      0x67 0x68 0x69 0x6a 0x6b 0x6c 0x6d 0x6e 0x6f 0x70 0x71 0x72 0x73 0x74 0x75 0x76
      0x77 0x78 0x79 0x7a 0xff21 0xff22 0xff23 0xff24 0xff25 0xff26 0xff27 0xff28
      0xff29 0xff2a 0xff2b 0xff2c 0xff2d 0xff2e 0xff2f 0xff30 0xff31 0xff32 0xff33
      0xff34 0xff35 0xff36 0xff37 0xff38 0xff39 0xff3a 0xff41 0xff42 0xff43 0xff44
      0xff45 0xff46 0xff47 0xff48 0xff49 0xff4a 0xff4b 0xff4c 0xff4d 0xff4e 0xff4f
      0xff50 0xff51 0xff52 0xff53 0xff54 0xff55 0xff56 0xff57 0xff58 0xff59 0xff5a

      The jdk1.4.0beta-b45 specification reads:
      " public static int getNumericValue(char ch)
        Returns the int value that the specified Unicode character represents.
        For example, the character '\u216C' (the roman numeral fifty) will return
        an int with a value of 50.
        If the character does not have a numeric value, then -1 is returned.
        If the character has a numeric value that cannot be represented as a nonnegative
        integer (for example, a fractional value), then -2 is returned."

      According to this, Character.getNumericValue(ch) should return -1 for the chars
      listed above, since Unicode 3.0 defines no numeric values for them.
      However, the jdk1.4.0beta-b45 API implementation returns the following corresponding
      decimal values:
      10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 10
      11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 10 11
      12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 10 11 12
      13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
      Because of this, the new JCK Merlin test api/java_lang/Character/index.html#charFullRange[Character2084]
      fails.
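
      The values listed above can be reproduced with a short program. The following is a minimal sketch rather than the JCK test itself; the class name NumericValueCheck and the helper numericValues are hypothetical. Note that current Java SE documentation for getNumericValue describes the values 10 through 35 for these letter ranges, so the sketch is expected to print those values on a current JDK as well.

```java
import java.util.Arrays;

final class NumericValueCheck {
    // Collects Character.getNumericValue results for every char in [first, last].
    static int[] numericValues(char first, char last) {
        int[] values = new int[last - first + 1];
        for (int i = 0; i < values.length; i++) {
            values[i] = Character.getNumericValue((char) (first + i));
        }
        return values;
    }

    public static void main(String[] args) {
        // The four ranges from the report: ASCII and fullwidth Latin letters.
        char[][] ranges = {
            { 'A', 'Z' }, { 'a', 'z' }, { '\uFF21', '\uFF3A' }, { '\uFF41', '\uFF5A' }
        };
        for (char[] r : ranges) {
            System.out.println("0x" + Integer.toHexString(r[0])
                + "-0x" + Integer.toHexString(r[1])
                + ": " + Arrays.toString(numericValues(r[0], r[1])));
        }
    }
}
```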

      4. (4401684) The jdk1.4.0beta-b45 specification for method Character.isUnicodeIdentifierStart(ch) reads:
      "public static boolean isUnicodeIdentifierStart(char ch)
      Determines if the specified character is permissible as the first character
      in a Unicode identifier. A character may start a Unicode identifier if and
      only if it is a letter..."

      However, the jdk1.4beta-b45, jdk1.3, and jdk1.2.2 API implementations accept not only
      letters as a Unicode identifier start, but also characters whose Unicode general category
      is "Nl", which are not letters (according to the Character.isLetter(ch) method specification).
      The same applies to the Character.isJavaIdentifierStart(ch) method.

      The following simple test shows this:

      public class test {
          public static void main(String[] args) {
              // Print every BMP char on which isLetter and isUnicodeIdentifierStart disagree.
              for (int i = 0; i <= 65535; ++i) {
                  if (Character.isLetter((char) i) != Character.isUnicodeIdentifierStart((char) i)) {
                      System.out.print("0x" + Integer.toHexString(i) + " ");
                  }
              }
              System.out.println();
          }
      }
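
      To make the discrepancy concrete for a single character, the sketch below inspects U+2160 (ROMAN NUMERAL ONE), which has general category "Nl". The choice of this character and the names LetterNumberIdentifierStart and isLetterNumber are illustrative assumptions, not taken from the original report.

```java
final class LetterNumberIdentifierStart {
    // True when ch has Unicode general category Nl (Character.LETTER_NUMBER).
    static boolean isLetterNumber(char ch) {
        return Character.getType(ch) == Character.LETTER_NUMBER;
    }

    public static void main(String[] args) {
        char ch = '\u2160'; // ROMAN NUMERAL ONE, general category Nl
        System.out.println("isLetterNumber:           " + isLetterNumber(ch));
        System.out.println("isLetter:                 " + Character.isLetter(ch));
        System.out.println("isUnicodeIdentifierStart: " + Character.isUnicodeIdentifierStart(ch));
        System.out.println("isJavaIdentifierStart:    " + Character.isJavaIdentifierStart(ch));
    }
}
```

      A character for which isLetter is false while both identifier-start methods are true is exactly the case the quoted specification text fails to cover.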

      5. (4401683) The jdk1.4.0beta-b45 API specification for the method Character.toTitleCase(ch)
      is inaccurate. It states:
      "public static char toTitleCase(char ch)
      Converts the character argument to titlecase using case mapping information from
      the UnicodeData file. If a character has no explicit titlecase mapping according to
      UnicodeData, then the uppercase mapping is returned as an equivalent titlecase mapping."

      This algorithm is incorrect for chars of Unicode category "Lt" that are
      titlecase chars themselves but have uppercase mappings that differ
      from their own code points.
      For example, Unicode 3.0.0 defines the following for these chars:

      CODEPOINT  UPPER_CASE  LOWER_CASE  TITLE_CASE  CATEGORY
      0x01C5     0x01C4      0x01C6      no          "Lt"
      0x01C8     0x01C7      0x01C9      no          "Lt"
      0x01CB     0x01CA      0x01CC      no          "Lt"
      0x01F2     0x01F1      0x01F3      no          "Lt"

      Following the specified algorithm, Character.toTitleCase((char)0x01C5) should return
      0x01C4, but in fact the jdk1.4.0beta-b45 Character.toTitleCase((char)0x01C5) returns
      the correct value, 0x01C5.
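
      The gap between the quoted wording and the actual behavior can be shown for the four chars in the table above. This is a minimal sketch: the class name TitleCaseCheck and the helper uppercaseFallback are hypothetical, and uppercaseFallback only models the "return the uppercase mapping" clause of the quoted spec text, not the real implementation.

```java
final class TitleCaseCheck {
    // The four "Lt" chars listed in the report.
    static final char[] TITLECASE_CHARS = { '\u01C5', '\u01C8', '\u01CB', '\u01F2' };

    // What the quoted spec wording implies for a char with no explicit
    // titlecase mapping: fall back to the uppercase mapping.
    static char uppercaseFallback(char ch) {
        return Character.toUpperCase(ch);
    }

    public static void main(String[] args) {
        for (char ch : TITLECASE_CHARS) {
            System.out.println("0x" + Integer.toHexString(ch)
                + " spec wording -> 0x" + Integer.toHexString(uppercaseFallback(ch))
                + ", toTitleCase -> 0x" + Integer.toHexString(Character.toTitleCase(ch)));
        }
    }
}
```

      For each of these chars the two columns differ, which is why the spec wording needs to state that a char that is already titlecase maps to itself.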

      6. (4395328) The Character API doc (isWhitespace() method description) in build 1.4.0beta-b43 contains two errors in the phrase:
      ", but is not a no-break space (\u00A0 or \uFEFF)"

      . \uFEFF is not of type SPACE_SEPARATOR, LINE_SEPARATOR, or PARAGRAPH_SEPARATOR.
      . (\u00A0 or \uFEFF) should be (\u00A0, \u202F, or \u2007)

      Please see 4395323 for 202f and 2007.
      New non-breaking space (and other non-breaking) chars were added in the Unicode 3.0 spec. The following separators should be excluded from the set; isWhitespace() should return false for them:
      00a0
      2007
      202f
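
      Both points can be checked by comparing each char's general category with the result of isWhitespace(). The sketch below is illustrative only; the class name NonBreakingSpaceCheck and the helper isSeparator are hypothetical names introduced here.

```java
final class NonBreakingSpaceCheck {
    // True when ch is a space, line, or paragraph separator by general category.
    static boolean isSeparator(char ch) {
        int type = Character.getType(ch);
        return type == Character.SPACE_SEPARATOR
            || type == Character.LINE_SEPARATOR
            || type == Character.PARAGRAPH_SEPARATOR;
    }

    public static void main(String[] args) {
        // The three non-breaking separators, U+FEFF, and an ordinary space for contrast.
        char[] chars = { '\u00A0', '\u2007', '\u202F', '\uFEFF', ' ' };
        for (char ch : chars) {
            System.out.println("0x" + Integer.toHexString(ch)
                + " separator=" + isSeparator(ch)
                + " isWhitespace=" + Character.isWhitespace(ch));
        }
    }
}
```

      A char that is a separator but not whitespace belongs in the doc's exclusion list; a char such as \uFEFF that is not a separator at all does not need to be mentioned there.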


      All of these bug IDs indicate that the Character javadoc needs updating in the specific areas described. In most cases, the spec needs to document intended but undocumented behavior.

            joconnersunw John Oconner (Inactive)