JDK-5039533

Character.isWhitespace(ch) doesn't properly discern breaking/non-breaking space


    • Type: Bug
    • Resolution: Not an Issue
    • Priority: P4
    • Fix Version: None
    • Affects Version: 1.4.2
    • Component: core-libs
    • CPU: x86
    • OS: windows_xp



      Name: rmT116609 Date: 04/28/2004


      FULL PRODUCT VERSION :
      java version "1.4.2"
      Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2-b28)
      Java HotSpot(TM) Client VM (build 1.4.2-b28, mixed mode)

      java version "1.5.0-beta"
      Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0-beta-b32c)
      Java HotSpot(TM) Client VM (build 1.5.0-beta-b32c, mixed mode)

      FULL OS VERSION :
      Microsoft Windows Version 5.1 (Build 2600.xpsp2.030422-1633 : Service Pack 1)

      A DESCRIPTION OF THE PROBLEM :
      According to the Unicode standard, there are four non-breaking space characters:

      U+00A0 NO-BREAK SPACE
      U+202F NARROW NO-BREAK SPACE
      U+2060 WORD JOINER
      U+FEFF ZERO WIDTH NO-BREAK SPACE (Deprecated since Unicode 3.2; use U+2060 instead.)

      Unicode may also have a fifth non-breaking space in Mongolian:

      U+180E MONGOLIAN VOWEL SEPARATOR

      Outside of its Mongolian context, U+180E is either ignored or treated as being the same as U+202F, depending upon the implementation.

      The static method java.lang.Character.isWhitespace(char ch) correctly treats U+00A0, U+180E, U+202F, U+2060, and U+FEFF as non-breaking spaces and, in accordance with its documentation and specification, does not report them as whitespace.

      However, the method incorrectly treats U+2007 FIGURE SPACE and U+205F MEDIUM MATHEMATICAL SPACE as non-breaking spaces and therefore wrongly reports that they are not whitespace.

      Regardless of whether you regard U+2007 and U+205F as breaking or non-breaking in practice, the Unicode standard defines them as breaking spaces, so java.lang.Character.isWhitespace(char ch) should report them as whitespace. Departing from the standard can confuse programmers and even cause logic errors in a program.

      Of course, the documentation would have to be updated to reflect these changes.
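
To see how a given runtime classifies the characters discussed above, a small diagnostic along the following lines may help. It is only a sketch: the class name SpaceCategories and the helper describe are hypothetical, it uses String.format and so needs Java 5 or later, and the values it prints depend on the Unicode version bundled with the JDK. For context, the Javadoc for Character.isWhitespace defines whitespace as a SPACE_SEPARATOR, LINE_SEPARATOR, or PARAGRAPH_SEPARATOR character that is not one of the non-breaking spaces '\u00A0', '\u2007', '\u202F' (plus a fixed set of control characters).

```java
public class SpaceCategories {

    // Code points named in this report, in the order they are discussed.
    static final int[] CODE_POINTS = {
        0x00A0, 0x202F, 0x2060, 0xFEFF, 0x180E, 0x2007, 0x205F
    };

    // Formats one line: code point, Character.getType value, and both predicates.
    static String describe(int cp) {
        char ch = (char) cp; // every code point listed above fits in a char
        return String.format("U+%04X type=%d isSpaceChar=%b isWhitespace=%b",
                cp, Character.getType(ch),
                Character.isSpaceChar(ch), Character.isWhitespace(ch));
    }

    public static void main(String[] args) {
        // Print the category constants so the numeric types can be read off.
        System.out.println("SPACE_SEPARATOR=" + Character.SPACE_SEPARATOR
                + " FORMAT=" + Character.FORMAT);
        for (int cp : CODE_POINTS) {
            System.out.println(describe(cp));
        }
    }
}
```

Comparing the isSpaceChar and isWhitespace columns shows which space separators the runtime deliberately excludes as non-breaking and which ones it does not yet know as spaces at all.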


      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      Print the results of the java.lang.Character.isWhitespace(char ch) method to the system console. See "Source code for an executable test case".


      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      True if whitespace; false otherwise.
      U+2007: true
      U+205F: true

      ACTUAL -
      True if whitespace; false otherwise.
      U+2007: false
      U+205F: false

      With 1.5.0 Beta 1:

      java Test
      True if whitespace; false otherwise.
      U+2007: false
      U+205F: true


      REPRODUCIBILITY :
      This bug is always reproducible.

      ---------- BEGIN SOURCE ----------
      class Test {

          // Prints whether Character.isWhitespace accepts each disputed character.
          public static void main(String[] args) {
              System.out.println("True if whitespace; false otherwise.");
              System.out.println("U+2007: " + Character.isWhitespace('\u2007'));
              System.out.println("U+205F: " + Character.isWhitespace('\u205F'));
          }
      }

      ---------- END SOURCE ----------

      CUSTOMER SUBMITTED WORKAROUND :
      In addition to calling java.lang.Character.isWhitespace(char ch), explicitly test each character against U+2007 and U+205F.
      (Incident Review ID: 198964)
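
The workaround above could be wrapped in a helper roughly like the one below. This is a minimal sketch, not code from the report: the class name WhitespaceWorkaround and the method isWhitespaceOrReportedSpace are hypothetical, and the helper simply adds the two characters named in this report to whatever Character.isWhitespace already accepts.

```java
public class WhitespaceWorkaround {

    // Hypothetical helper: whitespace per Character.isWhitespace, plus the two
    // characters this report says should also count (U+2007 and U+205F).
    static boolean isWhitespaceOrReportedSpace(char ch) {
        return Character.isWhitespace(ch)
                || ch == '\u2007'   // FIGURE SPACE
                || ch == '\u205F';  // MEDIUM MATHEMATICAL SPACE
    }

    public static void main(String[] args) {
        System.out.println("U+2007: " + isWhitespaceOrReportedSpace('\u2007'));
        System.out.println("U+205F: " + isWhitespaceOrReportedSpace('\u205F'));
        System.out.println("U+00A0: " + isWhitespaceOrReportedSpace('\u00A0'));
    }
}
```

Because the explicit comparisons do not depend on the runtime's Unicode tables, the helper gives the same answer for U+2007 and U+205F on every JDK release, while U+00A0 and the other non-breaking spaces are still rejected.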
      ======================================================================

            nlindenbsunw Norbert Lindenberg (Inactive)
            rmandalasunw Ranjith Mandala (Inactive)