JDK-8143282

\p{Cn} unassigned code points should be included in \p{C}

    • b120
    • x86_64
    • windows_7
    • Verified

      FULL PRODUCT VERSION :


      A DESCRIPTION OF THE PROBLEM :
      According to Unicode Standard Annex #44 (http://unicode.org/reports/tr44/#General_Category_Values), general category C (Other) is the union of categories Cc, Cf, Cs, Co, and Cn.

      However, Java's implementation of \p{C} omits Cn.

      This bug has existed since the beginning and is still present in the latest JDK 9 commit: http://hg.openjdk.java.net/jdk9/jdk9/jdk/file/16fc042acee6/src/java.base/share/classes/java/util/regex/Pattern.java#l5671
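
For reference, the membership that UAX #44 gives for general category C can be sketched with `Character.getType`, independent of the regex engine. The names `OtherCategory` and `isOther` are hypothetical and exist only for this illustration; this is not how `Pattern` implements `\p{C}`.

```java
// Hypothetical helper, not part of the JDK: illustrates which general
// categories UAX #44 groups under C (Other).
final class OtherCategory {
    /** Returns true if the code point is in Cc, Cf, Cs, Co, or Cn. */
    static boolean isOther(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.CONTROL:      // Cc
            case Character.FORMAT:       // Cf
            case Character.SURROGATE:    // Cs
            case Character.PRIVATE_USE:  // Co
            case Character.UNASSIGNED:   // Cn
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isOther(0xFFFF)); // U+FFFF is a noncharacter (Cn): true
        System.out.println(isOther('A'));    // U+0041 is Lu, not in C: false
    }
}
```

A `\p{C}` that follows the annex would be expected to agree with this check for every code point, including unassigned ones such as U+FFFF.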

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      System.out.println("\uFFFF".matches("\\p{Cn}"));
      System.out.println("\uFFFF".matches("\\pC"));

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      true
      true
      ACTUAL -
      true
      false

      REPRODUCIBILITY :
      This bug can always be reproduced.

      CUSTOMER SUBMITTED WORKAROUND :
      Explicitly list the subcategories of general category Other inside a single character class, e.g. `[\p{Cc}\p{Cf}\p{Cn}\p{Co}\p{Cs}]`
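
A minimal sketch of the workaround follows. The subcategories need to sit inside one bracketed character class, since writing them one after another would instead match five consecutive characters. The names `OtherWorkaround`, `OTHER`, and `isOther` are hypothetical and only serve this example.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper around the workaround pattern; not part of the JDK.
final class OtherWorkaround {
    // Explicit union of the five subcategories of C, so the result does not
    // depend on whether \p{C} itself includes Cn.
    static final Pattern OTHER =
            Pattern.compile("[\\p{Cc}\\p{Cf}\\p{Cn}\\p{Co}\\p{Cs}]");

    /** Returns true if the whole input is a single code point in category C. */
    static boolean isOther(String s) {
        return OTHER.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isOther("\uFFFF")); // unassigned (Cn): true
        System.out.println(isOther("A"));      // letter (Lu): false
    }
}
```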

            sherman Xueming Shen
            webbuggrp Webbug Group