  1. JDK
  2. JDK-8328951

Case insensitive matching doesn't work correctly for some character classes


    • Type: CSR
    • Resolution: Withdrawn
    • Priority: P4
    • 11-pool
    • core-libs
    • None
    • behavioral
    • medium
    • Using character classes such as \p{Lower} or \p{Upper} in case-insensitive mode may seem unusual, but any existing regular expression that happens to use such constructs will start to behave differently.

      Summary

      Named regex character classes of the forms \p{name} and \P{name} have to be made aware of case-insensitive mode.

      Problem

      When matching against a regular expression in case-insensitive mode, it is not enough to check whether a character of the input text belongs to a character class; its lower-case, upper-case and title-case forms should be checked as well. With the current implementation, this holds for single characters and for character classes written in brackets, such as [a-z], but not for named classes of the form \p{name} or \P{name}.
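
      The difference can be observed with a small probe. This is a minimal sketch, not part of the original report: the class name NamedClassCaseProbe and the helper matchesIgnoringCase are hypothetical, and only java.util.regex.Pattern with the CASE_INSENSITIVE flag is assumed. No expected output is given for the named class, because the result depends on whether the JDK in use already treats named classes case-insensitively.

```java
import java.util.regex.Pattern;

// Hypothetical probe class, for illustration only.
class NamedClassCaseProbe {

    // Returns true if the whole input matches the regex in case-insensitive mode.
    static boolean matchesIgnoringCase(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE)
                      .matcher(input)
                      .matches();
    }

    public static void main(String[] args) {
        // Single character and bracket class: the case counterpart of the input is considered.
        System.out.println("a         vs \"A\": " + matchesIgnoringCase("a", "A"));
        System.out.println("[a-z]     vs \"A\": " + matchesIgnoringCase("[a-z]", "A"));

        // Named class: the result depends on the JDK version in use.
        System.out.println("\\p{Lower} vs \"A\": " + matchesIgnoringCase("\\p{Lower}", "A"));
    }
}
```

      If the first two lines print true and the third prints false, the JDK in use shows the inconsistency described above.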

      In particular, this behavior goes against the POSIX standard, which states:

      9.2 Regular Expression General Requirements ... When a standard utility or function that uses regular expressions specifies that pattern matching shall be performed without regard to the case (uppercase or lowercase) of either data or patterns, then when each character in the string is matched against the pattern, not only the character, but also its case counterpart (if any), shall be matched.

      http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html

      Solution

      The named character classes will be made aware of case-insensitive mode. In particular, in case-insensitive mode the range classes [a-z] and [A-Z] should match the same set of characters as the named classes \p{Lower} and \p{Upper}.
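
      One way to check the intended equivalence is to compare the sets of ASCII characters that each class matches in case-insensitive mode. This is a rough sketch under stated assumptions: the class name ClassEquivalenceCheck and the helper asciiMatches are hypothetical, and the scan is limited to ASCII because only CASE_INSENSITIVE (without UNICODE_CASE) is used. Whether the comparisons against the named classes print true depends on the JDK in use.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical check class, for illustration only.
class ClassEquivalenceCheck {

    // Collects every ASCII character that the regex matches in case-insensitive mode.
    static Set<Character> asciiMatches(String regex) {
        Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        Set<Character> matched = new TreeSet<>();
        for (char c = 0; c < 128; c++) {
            if (p.matcher(String.valueOf(c)).matches()) {
                matched.add(c);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        Set<Character> range = asciiMatches("[a-z]");

        // The two range classes already agree with each other in case-insensitive mode.
        System.out.println("[a-z] == [A-Z]:      " + range.equals(asciiMatches("[A-Z]")));

        // With the proposed behavior, the named classes should agree with them as well.
        System.out.println("[a-z] == \\p{Lower}: " + range.equals(asciiMatches("\\p{Lower}")));
        System.out.println("[a-z] == \\p{Upper}: " + range.equals(asciiMatches("\\p{Upper}")));
    }
}
```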

      Specification

      No specification changes are necessary.

            clanger Christoph Langer
            webbuggrp Webbug Group
            Matthias Baesken