CSR
Resolution: Withdrawn
Priority: P4
None
Compatibility Kind: behavioral
Compatibility Risk: medium
While using character classes such as \p{Lower} or \p{Upper} in case-insensitive mode may seem strange, any existing regular expression that happens to use such constructs will start to behave differently.
Summary
Named regex character classes of the forms \p{name} and \P{name} need to be made aware of case-insensitive mode.
Problem
When a regular expression is matched in case-insensitive mode, it is not enough to check whether a character of the input text belongs to a character class; its lower-case, upper-case, and title-case forms should be checked as well. The current implementation does this for single characters and for character classes written in square brackets, but not for named classes of the form \p{name} or \P{name}.
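A minimal sketch of the check described above, modeling a named class as an IntPredicate over code points. The class, field, and method names are hypothetical (not part of java.util.regex), and \p{Lower} and \p{Upper} are approximated by their default POSIX definitions, ASCII [a-z] and [A-Z]; the actual JDK implementation may be structured differently.

```java
import java.util.function.IntPredicate;

// Hypothetical model (not java.util.regex code) of case-insensitive
// matching against a named character class.
public class CaseInsensitiveClassSketch {

    // Approximates \p{Lower} as Java defines it by default: ASCII [a-z].
    static final IntPredicate POSIX_LOWER = cp -> cp >= 'a' && cp <= 'z';

    // Approximates \p{Upper} as Java defines it by default: ASCII [A-Z].
    static final IntPredicate POSIX_UPPER = cp -> cp >= 'A' && cp <= 'Z';

    // Wraps a class so that a code point matches if the code point itself,
    // or its lower-case, upper-case, or title-case form, is in the class.
    static IntPredicate caseInsensitive(IntPredicate cls) {
        return cp -> cls.test(cp)
                || cls.test(Character.toLowerCase(cp))
                || cls.test(Character.toUpperCase(cp))
                || cls.test(Character.toTitleCase(cp));
    }

    public static void main(String[] args) {
        IntPredicate lowerCi = caseInsensitive(POSIX_LOWER);
        System.out.println(POSIX_LOWER.test('A')); // false: 'A' is not in [a-z]
        System.out.println(lowerCi.test('A'));     // true: toLowerCase('A') is 'a'
        System.out.println(lowerCi.test('7'));     // false: '7' has no case forms
    }
}
```

The title-case check matters for characters such as U+01C6 (dž), whose upper-case (U+01C4) and title-case (U+01C5) forms are distinct code points.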
In particular, this behavior goes against the POSIX standard, which states:
9.2 Regular Expression General Requirements ... When a standard utility or function that uses regular expressions specifies that pattern matching shall be performed without regard to the case (uppercase or lowercase) of either data or patterns, then when each character in the string is matched against the pattern, not only the character, but also its case counterpart (if any), shall be matched.
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html
Solution
The named character classes will be made aware of case-insensitive mode. In particular, in case-insensitive mode the range classes [a-z] and [A-Z] should match the same set of characters as \p{Lower} and \p{Upper}, respectively.
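To illustrate the intended equivalence, the sketch below compares, for every ASCII code point, what java.util.regex actually matches for [a-z] and [A-Z] with Pattern.CASE_INSENSITIVE against a modeled case-insensitive \p{Lower} and \p{Upper}. The model and all names are hypothetical. It deliberately does not run \p{Lower} through Pattern, because that is exactly the behavior this proposal would change, and the result may differ between JDK releases. Only ASCII is compared because, without Pattern.UNICODE_CASE, CASE_INSENSITIVE folds case for US-ASCII characters only.

```java
import java.util.function.IntPredicate;
import java.util.regex.Pattern;

// Hypothetical check: does the modeled case-insensitive named class accept
// the same ASCII characters as the corresponding bracket range?
public class RangeVsNamedClassSketch {

    // Approximations of \p{Lower} and \p{Upper}: ASCII [a-z] and [A-Z].
    static final IntPredicate POSIX_LOWER = cp -> cp >= 'a' && cp <= 'z';
    static final IntPredicate POSIX_UPPER = cp -> cp >= 'A' && cp <= 'Z';

    // A code point matches if it, or its lower-, upper-, or title-case form,
    // is in the class.
    static IntPredicate caseInsensitive(IntPredicate cls) {
        return cp -> cls.test(cp)
                || cls.test(Character.toLowerCase(cp))
                || cls.test(Character.toUpperCase(cp))
                || cls.test(Character.toTitleCase(cp));
    }

    // True if the case-insensitive regex and the modeled class agree on
    // every code point in the range 0..127.
    static boolean agreeOnAscii(String regex, IntPredicate modeled) {
        Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        for (int cp = 0; cp < 128; cp++) {
            boolean byRegex = p.matcher(String.valueOf((char) cp)).matches();
            if (byRegex != modeled.test(cp)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agreeOnAscii("[a-z]", caseInsensitive(POSIX_LOWER))); // true
        System.out.println(agreeOnAscii("[A-Z]", caseInsensitive(POSIX_UPPER))); // true
    }
}
```

Under this model all four constructs accept exactly the 52 ASCII letters in case-insensitive mode, which is the consistency the Solution asks for.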
Specification
No specification changes are necessary.
csr of: JDK-8328950 Case insensitive matching doesn't work correctly for some character classes (Closed)
relates to: JDK-8238984 Case insensitive matching doesn't work correctly for some character classes (Closed)