Details
-
Bug
-
Resolution: Fixed
-
P3
-
6
-
None
-
b119
-
generic
-
generic
Description
> >> I have been looking into the definition of [character set]
> >> expressions in Java regular expressions, to understand what needs to
> >> be done to make ICU be compatible, or more compatible at least.
> >>
> >> There does not appear to be any formal definition for [set
> >> expressions], or at least not that I can find.
> >>
> >> Trying tests, one aspect of the behavior seems really odd. It would
> >> be good if we could find out from Sun whether it was really intended
> >> to work the way that it does.
> >>
> >> The question concerns the negation of a set,
> >> [^0-9], to get everything except for the ASCII digits, for example.
> >>
> >> In Java, the negation does _not_ apply to anything appearing in
> >> nested [brackets]
> >>
> >> So [^c] does not match "c", as you would expect.
> >> [^[c]] does match "c". Not what I would expect.
> >> [[^c]] does not match "c"
> >>
> >> The same holds true for ranges or property expressions - if they're
> >> inside brackets, a negation at an outer level does not affect them.
> >>
> >> [^a-z] is opposite from [^[a-z]]
> >>
> >> And the same seems to hold for set expressions with &&, although the
> >> cases become hard to understand.
> >>
> >> Perl and POSIX behavior doesn't provide any guidance here, as they do
> >> not support nested brackets at all - a '[' is not special within a
> >> set, and just becomes yet another member of the set.
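
The cases reported above can be checked with a small probe like the one below. This is only a sketch: the class name `NestedNegationProbe` and its helper `matches` are made up for illustration, and the only library calls used are from `java.util.regex.Pattern`. No expected results are given for the nested forms, because the linked issues suggest that their behavior differs between JDK releases.

```java
import java.util.regex.Pattern;

// Hypothetical probe class; not part of the JDK or of the original report.
public class NestedNegationProbe {

    // Returns true if the whole input matches the given character-class pattern.
    public static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Patterns taken from the report: plain negation, negation wrapped
        // around a nested class, and a nested negated class.
        String[] patterns = { "[^c]", "[^[c]]", "[[^c]]", "[^a-z]", "[^[a-z]]" };

        // Print the runtime version, since the nested cases may be version-dependent.
        System.out.println("java.version = " + System.getProperty("java.version"));
        for (String p : patterns) {
            System.out.println(p + " matches \"c\": " + matches(p, "c"));
        }
    }
}
```

Running the probe on the JDK releases of interest and comparing the printed lines shows whether an outer `^` is applied to nested brackets on each release.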
Attachments
Issue Links
- csr for
  - JDK-8275184 change in regex character class operator precedence (Closed)
- relates to
  - JDK-8264671 Update Pattern spec to provide details of character class syntax and behavior (Open)
  - JDK-8228606 Negation on nested character classes does not work (Closed)
  - JDK-8189343 Change of behavior of java.util.regex.Pattern between JDK 8 and JDK 9 (Closed)
  - JDK-8215626 The '^' operator (negation in char classes) in regex does not work properly (Closed)
  - JDK-8247728 Regex behavior is different and now wrong comparing 8 and 11 (now) (Closed)