- Bug
- Resolution: Unresolved
- P4
- 9
In JDK 9, JDK-6609854 made a significant change to the behavior of negation and nesting of character classes. This change was not documented anywhere, because some of the more obscure behaviors of character classes are not documented at all. These need to be specified.
This message from Xueming Shen (who implemented the earlier change) describes the change, gives some rationale, and, most importantly for this bug, offers some hints about what should be in the specification of these features of character classes:
http://mail.openjdk.java.net/pipermail/core-libs-dev/2011-June/006957.html
The various operators whose behaviors need to be specified in combination are:
(1) Negation ^ (only at the beginning of the [...])
(2) Intersection &&
(3) Range -
(4) Nested class []
(5) Union (empty string, that is, two elements placed adjacent)
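For orientation, here is a minimal sketch that exercises each of the five operators through java.util.regex.Pattern. The class name CharClassOperators and the helper matches are made up for illustration and are not part of any JDK API. The first four groups use forms already shown in the Pattern javadoc; the last one combines negation with a nested class, and the result noted there is what I would expect on JDK 9 and later given the JDK-6609854 change (earlier releases are described as negating only the non-nested part).

```java
import java.util.regex.Pattern;

public class CharClassOperators {
    // Hypothetical helper: true if the entire input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // (1) Negation: any character except a, b, or c.
        System.out.println(matches("[^abc]", "d"));        // true
        System.out.println(matches("[^abc]", "a"));        // false

        // (3) Range: a through c inclusive.
        System.out.println(matches("[a-c]", "b"));         // true

        // (4) + (5) Nested class, unioned with the adjacent range.
        System.out.println(matches("[a-d[m-p]]", "n"));    // true
        System.out.println(matches("[a-d[m-p]]", "e"));    // false

        // (2) Intersection with a nested (negated) class.
        System.out.println(matches("[a-z&&[def]]", "d"));  // true
        System.out.println(matches("[a-z&&[^bc]]", "b"));  // false

        // Negation combined with a nested class: on JDK 9 and later the
        // negation is expected to cover the whole union, so 'c' is excluded.
        System.out.println(matches("[^a-b[c-d]]", "c"));   // expected false on JDK 9+
        System.out.println(matches("[^a-b[c-d]]", "e"));   // true
    }
}
```

Cases like the last pair are the ones the specification needs to pin down, since the outcome depends on how negation and union bind relative to each other.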
Xueming's email has a statement about the precedence of these operators, but I don't think it's correct. Of course, the eventual specification should be correct.
This is mostly about specifying the existing behaviors of regex character classes, after the JDK-6609854 change. I don't expect there to be any code or behavior changes as a result of this specification update. However, some bugs might be flushed out by closer analysis, and some additional tests might be warranted. That work could be handled by separate bugs.
The JDK-6609854 change had a retroactive CSR request filed for it: JDK-8275184. The CSR has a number of details that will probably be useful in writing the specification updates.
- duplicates
  - JDK-8262279 Regex intersection character class applies to whole enclosing character class (Closed)
- relates to
  - JDK-6609854 Regex does not match correctly for negative nested character classes (Resolved)