Bug
Resolution: Unresolved
P4
According to the documentation [1], the flag UNICODE_CHARACTER_CLASS implies UNICODE_CASE; that is, it also enables Unicode-aware case folding.
Normally, when the former flag is specified (either as an argument to Pattern.compile() or via (?U)), it automatically turns on the latter.
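A minimal sketch of the implication described above, assuming a standard JDK: CASE_INSENSITIVE alone folds only US-ASCII letters, so U+00E9 does not match U+00C9, whereas adding UNICODE_CHARACTER_CLASS (as a compile flag or via (?U)) also enables Unicode-aware folding. The class and method names are illustrative only.

```java
import java.util.regex.Pattern;

public class UnicodeClassImpliesCase {
    // Tests a pattern for U+00E9 (e with acute) against U+00C9 (E with acute).
    static boolean matchesUpper(Pattern p) {
        return p.matcher("\u00c9").matches();
    }

    public static void main(String[] args) {
        // CASE_INSENSITIVE alone folds US-ASCII letters only.
        Pattern asciiOnly = Pattern.compile("\u00e9", Pattern.CASE_INSENSITIVE);
        // UNICODE_CHARACTER_CLASS passed to Pattern.compile() also turns on UNICODE_CASE.
        Pattern viaArgument = Pattern.compile("\u00e9",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CHARACTER_CLASS);
        // The same flags expressed as embedded flags (?i) and (?U).
        Pattern viaEmbedded = Pattern.compile("(?iU)\u00e9");

        System.out.println(matchesUpper(asciiOnly));   // false
        System.out.println(matchesUpper(viaArgument)); // true
        System.out.println(matchesUpper(viaEmbedded)); // true
    }
}
```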
However, this behavior can be broken in certain scenarios because of a bug in the parsing of embedded flag expressions.
For example:
(?U-u) - turns UNICODE_CHARACTER_CLASS on without UNICODE_CASE (expected to have neither flag turned on),
(?u-U) - turns off both flags (expected to have UNICODE_CASE turned on),
(?U-U) - turns off both UNICODE_CHARACTER_CLASS and UNICODE_CASE (expected to leave UNICODE_CASE unmodified).
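The examples above can be checked with a small probe. This is a sketch, not a test taken from the JDK: it infers UNICODE_CHARACTER_CLASS from whether \w matches U+00E9, and UNICODE_CASE from whether (?i) followed by the flags under test makes U+00E9 match U+00C9 (UNICODE_CASE has no visible effect without CASE_INSENSITIVE). The class and helper names are hypothetical, and the values printed for the last three expressions depend on the JDK build being tested.

```java
import java.util.regex.Pattern;

public class EmbeddedFlagProbe {
    // UNICODE_CHARACTER_CLASS is in effect iff \w matches a non-ASCII letter (U+00E9).
    static boolean unicodeCharClassOn(String flagExpr) {
        return Pattern.compile(flagExpr + "\\w").matcher("\u00e9").matches();
    }

    // UNICODE_CASE is observable only together with CASE_INSENSITIVE:
    // with both on, U+00E9 matches U+00C9; ASCII-only folding does not.
    static boolean unicodeCaseOn(String flagExpr) {
        return Pattern.compile("(?i)" + flagExpr + "\u00e9").matcher("\u00c9").matches();
    }

    public static void main(String[] args) {
        // Baselines first, then the three expressions reported in this issue.
        String[] exprs = {"", "(?U)", "(?u)", "(?U-u)", "(?u-U)", "(?U-U)"};
        for (String e : exprs) {
            System.out.printf("%-8s UNICODE_CHARACTER_CLASS=%-5b UNICODE_CASE=%b%n",
                    e.isEmpty() ? "(none)" : e, unicodeCharClassOn(e), unicodeCaseOn(e));
        }
    }
}
```

The first three rows serve as a baseline (no flags, (?U), (?u)); the last three correspond to the reported cases.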
It seems worthwhile to do two things:
1) Clarify the javadoc to specify unambiguous rules for processing embedded flag expressions;
2) Fix the parser accordingly.
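The exact rules are left open here. As a hedged illustration only, the following sketch implements one hypothetical rule that reproduces the three expected outcomes listed above: flags before '-' are applied first, 'U' implies 'u' only when 'U' is not also switched off in the same expression, '-U' clears only UNICODE_CHARACTER_CLASS, and '-u' clears UNICODE_CASE together with UNICODE_CHARACTER_CLASS (which cannot be on without it). It models only 'u' and 'U', takes the text between "(?" and ")", and is neither current JDK behavior nor an agreed fix; the names ProposedFlagRule and apply are made up.

```java
import java.util.regex.Pattern;

public class ProposedFlagRule {
    static final int UCC = Pattern.UNICODE_CHARACTER_CLASS;
    static final int UC = Pattern.UNICODE_CASE;

    // HYPOTHETICAL rule, not JDK behavior: applies an embedded flag
    // expression such as "U-u" to the current flags. Only 'u' and 'U'
    // are modeled; any other letters are ignored.
    static int apply(int flags, String expr) {
        int dash = expr.indexOf('-');
        String on = dash < 0 ? expr : expr.substring(0, dash);
        String off = dash < 0 ? "" : expr.substring(dash + 1);

        // Step 1: turn flags on. 'U' implies 'u' only if 'U' is not
        // also turned off in the same expression.
        if (on.indexOf('U') >= 0 && off.indexOf('U') < 0) {
            flags |= UCC | UC;
        }
        if (on.indexOf('u') >= 0) {
            flags |= UC;
        }
        // Step 2: turn flags off. "-U" clears only UNICODE_CHARACTER_CLASS;
        // "-u" clears UNICODE_CASE and, because UNICODE_CHARACTER_CLASS
        // implies UNICODE_CASE, UNICODE_CHARACTER_CLASS as well.
        if (off.indexOf('U') >= 0) {
            flags &= ~UCC;
        }
        if (off.indexOf('u') >= 0) {
            flags &= ~(UC | UCC);
        }
        return flags;
    }

    public static void main(String[] args) {
        for (String expr : new String[] {"U-u", "u-U", "U-U"}) {
            int f = apply(0, expr);
            System.out.printf("(?%s) -> UNICODE_CHARACTER_CLASS=%b UNICODE_CASE=%b%n",
                    expr, (f & UCC) != 0, (f & UC) != 0);
        }
    }
}
```

Under this rule, (?U-U) leaves UNICODE_CASE exactly as it was before the expression, which is the behavior the third example asks for.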
[1] https://docs.oracle.com/en/java/javase/12/docs/api/java.base/java/util/regex/Pattern.html#UNICODE_CHARACTER_CLASS