JDK-8225021 (project: JDK)

Treat ambiguous embedded flags as parse syntax errors


    • Type: Bug
    • Resolution: Unresolved
    • Priority: P4
    • Component: core-libs

      The documentation [1] states that the UNICODE_CHARACTER_CLASS flag implies UNICODE_CASE, that is, it enables Unicode-aware case folding.

      Normally, when the former flag is specified (either as an argument to Pattern.compile() or via (?U)), it automatically turns on the latter.
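      The documented implication can be observed through matching. With CASE_INSENSITIVE alone, only US-ASCII letters are folded, so 'é' (U+00E9) should match 'É' (U+00C9) only when UNICODE_CASE is in effect. This is a minimal sketch assuming a JDK that follows the javadoc; the class and method names are illustrative, not part of the JDK.

```java
import java.util.regex.Pattern;

public class UnicodeClassImpliesCase {
    // Lowercase e-acute (U+00E9) and uppercase E-acute (U+00C9).
    static final String LOWER_E_ACUTE = "\u00e9";
    static final String UPPER_E_ACUTE = "\u00c9";

    // Compiles e-acute with the given flags and checks whether it matches E-acute.
    // Under CASE_INSENSITIVE alone only US-ASCII letters are folded, so a match
    // here indicates that UNICODE_CASE is in effect.
    static boolean foldsWithFlags(int flags) {
        return Pattern.compile(LOWER_E_ACUTE, flags).matcher(UPPER_E_ACUTE).matches();
    }

    // Same check, but the flags come from an embedded flag expression prefix.
    static boolean foldsWithEmbedded(String flagPrefix) {
        return Pattern.compile(flagPrefix + LOWER_E_ACUTE).matcher(UPPER_E_ACUTE).matches();
    }

    public static void main(String[] args) {
        System.out.println(foldsWithFlags(Pattern.CASE_INSENSITIVE));   // false
        System.out.println(foldsWithFlags(Pattern.CASE_INSENSITIVE
                | Pattern.UNICODE_CHARACTER_CLASS));                  // true
        System.out.println(foldsWithEmbedded("(?i)"));                  // false
        System.out.println(foldsWithEmbedded("(?iU)"));                 // true
    }
}
```

      Both the UNICODE_CHARACTER_CLASS argument and the (?U) expression switch on Unicode case folding here, which is the behavior the javadoc describes.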

      However, this behavior breaks in certain cases because of a bug in how embedded flag expressions are parsed.

      For example:
      (?U-u) turns UNICODE_CHARACTER_CLASS on without UNICODE_CASE (expected: neither flag turned on);
      (?u-U) turns both flags off (expected: UNICODE_CASE turned on);
      (?U-U) turns both UNICODE_CHARACTER_CLASS and UNICODE_CASE off (expected: UNICODE_CASE left unmodified).
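      The cases above can be checked with a small probe. It assumes that \w matching 'é' reveals UNICODE_CHARACTER_CLASS, and that case-insensitive matching of 'é' against 'É' reveals UNICODE_CASE; `i` is prepended to every group so the latter is observable. EmbeddedFlagProbe and probe() are hypothetical names, and a JDK that rejects an expression is reported as "rejected" instead of crashing the probe.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class EmbeddedFlagProbe {
    static final String LOWER_E_ACUTE = "\u00e9"; // U+00E9
    static final String UPPER_E_ACUTE = "\u00c9"; // U+00C9

    // Reports which of the two flags are in effect after the embedded flag
    // expression "(?i" + body + ")". CASE_INSENSITIVE is added so that
    // UNICODE_CASE becomes observable through matching.
    static String probe(String body) {
        String prefix = "(?i" + body + ")";
        try {
            // UNICODE_CHARACTER_CLASS: \w matches a non-ASCII letter only in Unicode mode.
            boolean unicodeClass = Pattern.compile(prefix + "\\w")
                    .matcher(LOWER_E_ACUTE).matches();
            // UNICODE_CASE: e-acute folds to E-acute only with Unicode-aware case folding.
            boolean unicodeCase = Pattern.compile(prefix + LOWER_E_ACUTE)
                    .matcher(UPPER_E_ACUTE).matches();
            return "UNICODE_CHARACTER_CLASS=" + unicodeClass + " UNICODE_CASE=" + unicodeCase;
        } catch (PatternSyntaxException e) {
            // A JDK that treats the expression as ambiguous would end up here.
            return "rejected: " + e.getDescription();
        }
    }

    public static void main(String[] args) {
        for (String body : new String[] {"U", "U-u", "u-U", "U-U"}) {
            System.out.println("(?" + body + ") -> " + probe(body));
        }
    }
}
```

      On a JDK that exhibits the bug, (?U) should report both flags on, (?U-u) only UNICODE_CHARACTER_CLASS, and (?u-U) and (?U-U) neither flag.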

      Two changes seem worthwhile:
      1) Clarify the javadoc to specify unambiguous rules for processing embedded flag expressions;
      2) Fix the parser accordingly.
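      The issue title suggests rejecting ambiguous flag groups with a syntax error rather than silently choosing an interpretation. One possible rule is sketched below as an assumption, not as the agreed fix: a flag-group body is ambiguous if any letter appears on both sides of '-', or if 'U' and 'u' appear on opposite sides (since 'U' implies 'u'). The helper isAmbiguous() is hypothetical; a real fix would presumably raise PatternSyntaxException from the parser at that point, whereas this sketch only classifies the body.

```java
import java.util.HashSet;
import java.util.Set;

public class AmbiguousFlagCheck {
    // Hypothetical rule: a flag-group body such as "U-u" is ambiguous if a
    // flag letter appears on both sides of '-', or if 'U' and 'u' appear on
    // opposite sides (because 'U' implies 'u').
    static boolean isAmbiguous(String body) {
        int dash = body.indexOf('-');
        Set<Character> on = letters(dash < 0 ? body : body.substring(0, dash));
        Set<Character> off = letters(dash < 0 ? "" : body.substring(dash + 1));
        for (char c : on) {
            if (off.contains(c)) {
                return true; // same flag turned on and off, e.g. "U-U"
            }
        }
        return (on.contains('U') && off.contains('u'))   // e.g. "U-u"
            || (on.contains('u') && off.contains('U'));  // e.g. "u-U"
    }

    // Collects the distinct flag letters of one side of the group.
    private static Set<Character> letters(String s) {
        Set<Character> set = new HashSet<>();
        for (char c : s.toCharArray()) {
            set.add(c);
        }
        return set;
    }

    public static void main(String[] args) {
        for (String body : new String[] {"U-u", "u-U", "U-U", "iU", "U-i"}) {
            System.out.println("(?" + body + ") ambiguous: " + isAmbiguous(body));
        }
    }
}
```

      Under this rule all three expressions from the report are flagged, while unrelated combinations such as (?iU) or (?U-i) remain valid.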

      [1] https://docs.oracle.com/en/java/javase/12/docs/api/java.base/java/util/regex/Pattern.html#UNICODE_CHARACTER_CLASS

            Assignee: Unassigned
            Reporter: Ivan Gerasimov (igerasim)
            Votes: 0
            Watchers: 1