JDK-8271919

Strange Regex Matching Behavior with CANON_EQ enabled.


    • Type: Bug
    • Resolution: Unresolved
    • Priority: P4
    • tbd
    • None
    • core-libs
    • None

      While revising some regex tests to use TestNG instead of the previous ad-hoc framework, it became apparent that some tests that should have failed were not failing. These tests center on some strange behavior in Unicode canonical equivalence matching in Java regex.

      The `ceTest()` test in test/jdk/java/util/regex/RegExTest is written in a way that prevents failures from being detected by the test framework. After fixing this, two test cases in the method fail (they have since been commented out and are marked with `//problem`). We need to determine whether these are valid test cases that fail for legitimate reasons.

      These commented-out cases are instances either of a bug where canonical equivalence doesn't hold where it should, or of a gap where the spec should be refined to account for canonical equivalence on invalid Unicode characters. Other issues related to canonical equivalence persist as well.

      Canonical equivalence in the Pattern class is specified as a string matching the pattern when their decomposed normal forms (Normalizer.Form.NFD) are equivalent. However, if the Pattern is compiled from a string that is already in its decomposed form (for example, Omega, \u03A9), it will not recognize a match against another string that decomposes to it (for example, the Ohm sign, \u2126).
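
      A minimal sketch for reproducing the Omega/Ohm case described above. The class name `CanonEqDemo` and its helper methods are hypothetical and are not taken from RegExTest; only `Pattern.CANON_EQ` and `Normalizer.Form.NFD` come from the JDK. The match results are printed rather than asserted, since they may differ between JDK builds. Based on this report, the first match (decomposed pattern, Ohm sign input) is the one expected to fail.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Hypothetical reproduction helper; not part of the JDK test suite.
final class CanonEqDemo {
    // GREEK CAPITAL LETTER OMEGA (U+03A9), already in decomposed form.
    static final String OMEGA = "\u03A9";
    // OHM SIGN (U+2126), whose canonical decomposition is U+03A9.
    static final String OHM = "\u2126";

    // Returns the canonical decomposition (NFD) of the given string.
    static String nfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    // Compiles the regex with CANON_EQ and tests a full match against the input.
    static boolean canonEqMatches(String regex, String input) {
        return Pattern.compile(regex, Pattern.CANON_EQ).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Both strings share the same NFD form, so they are canonically equivalent.
        System.out.println("NFD forms equal:            " + nfd(OMEGA).equals(nfd(OHM)));
        // Direction reported as problematic: decomposed pattern, composed input.
        System.out.println("pattern OMEGA, input OHM:   " + canonEqMatches(OMEGA, OHM));
        // Opposite direction, for comparison.
        System.out.println("pattern OHM,   input OMEGA: " + canonEqMatches(OHM, OMEGA));
    }
}
```

      If the two match lines print different values, canonical equivalence is not being treated as bidirectional on that build.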

      The spec is silent on this behavior, so this is either a shortcoming in the spec or a bug. Absent an explicit note, the expectation is that equivalence is a bidirectional property. What we observe with this bug is that it is not. Should matching be bidirectional, or should we update the spec?


            igraves Ian Graves