JDK-8315598

Regex \b is inconsistent between UNICODE_CHARACTER_CLASS param and (?U)


    • b16
    • 19
    • generic
    • generic

      A DESCRIPTION OF THE PROBLEM :
      This seems to be related to the changes made for JDK-8264160. In JDK 17 and earlier, the following test passes:
              var pattern= Pattern.compile("(?:\\b|\\d)"+"äst");
              assertTrue("äst".matches(pattern.pattern()));

      so \b matched at the boundary before the a-umlaut character (ä). The fix for that bug seems to have changed this behaviour in a breaking way: on JDK 20 the test now fails.

      However, according to the documentation, the UNICODE_CHARACTER_CLASS flag should enable the Unicode versions of the predefined character classes, so I would expect this code to pass:

      var pattern= Pattern.compile("(?:\\b|\\d)"+"äst", Pattern.UNICODE_CHARACTER_CLASS);
      assertTrue("äst".matches(pattern.pattern()));

      but it fails in JDK 20 (and JDK 21 ea).

      The documentation also states: "The UNICODE_CHARACTER_CLASS mode can also be enabled via the embedded flag expression (?U)."

      var pattern2= Pattern.compile("(?U)(?:\\b|\\d)"+"äst");
      assertTrue("äst".matches(pattern2.pattern()));

      When the embedded flag is used instead of the compile parameter, the test passes.

      REGRESSION : Last worked in version 17.0.8

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      Run the code below.

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      The code runs without throwing an exception.

      ACTUAL -
      The test fails with an IllegalStateException.


      ---------- BEGIN SOURCE ----------
              var pattern= Pattern.compile("(?:\\b|\\d)"+"äst", Pattern.UNICODE_CHARACTER_CLASS);
              if(!"äst".matches(pattern.pattern()))
                  throw new IllegalStateException("äst should match pattern1");
      ---------- END SOURCE ----------

      FREQUENCY : always
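
      One detail worth checking in the snippets above, offered as a sketch rather than a diagnosis of this issue: String.matches(regex) compiles its argument as a fresh pattern with no flags, and Pattern.pattern() returns only the regex text. Flags passed to Pattern.compile are therefore not carried along, while an embedded (?U) is part of the text and is. The class and method names below (FlagCarrySketch and its helpers) are hypothetical, and the input is written with a Unicode escape for ä so the source compiles regardless of file encoding.

```java
import java.util.regex.Pattern;

public class FlagCarrySketch {
    // Same regex and input as in the report; the escape below is the a-umlaut.
    static final String REGEX = "(?:\\b|\\d)\u00e4st";
    static final String INPUT = "\u00e4st";

    // Matches through the compiled Pattern, so the flag takes part in matching.
    static boolean viaCompiledPattern() {
        Pattern p = Pattern.compile(REGEX, Pattern.UNICODE_CHARACTER_CLASS);
        return p.matcher(INPUT).matches();
    }

    // Mirrors the report: pattern() yields only the regex text, and
    // String.matches recompiles that text without any flags.
    static boolean viaPatternString() {
        Pattern p = Pattern.compile(REGEX, Pattern.UNICODE_CHARACTER_CLASS);
        return INPUT.matches(p.pattern());
    }

    // The embedded flag is part of the regex text, so it survives recompilation.
    static boolean viaEmbeddedFlag() {
        return INPUT.matches("(?U)" + REGEX);
    }

    public static void main(String[] args) {
        System.out.println("compiled pattern with flag: " + viaCompiledPattern());
        System.out.println("pattern() string, flag dropped: " + viaPatternString());
        System.out.println("embedded (?U): " + viaEmbeddedFlag());
    }
}
```

      No output is shown because the second result depends on the JDK version: it follows whatever \b means without any flags, which per this report differs between JDK 17 and JDK 20.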


            rgiulietti Raffaello Giulietti
            webbuggrp Webbug Group