JDK-8354490

Pattern.CANON_EQ causes a pattern to not match a string with a UNICODE variation


    • b13
    • 18
    • generic
    • generic

      ADDITIONAL SYSTEM INFORMATION :
      > uname -a
      Linux 0be9c4498283 6.12.5-linuxkit #1 SMP Tue Jan 21 10:23:32 UTC 2025 aarch64 aarch64 aarch64 GNU/Linux

      > java -version
      openjdk 24 2025-03-18
      OpenJDK Runtime Environment (build 24+36-3646)
      OpenJDK 64-Bit Server VM (build 24+36-3646, mixed mode, sharing)

      A DESCRIPTION OF THE PROBLEM :
      Enabling **canonical equivalence** causes a pattern to not match a string with a **variation selector**.
      `Pattern.compile("^[^/]*\\.[^/]*$", Pattern.CANON_EQ)` does not match a string containing a variation selector (e.g., `U+FE0F`).
      Without the flag, `Pattern.compile("^[^/]*\\.[^/]*$")` matches the same string.

      The workaround is to remove variation selectors from the string.
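The workaround above could be sketched as follows. This is a minimal illustration, not code from the report: the class name `StripVariationSelectors` and the helper `stripVariationSelectors` are hypothetical, and it assumes that dropping every code point in the two Unicode variation selector ranges (U+FE00 to U+FE0F and U+E0100 to U+E01EF) is acceptable for the caller.

```java
import java.util.regex.Pattern;

public class StripVariationSelectors {
    // Assumption: "variation selectors" means VS1-VS16 (U+FE00..U+FE0F)
    // and VS17-VS256 (U+E0100..U+E01EF).
    private static final Pattern VARIATION_SELECTORS =
            Pattern.compile("[\\uFE00-\\uFE0F\\x{E0100}-\\x{E01EF}]");

    // Hypothetical helper: returns the input with all variation selectors removed.
    static String stripVariationSelectors(String s) {
        return VARIATION_SELECTORS.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        // Heart (U+2764) followed by variation selector-16 (U+FE0F).
        String input = "\u2764\uFE0F file.txt";
        String cleaned = stripVariationSelectors(input);

        Pattern p = Pattern.compile("^[^/]*\\.[^/]*$", Pattern.CANON_EQ);
        // Match against the cleaned string instead of the original one.
        System.out.println(p.matcher(cleaned).matches());
    }
}
```

Note that stripping the selector changes how the text renders (emoji versus text presentation), so this only suits cases where the string is used for matching rather than display.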

      This bug is the root cause of the following problem:
      On macOS, a path matcher created with the pattern `glob:*.*`
      is converted to the regular expression `^[^/]*\.[^/]*$`,
      and the `Pattern.CANON_EQ` flag is passed during pattern compilation,
      so the matcher fails on file names that contain a variation selector.
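The path matcher scenario can be exercised with a sketch like the one below. The class name `GlobMatcherDemo` and the helper `matchesGlob` are hypothetical. The result for the second file name depends on the platform and JDK build, since the report says the `CANON_EQ` flag is applied on macOS, so no output is asserted here.

```java
import java.nio.file.FileSystems;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.PathMatcher;

public class GlobMatcherDemo {
    // Hypothetical helper: matches a file name against a glob on the default file system.
    static boolean matchesGlob(String glob, String fileName) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Path.of(fileName));
    }

    public static void main(String[] args) {
        System.out.println(matchesGlob("*.*", "file.txt"));
        try {
            // Heart (U+2764) followed by variation selector-16 (U+FE0F).
            System.out.println(matchesGlob("*.*", "\u2764\uFE0F file.txt"));
        } catch (InvalidPathException e) {
            // Some platform file name encodings cannot represent these characters.
            System.out.println("file name not representable here: " + e.getReason());
        }
    }
}
```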

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      1. Compile a pattern using the expression "^[^/]*\\.[^/]*$" and the flag `Pattern.CANON_EQ`.
      2. Match a string that contains a variation selector.

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      The pattern matches the string
      ACTUAL -
      The pattern does not match the string

      ---------- BEGIN SOURCE ----------
      import java.util.regex.Pattern;

      public class Main {
          public static void main(String[] args) {
              // U+FE0F (variation selector-16) follows the heart (U+2764);
              // escapes are used because the selector is invisible in most editors
              var strWithVariationSelector = "\u2764\uFE0F file.txt";

              var expr = "^[^/]*\\.[^/]*$";
              var pattern = Pattern.compile(expr);
              var patternWithCanonEq = Pattern.compile(expr, Pattern.CANON_EQ);

              var patternMatches = pattern.matcher(strWithVariationSelector).matches();
              var patternWithCanonEqMatches = patternWithCanonEq.matcher(strWithVariationSelector).matches();

              System.out.println(patternMatches); // true
              System.out.println(patternWithCanonEqMatches); // false (expected: true)
          }
      }
      ---------- END SOURCE ----------
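Because the variation selector is invisible in most editors, it can help to confirm that a test string really contains it before running the reproducer. The sketch below is an illustration only, with a hypothetical class `ShowCodePoints` and helper `describe`; it lists the code points of a string in `U+XXXX` form.

```java
import java.util.stream.Collectors;

public class ShowCodePoints {
    // Hypothetical helper: renders each code point of the string as U+XXXX.
    static String describe(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Heart, variation selector-16, space, letter f.
        System.out.println(describe("\u2764\uFE0F f")); // U+2764 U+FE0F U+0020 U+0066
    }
}
```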

            igraves Ian Graves
            webbuggrp Webbug Group
            Votes: 0
            Watchers: 4