JDK-8360459

UNICODE_CASE and character class with non-ASCII range does not match ASCII char

      ADDITIONAL SYSTEM INFORMATION :
      Reproducible at least from OpenJDK 1.8.0_452 to 24.0.1.
      Tested on Ubuntu Linux, but the behavior is probably independent of the OS.

      A DESCRIPTION OF THE PROBLEM :
      When a `Pattern` is compiled with `CASE_INSENSITIVE | UNICODE_CASE` and contains a character class with a *range* of non-ASCII characters, and one of the characters in that range case-folds to an ASCII character, the `Pattern` does *not* match the ASCII letters it should match.

      For example, the character class in the pattern `"[\u017F-\u0180]"` contains `\u017F` LATIN SMALL LETTER LONG S, which case-folds to 's' (ASCII). It should therefore match a single `"s"` or `"S"`, but it does not.

      * If the character is listed individually in the character class rather than as part of a range (`"[\u017F\u0180]"`), the pattern matches.
      * If the range instead covers the ASCII letter `s` or `S` (`"[p-u]"` or `"[P-U]"`), the pattern matches.
      * All alternatives do match `"\u017F"`.
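
The folding relationship behind this report can be checked directly with `java.lang.Character`. The sketch below approximates simple case folding as `toLowerCase(toUpperCase(cp))`, which is an assumption made for illustration rather than a lookup in CaseFolding.txt. The class name `LongSFolding` and the helper `fold` are hypothetical.

```java
public class LongSFolding {
    // Assumption of this sketch: simple case folding is approximated by
    // upper-casing and then lower-casing a single code point.
    static int fold(int codePoint) {
        return Character.toLowerCase(Character.toUpperCase(codePoint));
    }

    public static void main(String[] args) {
        int longS = 0x017F; // LATIN SMALL LETTER LONG S

        System.out.println(Character.getName(longS));
        System.out.printf("upper: U+%04X%n", Character.toUpperCase(longS));
        System.out.printf("fold:  U+%04X%n", fold(longS));

        // Does the long s fold to the same code point as ASCII 's' and 'S'?
        System.out.println(fold(longS) == fold('s') && fold(longS) == fold('S'));
    }
}
```

Under that approximation U+017F, `s` and `S` all fold to the same code point, which is why a case-insensitive class containing U+017F is expected to accept both ASCII letters.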

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      1. Save the provided "Test Case Code" to `BugUnicodeCase.java`.
      2. Compile with `javac BugUnicodeCase.java`
      3. Run with `java -cp . BugUnicodeCase`

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      All 21 lines print `true`.
      ACTUAL -
      ```
      true
      ...
      true // 18 so far
      false
      false
      true
      ```

      ---------- BEGIN SOURCE ----------
      ```java
      import java.util.regex.Pattern;
      import static java.util.regex.Pattern.*;

      public class BugUnicodeCase {
          public static void main(String[] args) {
              // U+017F is LATIN SMALL LETTER LONG S, which folds to 's' in CaseFolding.txt

              // Expected: true everywhere.
              // Actual: the two marked lines for p7 with "s" and "S" are false.

              Pattern p1 = Pattern.compile("s", CASE_INSENSITIVE | UNICODE_CASE);
              System.out.println(p1.matcher("s").matches());
              System.out.println(p1.matcher("S").matches());
              System.out.println(p1.matcher("\u017F").matches());

              Pattern p2 = Pattern.compile("S", CASE_INSENSITIVE | UNICODE_CASE);
              System.out.println(p2.matcher("s").matches());
              System.out.println(p2.matcher("S").matches());
              System.out.println(p2.matcher("\u017F").matches());

              Pattern p3 = Pattern.compile("\u017F", CASE_INSENSITIVE | UNICODE_CASE);
              System.out.println(p3.matcher("s").matches());
              System.out.println(p3.matcher("S").matches());
              System.out.println(p3.matcher("\u017F").matches());

              Pattern p4 = Pattern.compile("[p-u]", CASE_INSENSITIVE | UNICODE_CASE);
              System.out.println(p4.matcher("s").matches());
              System.out.println(p4.matcher("S").matches());
              System.out.println(p4.matcher("\u017F").matches());

              Pattern p5 = Pattern.compile("[P-U]", CASE_INSENSITIVE | UNICODE_CASE);
              System.out.println(p5.matcher("s").matches());
              System.out.println(p5.matcher("S").matches());
              System.out.println(p5.matcher("\u017F").matches());

              Pattern p6 = Pattern.compile("[\u017F\u0180]", CASE_INSENSITIVE | UNICODE_CASE);
              System.out.println(p6.matcher("s").matches());
              System.out.println(p6.matcher("S").matches());
              System.out.println(p6.matcher("\u017F").matches());

              Pattern p7 = Pattern.compile("[\u017F-\u0180]", CASE_INSENSITIVE | UNICODE_CASE);
              System.out.println(p7.matcher("s").matches()); // false!
              System.out.println(p7.matcher("S").matches()); // false!
              System.out.println(p7.matcher("\u017F").matches());
          }
      }
      ```
      ---------- END SOURCE ----------
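
Because the test case prints 21 unlabeled booleans, a table-driven variant can make it easier to see which combination fails. The following is a minimal sketch that reuses the same patterns, inputs, and flags. The class name `UnicodeCaseMatrix` and the helpers `printable` and `failures` are hypothetical and not part of the original report.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class UnicodeCaseMatrix {
    static final int FLAGS = Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;

    // The same seven patterns and three inputs as in BugUnicodeCase.
    static final String[] PATTERNS = {
        "s", "S", "\u017F", "[p-u]", "[P-U]", "[\u017F\u0180]", "[\u017F-\u0180]"
    };
    static final String[] INPUTS = { "s", "S", "\u017F" };

    // Show non-ASCII characters as <U+XXXX> so the report reads the same on any console.
    static String printable(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append(String.format("<U+%04X>", (int) c));
            }
        }
        return sb.toString();
    }

    // Returns one description per (pattern, input) pair that does not match.
    static List<String> failures(String[] patterns) {
        List<String> failed = new ArrayList<>();
        for (String regex : patterns) {
            Pattern p = Pattern.compile(regex, FLAGS);
            for (String input : INPUTS) {
                if (!p.matcher(input).matches()) {
                    failed.add(printable(regex) + " vs " + printable(input));
                }
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        List<String> failed = failures(PATTERNS);
        if (failed.isEmpty()) {
            System.out.println("all combinations match");
        } else {
            failed.forEach(f -> System.out.println("no match: " + f));
        }
    }
}
```

On an affected JDK this is expected to report only the two pairs that combine the range pattern with the inputs `s` and `S`, in line with the ACTUAL output above.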

            Assignee: Xueming Shen (sherman)
            Reporter: Webbug Group (webbuggrp)