JDK / JDK-8214245

Case insensitive matching doesn't work correctly for some character classes


        ADDITIONAL SYSTEM INFORMATION :
        $ java -version
        openjdk version "11.0.1" 2018-10-16
        OpenJDK Runtime Environment 18.9 (build 11.0.1+13)
        OpenJDK 64-Bit Server VM 18.9 (build 11.0.1+13, mixed mode)

        A DESCRIPTION OF THE PROBLEM :
        When the CASE_INSENSITIVE flag is used, a POSIX character class (such as "\\p{Lower}") and an explicit character class containing the same characters (such as "[a-z]") match differently.

        STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
        See test program.

        EXPECTED VERSUS ACTUAL BEHAVIOR :
        EXPECTED -
        The pattern "[a-z]" should behave the same as "\\p{Lower}", which the java.util.regex.Pattern documentation lists under "POSIX character classes (US-ASCII only)" as equivalent to "[a-z]".
        ACTUAL -
        When compiled with the CASE_INSENSITIVE flag, "[a-z]" matches an uppercase letter, but "\\p{Lower}" does not.
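        The test program below checks a single input. A minimal sketch that compares the two classes over every ASCII character, with and without the flag, shows the scope of the difference. The names CompareClasses and disagreements are hypothetical, chosen for this illustration. Per the documentation the no-flag list should be empty; under CASE_INSENSITIVE the report's 11.0.1 result implies uppercase letters appear, but the exact output depends on the JDK build, so none is shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class CompareClasses {
    // Returns the ASCII characters (0..127) on which the two regexes
    // disagree when each is compiled with the same flags.
    static List<Character> disagreements(String regexA, String regexB, int flags) {
        Pattern a = Pattern.compile(regexA, flags);
        Pattern b = Pattern.compile(regexB, flags);
        List<Character> diff = new ArrayList<>();
        for (char c = 0; c < 128; c++) {
            String s = String.valueOf(c);
            if (a.matcher(s).matches() != b.matcher(s).matches()) {
                diff.add(c);
            }
        }
        return diff;
    }

    public static void main(String[] args) {
        // Without flags, \p{Lower} is documented as equivalent to [a-z].
        System.out.println("no flags:         "
                + disagreements("[a-z]", "\\p{Lower}", 0));
        // With CASE_INSENSITIVE, this report observes a difference on 11.0.1;
        // the printed list depends on the JDK build in use.
        System.out.println("CASE_INSENSITIVE: "
                + disagreements("[a-z]", "\\p{Lower}", Pattern.CASE_INSENSITIVE));
    }
}
```

        Any disagreement can only involve the letters 'A' to 'Z', since no other ASCII character is affected by US-ASCII case folding.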

        ---------- BEGIN SOURCE ----------
        // $ javac Test.java
        // $ java -ea Test
        // Exception in thread "main" java.lang.AssertionError
        // at Test.main(Test.java:8)
        import java.util.regex.Pattern;

        public class Test {
          public static void main(String[] args) {
            Pattern p1 = Pattern.compile("[a-z]", Pattern.CASE_INSENSITIVE);
            Pattern p2 = Pattern.compile("\\p{Lower}", Pattern.CASE_INSENSITIVE);
            assert(p1.matcher("A").find() == p2.matcher("A").find());
          }
        }
        ---------- END SOURCE ----------

        CUSTOMER SUBMITTED WORKAROUND :
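        Avoid POSIX character classes in patterns compiled with CASE_INSENSITIVE; write the equivalent explicit class instead (for example "[a-z]" rather than "\\p{Lower}").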
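        A minimal sketch of the workaround, assuming the goal is to match an ASCII letter regardless of case. The names Workaround, RANGE, and BOTH are hypothetical. Only explicit classes are used, because the behavior of "\\p{Lower}" under the flag is exactly what this report describes as inconsistent.

```java
import java.util.regex.Pattern;

public class Workaround {
    // Explicit range: CASE_INSENSITIVE (US-ASCII folding by default)
    // makes it match 'A'..'Z' as well as 'a'..'z'.
    static final Pattern RANGE = Pattern.compile("[a-z]", Pattern.CASE_INSENSITIVE);

    // Alternative: spell out both cases so the result does not depend
    // on the flag at all.
    static final Pattern BOTH = Pattern.compile("[a-zA-Z]");

    public static void main(String[] args) {
        for (String s : new String[] {"a", "A", "1"}) {
            System.out.println(s + ": RANGE=" + RANGE.matcher(s).matches()
                    + " BOTH=" + BOTH.matcher(s).matches());
        }
    }
}
```

        Either form gives the same result on every JDK build, unlike the POSIX class combined with the flag.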

        FREQUENCY : always


              igerasim Ivan Gerasimov
              webbuggrp Webbug Group