JDK-8215626

The '^' operator (negation in char classes) in regex does not work properly

Details

    Description

      A DESCRIPTION OF THE PROBLEM :
      Hi, the '^' operator (negation in a character class) does not seem to work.
      I provide a source code example whose behavior is totally different in Java 8 and Java 11.


      REGRESSION : Last worked in version 8u191

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      Encoding : UTF-8
      The output is ooerdqK$Fop22{78ae€€
      ACTUAL -
      Encoding : UTF-8
      The output is ooerdqKFop22{78ae
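
Comparing the two outputs above, the characters that differ are $ and €. To see this per character rather than in one long string, a small probe can test single code points against the reported pattern on whichever JVM runs it. This is only a sketch: the class name `NegationProbeSketch` and the `matches` helper are hypothetical and not part of the report, and the sample code points are chosen for illustration.

```java
import java.util.regex.Pattern;

class NegationProbeSketch {
    // Pattern copied from the report; the euro sign is written as a Unicode escape (U+20AC).
    static final Pattern REPORTED = Pattern.compile(
            "[^\\p{ASCII}&&[^\\p{InCombiningDiacriticalMarks}]&&[^\u20AC$]]");

    /** Returns true if the reported character class matches the single code point. */
    static boolean matches(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return REPORTED.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Samples: ASCII letter, dollar, euro (U+20AC), combining acute (U+0301), registered sign (U+00AE).
        int[] samples = {'a', '$', 0x20AC, 0x0301, 0x00AE};
        System.out.println("java.version = " + System.getProperty("java.version"));
        for (int cp : samples) {
            // A matched character is one that replaceAll("") would delete.
            System.out.printf("U+%04X matched=%b%n", cp, matches(cp));
        }
    }
}
```

Running the probe on Java 8 and on Java 11 and comparing the lines for `$` and U+20AC should show where the two versions disagree.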

      ---------- BEGIN SOURCE ----------
      import java.text.Normalizer;
      import java.util.regex.Pattern;

      /**
       *
       * @author Andres Bel Alonso
       */
      public class BugExample {

          /**
           * @param args the command line arguments
           */
          public static void main(String[] args) {
              // UTF-8 encoding is required; print the active encoding to check it
              System.out.println("Encoding : " + System.getProperty("file.encoding"));
              
              // Goal: remove every non-ASCII character that is not a combining diacritical mark,
              // but keep € and $

              String input = "ooœer®†dqK$Fƒo©pø2’2£{78a√ﬁe€€";
              String str = Normalizer.normalize(input, Normalizer.Form.NFD);
              Pattern pattern = Pattern.compile("[^\\p{ASCII}&&[^\\p{InCombiningDiacriticalMarks}]&&[^€$]]");
              // Build the cleaned string
              String out = pattern.matcher(str).replaceAll("");
              
              // Java 8 output : ooerdqK$Fop22{78ae€€
              // Java 11 output : ooerdqKFop22{78ae
              // The Java 11 output is wrong: it also removes the characters I wanted to keep ($ and €). The Java 8 output is correct.
              System.out.println("The output is " + out);
              
              // Note: the regex [\\P{ASCII}&&[\\P{InCombiningDiacriticalMarks}]&&[^€$]] works correctly in Java 11
          }
          
      }

      ---------- END SOURCE ----------
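
The report notes that writing the class with `\P{...}` instead of a leading `^` works on Java 11. A minimal sketch of that workaround as a reusable method follows. The class name `RegexWorkaroundSketch`, the `clean` method, and the sample input are hypothetical and only illustrate the idea; the euro sign is written as a Unicode escape so the example does not depend on the source file encoding.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

class RegexWorkaroundSketch {
    // Non-ASCII, and not a combining diacritical mark, and neither euro (U+20AC) nor dollar.
    // Each part is negated on its own, so no leading '^' spans the intersections.
    static final Pattern UNWANTED = Pattern.compile(
            "[\\P{ASCII}&&[\\P{InCombiningDiacriticalMarks}]&&[^\u20AC$]]");

    /** Decomposes the input (NFD) and deletes every character matched by UNWANTED. */
    static String clean(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return UNWANTED.matcher(decomposed).replaceAll("");
    }

    public static void main(String[] args) {
        // Sample input: "café $5 €3 ®" with the non-ASCII characters escaped.
        System.out.println(clean("caf\u00E9 $5 \u20AC3 \u00AE"));
    }
}
```

With this form the dollar and euro signs and the decomposed accent are kept, while other non-ASCII characters such as the registered sign are removed.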

      FREQUENCY : always


            People

              aleonard Andrew Leonard
              webbuggrp Webbug Group