JDK-8278742

Erroneous capture possible if string ends with new-line character


      ADDITIONAL SYSTEM INFORMATION :
      MacOS Mojave 10.14.6
      OpenJDK 1.8.0_192-b12
      OpenJDK 9+181
      OpenJDK 10.0.2+13
      OpenJDK 11.0.2+9
      OpenJDK 12.0.2+10
      OpenJDK 13+33
      OpenJDK 14+36-1461
      OpenJDK 15.0.2+7-27
      OpenJDK 16+36-2231
      OpenJDK 17+35-2724
      OpenJDK 18-ea+11-557

      A DESCRIPTION OF THE PROBLEM :
      A new-line character at the end of the input string causes incorrect regex capture-group processing in certain scenarios.

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      Run the included source code

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      The last capture group should always be empty.

      The pattern's first capture group performs a lazy, unbounded match of any character, followed by an end-of-string anchor. It should always capture the entire string.

      The pattern's second capture group performs a greedy, unbounded match of any character between two end-of-string anchors. It should always capture nothing.

      ACTUAL -
      When the last character is a new-line character, that character is captured in the second capture group, which should be impossible.
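
      The difference can be isolated with a smaller probe than the full test program. The following is a minimal sketch, not part of the original submission: the class name DollarCapture and the helpers groups and show are hypothetical, and only the pattern itself is taken from the report's source. It returns both capture groups for a full match so they can be inspected directly.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical probe class; only the pattern string comes from the report.
public class DollarCapture {
    static final Pattern REPORTED = Pattern.compile("^([\\s\\S]*?)$([\\s\\S]*)$");

    /** Returns {group 1, group 2} for a full match of the input, or null if there is no match. */
    static String[] groups(String input) {
        Matcher m = REPORTED.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    /** Makes new-line characters visible in the printed output. */
    static String show(String s) {
        return s.replace("\n", "\\n");
    }

    public static void main(String[] args) {
        for (String input : new String[] { "a", "\n", "a\n" }) {
            String[] g = groups(input);
            System.out.println("input=[" + show(input) + "]"
                    + " group1=[" + show(g[0]) + "]"
                    + " group2=[" + show(g[1]) + "]");
        }
    }
}
```

      For the inputs "\n" and "a\n" the trailing new-line ends up in group 2 rather than group 1, which is the behavior described above.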

      ---------- BEGIN SOURCE ----------
      import java.util.regex.Matcher;
      import java.util.regex.Pattern;

      /**
       * RegEx bug when the last character is a new-line
       */
      public class RegexTest {
          public static void main(String... args) {
              System.out.println("Last capture group should always be empty...\n");

              // Two equivalent spellings of "any character, including line terminators"
              testRun(Pattern.compile("^([\\s\\S]*?)$([\\s\\S]*)$"));
              testRun(Pattern.compile("^(.*?)$(.*)$", Pattern.DOTALL));

              System.out.println("Result: if the last character is a new-line character it is erroneously captured in the second group");
              System.out.println("Java version: " + System.getProperty("java.runtime.version"));
          }

          static void testRun(Pattern p) {
              System.out.println("RegEx Pattern = \"" + p + '"');
              test(p, "\n"); // fails when the last character is a new-line
              test(p, "a");
              test(p, "aa");
              test(p, "\na");
              test(p, "\n\n"); // fails when the last character is a new-line
              test(p, "\n\n\n"); // fails when the last character is a new-line
              test(p, "\n\n\n ");
              System.out.println();
          }

          static void test(Pattern p, String input) {
              Matcher m = p.matcher(input);

              String replacement = m
                      .replaceAll("[$1][$2]") // surround capture groups in brackets using regex substitution
                      .replace('\n', '↵'); // replace new-line with a visible character for easier reading

              System.out.print(replacement);

              // matches() must run before group(2) can be queried
              m.matches();
              System.out.print(0 < m.group(2).length() ? "\t◀ FAILED" : "");
              System.out.println();
          }
      }
      ---------- END SOURCE ----------

      CUSTOMER SUBMITTED WORKAROUND :
      None found
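
      A possible alternative, offered as a sketch rather than a confirmed workaround: the java.util.regex.Pattern documentation lists \z as "the end of the input" and \Z as "the end of the input but for the final terminator, if any". Assuming the capture reported above comes from $ also matching just before a final line terminator (which is consistent with the observed output), replacing both $ anchors with \z should leave the second group empty. The class name EndAnchorWorkaround and the helper groups are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: the reported pattern with each $ replaced by \z.
public class EndAnchorWorkaround {
    static final Pattern STRICT = Pattern.compile("^([\\s\\S]*?)\\z([\\s\\S]*)\\z");

    /** Returns {group 1, group 2} for a full match of the input, or null if there is no match. */
    static String[] groups(String input) {
        Matcher m = STRICT.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        // Same inputs as the report's testRun method
        String[] inputs = { "\n", "a", "aa", "\na", "\n\n", "\n\n\n", "\n\n\n " };
        for (String input : inputs) {
            String[] g = groups(input);
            boolean ok = g != null && g[0].equals(input) && g[1].isEmpty();
            System.out.println(input.replace('\n', '↵') + (ok ? "" : "\t◀ FAILED"));
        }
    }
}
```

      With \z, group 1 has to extend to the true end of the input before the first anchor can match, so nothing remains for group 2 to capture.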

      FREQUENCY : always


            igraves Ian Graves
            webbuggrp Webbug Group