JDK-8141066

Regular expression parsing loop


    • Type: Bug
    • Resolution: Not an Issue
    • Priority: P4
    • None
    • 8u60, 9
    • core-libs

      FULL PRODUCT VERSION :
      Java 7 on Windows:
      java version "1.7.0_80"
      Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
      Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)

      Java 8 on Windows:
      java version "1.8.0_60-ea"
      Java(TM) SE Runtime Environment (build 1.8.0_60-ea-b25)
      Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)

      On Ubuntu:
      java version "1.7.0_67"
      Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
      Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)

      ADDITIONAL OS VERSION INFORMATION :
      Microsoft Windows [Version 6.1.7601]

      Linux innovation-2 3.11.0-26-generic #45~precise1-Ubuntu SMP Tue Jul 15 04:02:35 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

      A DESCRIPTION OF THE PROBLEM :
      We found a set of strings that cause matching against a regular expression to hang, apparently in an endless loop. Specifically, the problem arises when the string to be matched contains both repeated words and a '?' character.
      Removing either of these conditions (no repeated text, or no '?' character) makes the code run correctly.
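
      One possible reading of these two conditions, offered as an assumption and not as the analysis of the issue: the display-name part of the pattern, ([_A-Za-z0-9- ]+)*?, nests one quantifier inside another, so a long run of words and spaces can be split between the inner and outer loop in exponentially many ways, and the '?' character, which no part of the pattern can match, guarantees that the overall match fails only after those splits have been tried. The sketch below replaces the nested quantifier with a single reluctant one. The class and method names are hypothetical, and capture group 2 now holds the whole name run instead of the last iteration.

```java
import java.util.regex.Pattern;

// Hypothetical class name, for illustration only; not part of the original report.
class ContactPatternSketch {

    // Unchanged from the report.
    static final String EMAIL_PATTERN =
        "[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})";

    // Assumed rewrite: ([_A-Za-z0-9- ]+)*? becomes ([_A-Za-z0-9- ]*?).
    // Both accept the same strings, but the single quantifier leaves only
    // one way to consume a given run of name characters.
    static final String EMAIL_CONTACT_PATTERN =
        "(\\s*\"?\\s*([_A-Za-z0-9- ]*?)\\s*\"?\\s*)?" + "<?(" + EMAIL_PATTERN + ")>?";

    static final Pattern CONTACT = Pattern.compile(EMAIL_CONTACT_PATTERN);

    // Returns true if the whole string is a contact of the form: name <email>.
    static boolean matchesContact(String s) {
        return CONTACT.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Same input as in the report; '?' cannot be matched, so this prints "Doesn't match".
        String input = "RenNome asd renCognome asd <bas?@alice.it>";
        System.out.println(matchesContact(input) ? "Matches" : "Doesn't match");
    }
}
```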

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      Run the following code:

      public static void main(String[] args) {
          String EMAIL_PATTERN = "[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})";
          String EMAIL_CONTACT_PATTERN = "(\\s*\"?\\s*([_A-Za-z0-9- ]+)*?\\s*\"?\\s*)?" + "<?(" + EMAIL_PATTERN + ")>?";
          Pattern contactPattern = Pattern.compile(EMAIL_CONTACT_PATTERN);
          Matcher matcher = contactPattern.matcher("RenNome asd renCognome asd <bas?@alice.it>");
          if (matcher.matches()) {
              System.out.println("Matches");
          }
          else {
              System.out.println("Doesn't match");
          }
      }

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      Console should print either "Matches" or "Doesn't match"
      ACTUAL -
      Execution hangs on the matcher.matches() line.

      REPRODUCIBILITY :
      This bug is always reproducible.

      ---------- BEGIN SOURCE ----------
      import java.util.regex.Matcher;
      import java.util.regex.Pattern;

      public class RegExpBug {

          public static void main(String[] args) {
              String EMAIL_PATTERN = "[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})";
              String EMAIL_CONTACT_PATTERN = "(\\s*\"?\\s*([_A-Za-z0-9- ]+)*?\\s*\"?\\s*)?" + "<?(" + EMAIL_PATTERN + ")>?";

              Pattern contactPattern = Pattern.compile(EMAIL_CONTACT_PATTERN);
              Matcher matcher = contactPattern.matcher("RenNome asd renCognome asd <bas?@alice.it>");
              if (matcher.matches()) {
                  System.out.println("Matches");
              }
              else {
                  System.out.println("Doesn't match");
              }
          }

      }
      ---------- END SOURCE ----------
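
      When the pattern cannot be changed, a caller can at least bound how long a match may run. The sketch below is one common approach, not something taken from this report: it wraps the input in a CharSequence whose charAt fails once a deadline has passed, relying on the assumption that java.util.regex reads its input through charAt. DeadlineCharSequence, BoundedMatchSketch and matchesWithin are hypothetical names.

```java
import java.util.regex.Pattern;

// Hypothetical helper: a CharSequence view that fails once a deadline has passed.
final class DeadlineCharSequence implements CharSequence {
    private final CharSequence inner;
    private final long deadlineNanos;

    DeadlineCharSequence(CharSequence inner, long deadlineNanos) {
        this.inner = inner;
        this.deadlineNanos = deadlineNanos;
    }

    @Override
    public char charAt(int index) {
        // A long-running match calls charAt over and over, so the check lives here.
        if (System.nanoTime() - deadlineNanos > 0) {
            throw new IllegalStateException("regex match exceeded its time budget");
        }
        return inner.charAt(index);
    }

    @Override
    public int length() {
        return inner.length();
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        return new DeadlineCharSequence(inner.subSequence(start, end), deadlineNanos);
    }

    @Override
    public String toString() {
        return inner.toString();
    }
}

// Hypothetical class name, for illustration only.
class BoundedMatchSketch {

    // Runs pattern.matcher(input).matches(), throwing IllegalStateException
    // if the match is still running after roughly budgetMillis milliseconds.
    static boolean matchesWithin(Pattern pattern, CharSequence input, long budgetMillis) {
        long deadline = System.nanoTime() + budgetMillis * 1_000_000L;
        return pattern.matcher(new DeadlineCharSequence(input, deadline)).matches();
    }

    public static void main(String[] args) {
        // Pattern and input copied from the report.
        String email = "[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})";
        String contact = "(\\s*\"?\\s*([_A-Za-z0-9- ]+)*?\\s*\"?\\s*)?" + "<?(" + email + ")>?";
        Pattern contactPattern = Pattern.compile(contact);
        try {
            boolean ok = matchesWithin(contactPattern, "RenNome asd renCognome asd <bas?@alice.it>", 2000);
            System.out.println(ok ? "Matches" : "Doesn't match");
        } catch (IllegalStateException e) {
            System.out.println("Gave up: " + e.getMessage());
        }
    }
}
```

      Checking the clock on every charAt call adds overhead, so this is better suited to guarding untrusted input than to hot paths.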

            igerasim Ivan Gerasimov
            webbuggrp Webbug Group