JDK-6342544

Compilation Time of java.util.regex.Pattern takes too long


        FULL PRODUCT VERSION :
        java version "1.5.0_01"
        Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_01-b08)
        Java HotSpot(TM) Client VM (build 1.5.0_01-b08, mixed mode)

        ADDITIONAL OS VERSION INFORMATION :
        Linux brave 2.6.12.4 #1 SMP Fri Aug 12 12:58:09 WST 2005 i686 i686 i386 GNU/Linux

        EXTRA RELEVANT SYSTEM CONFIGURATION :
        University Network

        A DESCRIPTION OF THE PROBLEM :
        I am working with regular expressions (REs) using java.util.regex.Pattern. The expressions I construct are usually very large because they contain many alternation groups, e.g. (a|b|c|d).

        The problem is that when I compile a large RE using Pattern.compile(patStr, Pattern.CASE_INSENSITIVE), compilation takes hours to complete.

        STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
        Using a FOR loop with counter X, I construct X RE alternation groups from random characters.
        Each alternation group consists of 10 items. For example,
        String myGroup1 = "(aaaa|bbbb|cccc|dddd|eeee|ffff|gggg|hhhh|iiii|jjjj)";
        String myGroup2 = ...
        For the first test,
        myGroup1 is compiled, with the start time and end time recorded.
        For the second test, myGroup2 is appended to myGroup1 and
        the combined pattern is recompiled, with the start time and end time recorded.


        EXPECTED VERSUS ACTUAL BEHAVIOR :
        EXPECTED -
        I would expect compilation time to grow linearly as each new alternation group is added.
        ACTUAL -
        The following compilation times were recorded. From 6 groups onward, each added group increases the time roughly tenfold.

        Time to Compile 1 group 3ms.
        Time to Compile 2 group 1ms.
        Time to Compile 3 group 3ms.
        Time to Compile 4 group 7ms.
        Time to Compile 5 group 10ms.
        Time to Compile 6 group 65ms.
        Time to Compile 7 group 721ms.
        Time to Compile 8 group 7090ms.
        Time to Compile 9 group 68536ms.
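
To make the growth rate in the timings above explicit, the following minimal sketch divides each recorded time by the previous one. The class name `GrowthRatio` and the method `ratios` are made up for illustration; only the nine timings come from the report. The last three ratios come out close to 10, which matches the 10 alternatives per group. That would be consistent with exponential rather than linear growth, though this is an inference from the numbers, not a diagnosis of the cause.

```java
public class GrowthRatio {
    // Returns timesMs[i + 1] / timesMs[i] for each consecutive pair.
    public static double[] ratios(long[] timesMs) {
        double[] result = new double[timesMs.length - 1];
        for (int i = 0; i < result.length; i++) {
            result[i] = (double) timesMs[i + 1] / timesMs[i];
        }
        return result;
    }

    public static void main(String[] args) {
        // Compile times in milliseconds for 1..9 groups, copied from the report.
        long[] timesMs = {3, 1, 3, 7, 10, 65, 721, 7090, 68536};
        double[] r = ratios(timesMs);
        for (int i = 0; i < r.length; i++) {
            System.out.printf("%d -> %d groups: x%.1f%n", i + 1, i + 2, r[i]);
        }
    }
}
```

The first few ratios are noisy because the measured times are only a few milliseconds, close to the resolution of System.currentTimeMillis().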


        REPRODUCIBILITY :
        This bug is always reproducible.

        ---------- BEGIN SOURCE ----------
        // Requires: import java.util.regex.Pattern;
        // generateWords and buildRePortion are helper methods that are not shown in this report.
        int cnt = 1000;
        String patStr = "";
        for (int i = 1; i < cnt; i++)
        {
            // Append one more alternation group built from 10 generated words.
            String[] words = generateWords(10);
            patStr += buildRePortion(words);

            // Time only the compilation of the pattern built so far.
            long startCompile = System.currentTimeMillis();
            Pattern pattern = Pattern.compile(patStr, Pattern.CASE_INSENSITIVE);
            long finishCompile = System.currentTimeMillis();

            System.out.println(patStr);
            System.out.println("Time to Compile " + i + " group " + (finishCompile - startCompile) + "ms.\n");
            patStr += " ";
        }
        ---------- END SOURCE ----------
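
The submitted source calls two helpers that the report does not include. The following is a self-contained sketch of the same experiment, with hypothetical stand-ins for `generateWords` and `buildRePortion` written to match the description in the steps to reproduce (10 random four-letter words per group, joined as `(w1|w2|...)`). The class name, the extra `Random` parameter, the fixed seed, and the default limit of 6 groups are all assumptions made to keep the run short and repeatable. It uses `String.join`, so it needs Java 8 or later.

```java
import java.util.Random;
import java.util.regex.Pattern;

public class PatternCompileTiming {
    // Hypothetical stand-in for the reporter's generateWords:
    // returns `count` random four-letter lowercase words.
    public static String[] generateWords(int count, Random rnd) {
        String[] words = new String[count];
        for (int w = 0; w < count; w++) {
            StringBuilder sb = new StringBuilder();
            for (int c = 0; c < 4; c++) {
                sb.append((char) ('a' + rnd.nextInt(26)));
            }
            words[w] = sb.toString();
        }
        return words;
    }

    // Hypothetical stand-in for the reporter's buildRePortion:
    // joins the words into one alternation group, "(w1|w2|...)".
    public static String buildRePortion(String[] words) {
        return "(" + String.join("|", words) + ")";
    }

    public static void main(String[] args) {
        // Assumed default of 6 groups; pass a larger number to go further.
        int maxGroups = args.length > 0 ? Integer.parseInt(args[0]) : 6;
        Random rnd = new Random(42); // fixed seed so runs are repeatable
        StringBuilder patStr = new StringBuilder();
        for (int i = 1; i <= maxGroups; i++) {
            patStr.append(buildRePortion(generateWords(10, rnd)));

            long start = System.nanoTime();
            Pattern.compile(patStr.toString(), Pattern.CASE_INSENSITIVE);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;

            System.out.println("Time to Compile " + i + " group " + elapsedMs + "ms.");
            patStr.append(' '); // the original source also separates groups with a space
        }
    }
}
```

Timings will differ by JDK version and machine, so the figures recorded in this report should not be expected to reproduce exactly.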

        CUSTOMER SUBMITTED WORKAROUND :
        No method in the java.util.regex.Pattern library allows me to reduce the compilation time.

        I tried another package, org.apache.oro.text.

        There, compilation is almost instant, compared with the hours it takes with java.util.regex.Pattern.

              sherman Xueming Shen
              jssunw Jitender S (Inactive)