Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-8035076

Pattern$BnMS never used due to bug in Pattern$BnM.optimize

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Fixed
    • Icon: P4 P4
    • 9
    • 7u51
    • core-libs
    • b04
    • x86
    • windows_7
    • Verified

      FULL PRODUCT VERSION :
      java version "1.7.0_51"
      Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
      Java HotSpot(TM) Client VM (build 24.51-b03, mixed mode, sharing)

      ADDITIONAL OS VERSION INFORMATION :
      Microsoft Windows [Version 6.1.7601] (Windows 7)

      A DESCRIPTION OF THE PROBLEM :
      Pattern$BnM.optimize will optimize a Pattern$Slice or Pattern$SliceS (match a fixed string) to use Boyer-Moore algorithm. However, due to a bug in the type check, only Pattern$Slice (BMP version) is optimized; Pattern$SliceS (supplementary version) is never optimized.

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      1) Write a pattern that matches a fixed string, which contains supplementary characters. There must be >= 4 code points.
      2) Access the internal structure of the Pattern via reflection. Access the Pattern.root field.

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      The first Node (which is in Pattern.root) should be an instance of Pattern$BnMS class.
      ACTUAL -
      An instance of Pattern$StartS class is found instead. Which means the search is not optimized.

      REPRODUCIBILITY :
      This bug can be reproduced always.

      ---------- BEGIN SOURCE ----------
      /* Author: Hong Dai Thanh/nhahtdh */

      import java.lang.reflect.Field;
      import java.util.regex.Pattern;

      class BugRegexBnMS {

          public static void dissect(Pattern pattern) throws Exception {
                
              Field rootField = Pattern.class.getDeclaredField("root");
              rootField.setAccessible(true);
              
              Object rootNode = rootField.get(pattern);
              Object currNode = rootNode;
              
              Class nodeClass = rootField.getType();
              Field nextField = nodeClass.getDeclaredField("next");
              nextField.setAccessible(true);
              
              while (currNode != null) {
                  System.out.println(currNode.getClass().getName());
                  
                  currNode = nextField.get(currNode);
              }
              
              System.out.println();
          }
          
          public static void main(String args[]) throws Exception {
              // Match a fixed string with only BMP code points.
              // The first node is an instance of Pattern$BnM as expected
              dissect(Pattern.compile("\\QNikropoeka*Syn34@ma{re}lgin'Skper?\\E"));
              
              // Match a fixed string with supplementary code points
              // EXPECTED: The first node should be an instance of Pattern$BnMS class
              // ACTUAL: The first node is an instance of Pattern$StartS class, and the next one is an instance of Pattern$SliceS class
              dissect(Pattern.compile("abf\uD83C\uDC04\uD83C\uDC05\uD83C\uDC06\uD83C\uDC04\uD83C\uDC04abfgh\uD83C\uDC06\uD83C\uDC04\uD83C\uDC04\uD83C\uDC04\uD83C\uDC04"));
          }
      }
      ---------- END SOURCE ----------

      CUSTOMER SUBMITTED WORKAROUND :
      The error comes from the check in Pattern$BnM class:

      static Node optimize(Node node) {
          if (!(node instanceof Slice)) {
              return node;
          }

      Pattern$Slice and Pattern$SliceS are child class of Pattern$SliceNode. This check excludes Pattern$SliceS from being optimized.

      The simplest fix is to check whether node is instanceof SliceS also.

            sherman Xueming Shen
            webbuggrp Webbug Group
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

              Created:
              Updated:
              Resolved: