JDK-8179668

Valid regex patterns match the latter half of complete surrogate pairs


      FULL PRODUCT VERSION :
      java version "1.8.0_92"
      Java(TM) SE Runtime Environment (build 1.8.0_92-b14)
      Java HotSpot(TM) 64-Bit Server VM (build 25.92-b14, mixed mode)

      ADDITIONAL OS VERSION INFORMATION :
      Darwin 32770 15.4.0 Darwin Kernel Version 15.4.0: Fri Feb 26 22:08:05 PST 2016; root:xnu-3248.40.184~3/RELEASE_X86_64 x86_64

      A DESCRIPTION OF THE PROBLEM :
      Regex patterns that contain no isolated surrogate code points can still match the second half (the low surrogate) of a complete surrogate pair. Example:

      pattern: "[^\\x{10000}]"
      target: "\ud800\udc00"

      This pattern matches the low surrogate code unit of the target pair, when it should only consider the surrogate pair as a whole (the single code point U+10000).

      This is closely related to https://bugs.openjdk.java.net/browse/JDK-8149446. This report expands on that one by using a regex pattern that contains no isolated surrogate code points, which I would argue makes it higher priority.
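
To make the "pair as a whole" expectation concrete, here is a minimal sketch that looks at the target string by code point rather than by UTF-16 code unit. The class `SurrogatePairView` and the helper `containsOtherCodePoint` are hypothetical names introduced for illustration only; they are not part of `java.util.regex`. The helper answers the question the negated class is expected to answer: does the text contain any whole code point other than the excluded one?

```java
public class SurrogatePairView {
    // Hypothetical helper (not a JDK API): true if any whole code point
    // in text differs from the excluded code point.
    static boolean containsOtherCodePoint(String text, int excluded) {
        return text.codePoints().anyMatch(cp -> cp != excluded);
    }

    public static void main(String[] args) {
        String text = "\ud800\udc00"; // U+10000 encoded as a surrogate pair

        System.out.println(text.length());                          // 2 UTF-16 code units
        System.out.println(text.codePointCount(0, text.length()));  // 1 code point
        System.out.println(Integer.toHexString(text.codePointAt(0))); // 10000
        System.out.println(Character.isLowSurrogate(text.charAt(1))); // true

        // Viewed by code point, nothing other than U+10000 is present.
        System.out.println(containsOtherCodePoint(text, 0x10000));  // false
    }
}
```

Seen this way, the string holds exactly one code point, so a class that excludes U+10000 has nothing left to match.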

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      Compile and run the source code below and observe the unexpected result on the third line of output.

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      Stdout:
      true
      true
      false
      ACTUAL -
      Stdout:
      true
      true
      true

      REPRODUCIBILITY :
      This bug can be reproduced always.

      ---------- BEGIN SOURCE ----------
      import java.util.regex.Pattern;
      import java.util.regex.Matcher;

      public class TestCase {
          public static void main(String[] args) {
              String text = "\ud800\udc00"; // U+10000 as a surrogate pair

              // Expected behaviour
              System.out.println(Pattern.compile("\\x{10000}").matcher(text).find()); // true
              System.out.println(Pattern.compile("[\\x{10000}]").matcher(text).find()); // true

              // Unexpected behaviour
              System.out.println(Pattern.compile("[^\\x{10000}]").matcher(text).find()); // prints true, expected false
          }
      }
      ---------- END SOURCE ----------
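
When investigating, it may help to see where a match lands rather than only whether `find()` succeeds. The following is a diagnostic sketch, not part of the original test case. The class `MatchSpan` and the method `describeFirstMatch` are hypothetical names. It reports the first match as UTF-16 indices and flags a match that begins between the two halves of a surrogate pair. The result for the negated class depends on the JDK in use, so no output is asserted for it here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchSpan {
    // Hypothetical helper (not a JDK API): describes the first match as a
    // half-open range of UTF-16 indices and notes when the match starts
    // on a low surrogate that directly follows a high surrogate.
    static String describeFirstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        if (!m.find()) {
            return "no match";
        }
        int start = m.start();
        boolean splitsPair = start > 0
                && start < text.length()
                && Character.isHighSurrogate(text.charAt(start - 1))
                && Character.isLowSurrogate(text.charAt(start));
        return "match [" + start + ", " + m.end() + ")"
                + (splitsPair ? " starts inside a surrogate pair" : "");
    }

    public static void main(String[] args) {
        String text = "\ud800\udc00"; // U+10000 as a surrogate pair

        // The literal code point consumes both code units of the pair.
        System.out.println(describeFirstMatch("\\x{10000}", text));

        // The negated class from this report; output varies by JDK version.
        System.out.println(describeFirstMatch("[^\\x{10000}]", text));
    }
}
```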

            sherman Xueming Shen
            webbuggrp Webbug Group