JDK-8309515

Stale cached data from Matcher.namedGroups() after Matcher.usePattern()


    • b18
    • 20
    • b26
    • generic
    • generic
    • Verified

      ADDITIONAL SYSTEM INFORMATION :
      OS-independent; the bug is visible in the java.util.regex.Matcher source code and was introduced in commit openjdk/jdk@ce85cac for issue JDK-8065554.

      A DESCRIPTION OF THE PROBLEM :
      In addressing issue JDK-8065554 (MatchResult should provide values of named-capturing groups), commit openjdk/jdk@ce85cac added a namedGroups field in Matcher to cache the map from parentPattern.namedGroups().

      The map is cached lazily: it is fetched only when the field is null and namedGroups() is called, either directly or indirectly through start(String), end(String), or group(String). The same cached map then continues to be returned even after Matcher.usePattern() installs a new Pattern whose named groups differ, whether the new pattern uses different names, has no named groups at all, or maps the same names to different group indices. Symptoms can therefore include wrong results when retrieving by group name, spurious IllegalArgumentExceptions for groups that the new pattern does define, missing exceptions for groups that the new pattern does not define, and exceptions about an invalid group index from methods that take a group name.

      This could be fixed either by removing the cached copy and having Matcher.namedGroups() call parentPattern.namedGroups() unconditionally, or by having Matcher.usePattern() reset the field to null, so that the correct map is lazily cached the next time it is needed.
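      A minimal sketch of the second proposed fix. CachingMatcherSketch is a hypothetical, heavily simplified stand-in for Matcher, not the actual JDK source: it keeps only a parent pattern and a lazily cached name-to-index map, and it assumes Java 20 or later for the public Pattern.namedGroups() method. Deleting the `namedGroups = null;` line in usePattern reproduces the stale-cache behavior described in this report.

```java
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical, simplified model of Matcher's named-group caching (NOT the JDK source).
// Assumes Java 20+, where Pattern.namedGroups() is public.
final class CachingMatcherSketch {
    private Pattern parentPattern;
    // Lazily cached name -> group-index map; null means "not computed yet".
    private Map<String, Integer> namedGroups;

    CachingMatcherSketch(Pattern pattern) {
        this.parentPattern = pattern;
    }

    Map<String, Integer> namedGroups() {
        if (namedGroups == null) {
            namedGroups = parentPattern.namedGroups();
        }
        return namedGroups;
    }

    CachingMatcherSketch usePattern(Pattern newPattern) {
        if (newPattern == null) {
            throw new IllegalArgumentException("Pattern cannot be null");
        }
        parentPattern = newPattern;
        // The fix: drop the cached map so the next namedGroups() call re-reads it
        // from the new pattern. Without this line, p1's map keeps being returned.
        namedGroups = null;
        return this;
    }

    public static void main(String[] args) {
        Pattern p1 = Pattern.compile("(?<a>...)(?<b>...)");
        Pattern p2 = Pattern.compile("(?<b>...)(?<a>...)");
        CachingMatcherSketch m = new CachingMatcherSketch(p1);
        System.out.println(m.namedGroups().get("a")); // 1
        m.usePattern(p2);
        System.out.println(m.namedGroups().get("a")); // 2
    }
}
```

      The first proposed fix would instead drop the field entirely and have namedGroups() return parentPattern.namedGroups() on every call, trading a small per-call cost for having no cache to invalidate.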


      REGRESSION : Last worked in version 17

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      In jshell on JDK 20:

      var p1 = Pattern.compile("(?<a>...)(?<b>...)");
      var p2 = Pattern.compile("(?<b>...)(?<a>...)");
      var m = p1.matcher("foobar");
      m.matches()
      // ==> true
      m.group("a")
      // ==> "foo"
      m.usePattern(p2)
      m.matches()
      // ==> true
      m.group("a")
      // ==> "foo" WRONG RESULT: should be "bar" for p2

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      Using p1, (?<a>...)(?<b>...), on the string "foobar" should return "foo" for group a.
      Using p2, (?<b>...)(?<a>...), on the string "foobar" should return "bar" for group a.
      ACTUAL -
      After m.usePattern(p2) and m.matches() on the string "foobar", "foo" is incorrectly returned for group a, because the group-name-to-index mapping for pattern p1 is still cached.

      CUSTOMER SUBMITTED WORKAROUND :
      Calls to m.namedGroups() on a Matcher m can be replaced with calls to m.pattern().namedGroups(). Calls to m.start(String), m.end(String), or m.group(String) can be replaced with code that looks up the name in m.pattern().namedGroups(), throws an IllegalArgumentException if the name is not found, and then calls start(int), end(int), or group(int) with the resulting index.
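      A minimal sketch of this workaround, assuming Java 20 or later for the public Pattern.namedGroups() method. The class NamedGroupWorkaround and its static group/start/end helpers are hypothetical names chosen for illustration, and the exception message is an assumption. Because m.pattern() returns the pattern most recently installed by usePattern, the lookup always uses the current pattern's mapping.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helpers that resolve a group name against the matcher's CURRENT
// pattern, bypassing Matcher's cached map. Assumes Java 20+ (Pattern.namedGroups()).
final class NamedGroupWorkaround {
    static int groupIndex(Matcher m, String name) {
        Integer index = m.pattern().namedGroups().get(name);
        if (index == null) {
            throw new IllegalArgumentException("No group with name <" + name + ">");
        }
        return index;
    }

    static String group(Matcher m, String name) { return m.group(groupIndex(m, name)); }

    static int start(Matcher m, String name) { return m.start(groupIndex(m, name)); }

    static int end(Matcher m, String name) { return m.end(groupIndex(m, name)); }

    public static void main(String[] args) {
        Pattern p1 = Pattern.compile("(?<a>...)(?<b>...)");
        Pattern p2 = Pattern.compile("(?<b>...)(?<a>...)");
        Matcher m = p1.matcher("foobar");
        m.matches();
        System.out.println(group(m, "a")); // foo
        m.usePattern(p2);
        m.matches();
        System.out.println(group(m, "a")); // bar
        System.out.println(start(m, "a")); // 3
    }
}
```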

      In code that must also compile on Java versions earlier than 20, where namedGroups() is not part of the public API, a MethodHandle for it can be looked up at runtime, or version-specific code can be supplied in a multi-release JAR.
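      A sketch of the MethodHandle approach, assuming that on runtimes before Java 20 Matcher.group(String) is unaffected (per this report, the cache arrived with JDK-8065554 in JDK 20, so older runtimes can use group(String) directly). NamedGroupCompat is a hypothetical name; only group(String) is shown, and start/end would follow the same shape. The handle is looked up once and stored in a static final field, so the reflection cost is paid only at class initialization.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical compatibility helper: compiles on pre-20 JDKs because it never
// references Pattern.namedGroups() statically.
final class NamedGroupCompat {
    // Handle for the public Pattern.namedGroups() (Java 20+), or null on older runtimes.
    private static final MethodHandle NAMED_GROUPS = findNamedGroups();

    private static MethodHandle findNamedGroups() {
        try {
            return MethodHandles.publicLookup().findVirtual(
                    Pattern.class, "namedGroups", MethodType.methodType(Map.class));
        } catch (NoSuchMethodException | IllegalAccessException e) {
            return null; // before Java 20 the method is absent or not public
        }
    }

    @SuppressWarnings("unchecked")
    static String group(Matcher m, String name) {
        if (NAMED_GROUPS == null) {
            // Assumed pre-20 runtime: no stale cache there, so use the API directly.
            return m.group(name);
        }
        Map<String, Integer> groups;
        try {
            groups = (Map<String, Integer>) NAMED_GROUPS.invoke(m.pattern());
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
        Integer index = groups.get(name);
        if (index == null) {
            throw new IllegalArgumentException("No group with name <" + name + ">");
        }
        return m.group(index);
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("(?<a>...)(?<b>...)").matcher("foobar");
        m.matches();
        m.usePattern(Pattern.compile("(?<b>...)(?<a>...)"));
        m.matches();
        System.out.println(group(m, "a")); // bar
    }
}
```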

            rgiulietti Raffaello Giulietti
            webbuggrp Webbug Group