Enhancement
Resolution: Fixed
P4
11, 17, 21, 22
b12
Spotted here:
https://twitter.com/deathy/status/1679070832801316864
See for example here:
https://github.com/openjdk/jdk/blob/aa7367f1ecc5da15591963e56e1435aa7b830f79/src/java.base/share/classes/java/util/regex/Matcher.java#L250
```
// Allocate state storage
int parentGroupCount = Math.max(parent.capturingGroupCount, 10);
groups = new int[parentGroupCount * 2];
```
There seems to be little point in always clamping the groups array to hold at least 10 groups, as we could allocate just `parent.capturingGroupCount * 2` entries.
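To put rough numbers on it, here is a minimal sketch of the sizing arithmetic only. The class and method names are made up for illustration and are not part of the JDK. It assumes, as in the snippet above, two ints (start and end) per group, and that `capturingGroupCount` also counts the implicit group 0.

```java
final class GroupsSizing {
    // Current behavior: never allocate for fewer than 10 groups.
    static int clampedLength(int capturingGroupCount) {
        return Math.max(capturingGroupCount, 10) * 2;
    }

    // Proposed behavior: allocate exactly what the pattern declares.
    static int exactLength(int capturingGroupCount) {
        return capturingGroupCount * 2;
    }

    public static void main(String[] args) {
        // 1 = pattern without explicit groups (only the implicit group 0).
        for (int count : new int[] {1, 3, 10, 12}) {
            System.out.printf("count=%d clamped=%d exact=%d%n",
                    count, clampedLength(count), exactLength(count));
        }
    }
}
```

Under these assumptions a pattern without explicit groups gets a 20-element array where 2 elements would do, and the two sizes only agree once the pattern has 10 or more groups.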
If we remove that clamp, then some tests would fail:
```
test RegExTest.backRefTest(): failure
java.lang.ArrayIndexOutOfBoundsException: Index 6 out of bounds for length 6
at java.base/java.util.regex.Pattern$BackRef.match(Pattern.java:5190)
at java.base/java.util.regex.Pattern$GroupTail.match(Pattern.java:5000)
at java.base/java.util.regex.Pattern$Slice.match(Pattern.java:4268)
at java.base/java.util.regex.Pattern$GroupHead.match(Pattern.java:4969)
at java.base/java.util.regex.Pattern$GroupTail.match(Pattern.java:5000)
at java.base/java.util.regex.Pattern$Slice.match(Pattern.java:4268)
at java.base/java.util.regex.Pattern$GroupHead.match(Pattern.java:4969)
at java.base/java.util.regex.Pattern$Start.match(Pattern.java:3787)
at java.base/java.util.regex.Matcher.search(Matcher.java:1736)
```
That is because `Pattern.compile("abc\9")` should still compile even though group 9 does not exist, as per the Javadoc:
"In this class, \1 through \9 are always interpreted as back references, "
...but the back reference would then index into `Matcher.groups` past the end of the smaller array and fail with an ArrayIndexOutOfBoundsException.
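The out-of-bounds access can be reproduced in isolation with a small stand-in. This is a hypothetical sketch, not the JDK code: it assumes that the back reference to group n reads the start offset at index `2 * n`, which is consistent with "Index 6 out of bounds for length 6" for `\3` in a pattern whose `capturingGroupCount` is 3.

```java
import java.util.Arrays;

final class BackRefIndexSketch {
    // Returns true if looking up group refNum would run off the end of the array.
    // clamp=true mimics the current Math.max(count, 10) allocation.
    static boolean lookupFails(int capturingGroupCount, int refNum, boolean clamp) {
        int groupCount = clamp ? Math.max(capturingGroupCount, 10) : capturingGroupCount;
        int[] groups = new int[groupCount * 2];
        Arrays.fill(groups, -1); // -1 marks "group did not participate"
        try {
            int start = groups[refNum * 2]; // assumed layout: start at 2n, end at 2n + 1
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(lookupFails(3, 3, true));  // clamped array has 20 slots
        System.out.println(lookupFails(3, 3, false)); // exact array has only 6 slots
    }
}
```

With the clamp, any of `\1` through `\9` lands inside the 20-element array and just reads -1; without it, the same lookup needs an explicit bounds check or a larger allocation.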
It remains to be seen whether the allocation clamp in Matcher can be removed without breaking the rest of the engine.
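For reference, the behavior that any change here has to preserve can be checked through the public API alone. The sketch below uses a made-up helper name, and its inputs are modeled on the kind of cases `backRefTest` exercises rather than copied from the test: a back reference to a missing group must compile and then simply fail to match.

```java
import java.util.regex.Pattern;

final class BackRefDemo {
    // Compiles the regex and reports whether it is found anywhere in the input.
    static boolean finds(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // \9 with no groups at all: must compile, must not match.
        System.out.println(finds("abc\\9", "abc"));
        // \3 with only two explicit groups: the case from the stack trace above.
        System.out.println(finds("(abc)(def)\\3", "abcdefabc"));
        // A valid back reference still matches as usual.
        System.out.println(finds("(a*)bc\\1", "zzzaabcazzz"));
    }
}
```

On a released JDK the first two calls should return false and the third true; with the clamp removed and nothing else changed, the first two would be expected to throw instead.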