Details
Sub-task, Resolution: Delivered, P4, 15, Verified
Description
The change JDK-8235812 in Java 15 introduced incorrect behavior for matching of the `\R` Unicode linebreak sequence when using the `java.util.regex.Pattern` API. The `\R` sequence should match CR (U+000D) or LF (U+000A) individually, but it should not match an individual CR if it occurs in a CRLF sequence. An example of the erroneous behavior is that the pattern `\R{2}` matches a CRLF sequence, but it should not. A possible workaround is to match linebreaks using individual characters instead of `\R`, using negative lookahead to prevent matching of an individual CR within a CRLF sequence. To do this, replace the `\R` sequence with the following:
```
(?:(\u000D\u000A)|((?!\u000D\u000A)[\u000A\u000B\u000C\u000D\u0085\u2028\u2029]))
```
A simpler sequence can be used if matching all of the Unicode-specified linebreak characters is not required, or if special treatment for the CRLF sequence is not required.
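The workaround can be tried out with a short Java sketch. The class name `LinebreakWorkaround`, the helper `matchesExactly`, and the constant names are hypothetical and chosen only for illustration. The character class below includes LF (U+000A), as the text describes. `SIMPLE_LINEBREAK` is one possible simpler sequence, assuming only CR, LF, and CRLF need to be recognized.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; names are illustrative and not part of the JDK.
final class LinebreakWorkaround {

    // CRLF as a unit, or one linebreak character that does not start a CRLF pair.
    // Backslashes are doubled so that the regex engine, not the Java compiler,
    // interprets the Unicode escapes.
    static final String LINEBREAK =
        "(?:(\\u000D\\u000A)|((?!\\u000D\\u000A)[\\u000A\\u000B\\u000C\\u000D\\u0085\\u2028\\u2029]))";

    // Assumed simpler variant: recognizes only CR, LF, and CRLF.
    static final String SIMPLE_LINEBREAK = "(?:\\r\\n|(?!\\r\\n)[\\r\\n])";

    // True if the whole input consists of exactly `count` linebreaks.
    static boolean matchesExactly(String linebreakRegex, int count, String input) {
        Pattern p = Pattern.compile("(?:" + linebreakRegex + "){" + count + "}");
        return p.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Built-in linebreak matcher: the result depends on the JDK version in use.
        System.out.println("\r\n".matches("\\R{2}"));

        System.out.println(matchesExactly(LINEBREAK, 2, "\r\n"));        // false: CRLF is one linebreak
        System.out.println(matchesExactly(LINEBREAK, 1, "\r\n"));        // true
        System.out.println(matchesExactly(LINEBREAK, 2, "\n\n"));        // true
        System.out.println(matchesExactly(LINEBREAK, 2, "\r\n\u2028"));  // true: CRLF, then LINE SEPARATOR
        System.out.println(matchesExactly(SIMPLE_LINEBREAK, 2, "\r\n")); // false
    }
}
```

The negative lookahead only rejects a CR that is immediately followed by LF, so a lone CR, or a CR followed by any other character, still counts as a linebreak on its own.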