-
Enhancement
-
Resolution: Unresolved
-
P4
-
None
-
6
-
x86
-
linux
A DESCRIPTION OF THE REQUEST :
\b and \B have a quite specific notion of "word boundary", yet the javadoc for them is completely vague.
JUSTIFICATION :
A user might expect "text\.\b" to match "text.\nhello" because, to them, "word boundary" just means "whitespace or newline", which is a not entirely unreasonable guess at the meaning. It does not match, since neither the "." nor the newline is a word character.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
The documentation should state explicitly that a word boundary is (I believe) a transition between a character in [A-Za-z0-9_] (or a non-spacing mark attached to a base character) and a character that is not one.
Obviously, you'd want to word it a little better, and maybe explain the non-spacing mark case.
ACTUAL -
The documentation just says that \b and \B match "word boundaries".
CUSTOMER SUBMITTED WORKAROUND :
"man perlre" explains well enough, and throws in a fact about the meaning of \b in a character class:
A word boundary ("\b") is a spot between two characters that has a "\w"
on one side of it and a "\W" on the other side of it (in either order),
counting the imaginary characters off the beginning and end of the
string as matching a "\W". (Within character classes "\b" represents
backspace rather than a word boundary, just as it normally does in any
double-quoted string.)
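The following is a minimal sketch of the behavior described above, not an authoritative statement of the java.util.regex rules. The class name WordBoundaryDemo and its helper methods are made up for illustration, and only plain ASCII input is used, so it says nothing about the non-spacing mark case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name and helpers are not part of any JDK API.
class WordBoundaryDemo {

    // Returns true if the regex matches anywhere in the input.
    static boolean finds(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // Collects the offsets at which \b matches in the input.
    static List<Integer> boundaries(String input) {
        List<Integer> offsets = new ArrayList<>();
        Matcher m = Pattern.compile("\\b").matcher(input);
        while (m.find()) {
            offsets.add(m.start());
        }
        return offsets;
    }

    public static void main(String[] args) {
        String input = "text.\nhello";

        // '.' and '\n' are both non-word characters, so there is no
        // word boundary between them and the pattern does not match.
        System.out.println(finds("text\\.\\b", input));   // false

        // \B is the complement: it matches between two non-word characters.
        System.out.println(finds("text\\.\\B", input));   // true

        // A boundary does exist between 't' (word) and '.' (non-word).
        System.out.println(finds("text\\b", input));      // true

        // Here '.' is followed directly by a word character.
        System.out.println(finds("text\\.\\b", "text.hello")); // true

        // Offsets where \b matches in the sample input.
        System.out.println(boundaries(input));
    }
}
```

In this sketch the only boundaries in "text.\nhello" are at the edges of "text" and "hello"; the spot between "." and the newline is not one, which is the case the report says users are likely to get wrong.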
- duplicates
-
JDK-8043727 Behavior of regex \b (word boundary) is unclear; Description of \B is wrong
-
- Closed
-