JDK-6452709

spec: Pattern should explain "word boundary"


    • Type: Enhancement
    • Resolution: Unresolved
    • Priority: P4
    • None
    • 6
    • core-libs

      A DESCRIPTION OF THE REQUEST :
      \b and \B rely on a quite specific definition of "word boundary", yet the javadoc for Pattern never states that definition.

      JUSTIFICATION :
      A user might expect "text\.\b" to match "text.\nhello" because, to them, "word boundary" just means "whitespace or newline", which is a not entirely unreasonable guess. It does not match: both "." and the newline are non-word characters, so there is no boundary between them.
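The case described above can be checked with a minimal Java sketch. The class name WordBoundaryDemo and the helper found are hypothetical names introduced only for illustration, and the inputs are deliberately ASCII-only, since the treatment of non-ASCII characters is exactly what the documentation leaves unclear.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the JDK.
class WordBoundaryDemo {

    // Returns true if the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // '.' is followed by '\n': two non-word characters, so no boundary.
        System.out.println(found("text\\.\\b", "text.\nhello")); // prints false

        // '.' is followed by 'h': non-word then word, so \b matches.
        System.out.println(found("text\\.\\b", "text.hello"));   // prints true

        // 't' is followed by '.': word then non-word, so \b matches.
        System.out.println(found("text\\b\\.", "text.\nhello")); // prints true
    }
}
```

The first call is the surprising one for a reader who takes "word boundary" to mean "whitespace or newline".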

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      The documentation should state explicitly that a word boundary is (I believe) a transition between a character in [A-Za-z0-9_] (or a non-spacing mark with a base character) and a character that is not one of these.

      Obviously, the wording would need some polish, and the non-spacing mark case may deserve its own explanation.
      ACTUAL -
      The documentation describes \b only as "a word boundary" and \B only as "a non-word boundary", without defining either term.

      CUSTOMER SUBMITTED WORKAROUND :
      "man perlre" explains well enough, and throws in a fact about the meaning of \b in a character class:

             A word boundary ("\b") is a spot between two characters that has a "\w"
             on one side of it and a "\W" on the other side of it (in either order),
             counting the imaginary characters off the beginning and end of the
             string as matching a "\W". (Within character classes "\b" represents
             backspace rather than a word boundary, just as it normally does in any
             double-quoted string.)
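The perlre definition quoted above can be sketched in Java and compared against what \b reports. This is an illustration under stated assumptions, not a description of how Pattern is implemented: the class PerlStyleBoundary and its methods are hypothetical, and isWordChar assumes an ASCII-only \w ([A-Za-z0-9_]). The comparison is only meant for ASCII input. The character-class remark in the perlre text is not exercised here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; not part of the JDK.
class PerlStyleBoundary {

    // Assumption: \w is [A-Za-z0-9_]. Positions off either end of the
    // string count as \W, as in the perlre wording.
    static boolean isWordChar(String s, int i) {
        if (i < 0 || i >= s.length()) {
            return false;
        }
        char c = s.charAt(i);
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9') || c == '_';
    }

    // Boundary positions per the quoted definition: \w on exactly one side.
    static List<Integer> byDefinition(String s) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i <= s.length(); i++) {
            if (isWordChar(s, i - 1) != isWordChar(s, i)) {
                positions.add(i);
            }
        }
        return positions;
    }

    // Boundary positions as reported by java.util.regex for \b.
    static List<Integer> byRegex(String s) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = Pattern.compile("\\b").matcher(s);
        while (m.find()) {
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        String input = "text.\nhello";
        System.out.println(byDefinition(input)); // prints [0, 4, 6, 11]
        System.out.println(byRegex(input));      // prints [0, 4, 6, 11]
    }
}
```

For this input the two lists agree. Note that there is no boundary at index 5, between "." and the newline.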

            sherman Xueming Shen
            gmanwanisunw Girish Manwani (Inactive)