JDK-6970904

Character sequence \w in a regex pattern is narrower than defined in the specification


    • 1.4
    • generic
    • generic
    • Verified

        The enclosed test case RegexTest_234 contains an XML document, RegexTest_234.xml, that is valid against the schema RegexTest_234.xsd.

        The specification (http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/datatypes.html#regexs) states:
        Character sequence    Equivalent character class
        \w                    [#x0000-#x10FFFF]-[\p{P}\p{Z}\p{C}]
                              (all characters except the set of "punctuation", "separator" and "other" characters)
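
        The quoted definition can be approximated outside the schema validator with java.util.regex, which supports the same one-letter Unicode general categories. The following is a minimal sketch, not the validator's implementation: the class name XsdWordClass and the method isXsdWord are hypothetical, and the negated class is assumed to be an adequate stand-in for the subtraction syntax used in the specification.

```java
import java.util.regex.Pattern;

public class XsdWordClass {
    // Assumed equivalent of the spec's \w: any character that is NOT in
    // the general categories P (punctuation), Z (separator) or C (other).
    static final Pattern XSD_WORD_RUN = Pattern.compile("[^\\p{P}\\p{Z}\\p{C}]+");

    // True if the whole string consists of one or more spec-defined word characters.
    public static boolean isXsdWord(String s) {
        return XSD_WORD_RUN.matcher(s).matches();
    }

    public static void main(String[] args) {
        String hangul = "\uCAB1"; // the character U+CAB1, a Hangul syllable

        // U+CAB1 has general category Lo (other letter), so it is a word character.
        System.out.println(Character.getType(hangul.charAt(0)) == Character.OTHER_LETTER); // true
        System.out.println(isXsdWord("foo" + hangul)); // true

        // A space is a separator (Zs), so it is excluded.
        System.out.println(isXsdWord("foo bar")); // false

        // For contrast: java.util.regex's own default \w is ASCII-only.
        System.out.println(hangul.matches("\\w")); // false
    }
}
```

        Note that under this definition the underscore is excluded as well, since it belongs to the connector punctuation category (Pc).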

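        The XML document contains the character sequence foo#xcab1 bar#xcab1 (each word ends with the character U+CAB1), and the regex pattern is (\w+)\s+(\w+). U+CAB1 is a Hangul syllable (general category Lo), so it is not in \p{P}, \p{Z} or \p{C} and should match \w. Validation of the XML document against the schema nevertheless fails with the exception: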
        SAX error: file:~/devel/analysis/RegexTest_234.xml(1,129): cvc-pattern-valid: Value 'foo¿ bar¿' is not facet-valid with respect to pattern '(\w+)\s+(\w+)' for type '#AnonType_valuedoc'.

        This happens even though the document is valid.
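
        A reproduction along these lines can be sketched with the standard javax.xml.validation API. The files RegexTest_234.xml and RegexTest_234.xsd are not reproduced in this report, so the inline schema below is an assumption: an element doc with an attribute value restricted by the pattern, guessed from the type name in the error message. The class name PatternFacetRepro and the method isValid are likewise hypothetical. The result for the U+CAB1 input depends on the JDK in use, so no expected output is given for it.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class PatternFacetRepro {
    // Hypothetical stand-in for RegexTest_234.xsd (the real schema is not shown).
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='doc'><xs:complexType>"
        + "<xs:attribute name='value'><xs:simpleType>"
        + "<xs:restriction base='xs:string'>"
        + "<xs:pattern value='(\\w+)\\s+(\\w+)'/>"
        + "</xs:restriction></xs:simpleType></xs:attribute>"
        + "</xs:complexType></xs:element></xs:schema>";

    // Validates <doc value='...'/> against the schema above.
    // Returns false if the validator reports an error (for example cvc-pattern-valid).
    public static boolean isValid(String attributeValue) {
        Schema schema;
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
        Validator validator = schema.newValidator();
        String xml = "<doc value='" + attributeValue + "'/>";
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            System.out.println("SAX error: " + e.getMessage());
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // ASCII control case: valid under any reading of \w.
        System.out.println(isValid("foo bar"));
        // The case from this report, written with character references.
        System.out.println(isValid("foo&#xcab1; bar&#xcab1;"));
    }
}
```

        If the second call prints false while the first prints true, the validator's \w is narrower than the class given in the specification.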

              joehw Joe Wang
              lkuskov Leonid Kuskov