JDK-6970904

Character sequence \w in a regex pattern is narrower than defined in the specification

Details

    • 1.4
    • generic
    • generic
    • Verified

      Description

        The enclosed test case RegexTest_234 contains a valid XML document, RegexTest_234.xml, for the valid schema RegexTest_234.xsd.

        The specification (http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/datatypes.html#regexs) states:

        Character sequence    Equivalent character class
        \w                    [#x0000-#x10FFFF]-[\p{P}\p{Z}\p{C}] (all characters except the set of "punctuation", "separator" and "other" characters)

        The XML document contains the character sequence foo#xcab1 bar#xcab1, and the regex pattern is (\w+)\s+(\w+). Validation of the XML document against the schema fails with the exception:
        SAX error: file:~/devel/analysis/RegexTest_234.xml(1,129): cvc-pattern-valid: Value 'foo¿ bar¿' is not facet-valid with respect to pattern '(\w+)\s+(\w+)' for type '#AnonType_valuedoc'.

        The document is nevertheless valid: #xcab1 (U+CAB1, a Hangul syllable in general category Lo) is not a "punctuation", "separator" or "other" character, so \w should match it.
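
        The expected behaviour can be illustrated with a minimal sketch. It does not use the XML validator's own regex engine. Instead it assumes that the specification's definition of \w can be approximated in java.util.regex as the negated class [^\p{P}\p{Z}\p{C}] and \s as [ \t\n\r]. The class name XsdWordClassSketch and the helper matchesXsdPattern are hypothetical names introduced only for this illustration.

```java
import java.util.regex.Pattern;

public class XsdWordClassSketch {
    // Assumption: the spec's \w ([#x0000-#x10FFFF]-[\p{P}\p{Z}\p{C}]) written
    // as a negated class in java.util.regex syntax.
    static final String XSD_W = "[^\\p{P}\\p{Z}\\p{C}]";
    // Assumption: the spec's \s (space, tab, newline, carriage return).
    static final String XSD_S = "[ \\t\\n\\r]";

    // Hypothetical helper: checks a value against (\w+)\s+(\w+) using the
    // spec-style character classes above.
    static boolean matchesXsdPattern(String value) {
        String regex = "(" + XSD_W + "+)" + XSD_S + "+(" + XSD_W + "+)";
        return Pattern.matches(regex, value);
    }

    public static void main(String[] args) {
        String value = "foo\uCAB1 bar\uCAB1";

        // U+CAB1 is a letter (general category Lo), so it is outside P, Z and C.
        System.out.println(Character.getType('\uCAB1') == Character.OTHER_LETTER);

        // With the spec-style definition of \w the whole value matches.
        System.out.println(matchesXsdPattern(value));

        // java.util.regex's default \w is ASCII-only ([a-zA-Z_0-9]), which shows
        // how a narrower \w rejects the same value.
        System.out.println(Pattern.matches("(\\w+)\\s+(\\w+)", value));
    }
}
```

        Under these assumptions the value from the test case matches the spec-style pattern, while an ASCII-only \w rejects it, which mirrors the narrowing described in this report.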

              People

                joehw Joe Wang
                lkuskov Leonid Kuskov