JDK-6971190

Xml document validator partly accepts UTF lexical presentation of digit and words

    • 1.4
    • generic
    • generic
    • Verified

        The validator uses the following schema:

        <xsd:element name="doc">
            <xsd:complexType>
                    <xsd:choice>
                        <xsd:element name="elem" type="Regex" minOccurs="1" maxOccurs="unbounded"/>
                    </xsd:choice>
            </xsd:complexType>
        </xsd:element>

        <xsd:complexType name="Regex">
           <xsd:attribute name="att">
               <xsd:simpleType>
                   <xsd:restriction base="xsd:string">
                       <xsd:pattern value="\d"/>
                   </xsd:restriction>
               </xsd:simpleType>
           </xsd:attribute>
        </xsd:complexType>

        and the following XML document:

        <doc xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
            xsi:noNamespaceSchemaLocation='reS17.xsd' >

        <!--
        base='string', pattern='\d', value='#x1369;', type='valid', RULE='37'
        -->

              <elem att='&#x1369;'/>

        </doc>
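A minimal in-memory sketch of this setup, using the standard javax.xml.validation API. The xsd:schema wrapper element, the class name DigitPatternCheck, and the helper isValid are assumptions added for illustration; the report shows only the schema fragments and does not say how the original test harness invoked the validator. Results for non-ASCII digits may differ between JDK releases, so no expected output is given for them.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class DigitPatternCheck {
    // Schema fragments from the report; the <xsd:schema> root is an assumed wrapper.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:element name='doc'><xsd:complexType><xsd:choice>"
      + "<xsd:element name='elem' type='Regex' minOccurs='1' maxOccurs='unbounded'/>"
      + "</xsd:choice></xsd:complexType></xsd:element>"
      + "<xsd:complexType name='Regex'><xsd:attribute name='att'><xsd:simpleType>"
      + "<xsd:restriction base='xsd:string'><xsd:pattern value='\\d'/></xsd:restriction>"
      + "</xsd:simpleType></xsd:attribute></xsd:complexType>"
      + "</xsd:schema>";

    static Schema compileSchema() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
    }

    // Returns true if <elem att='...'/> validates against the schema.
    // attValue is inserted verbatim, so it may be a character reference.
    static boolean isValid(String attValue) {
        String xml = "<doc><elem att='" + attValue + "'/></doc>";
        try {
            // With no ErrorHandler set, validation errors are thrown as SAXException.
            compileSchema().newValidator()
                    .validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A few of the code points named in the report.
        int[] codePoints = {0x0E50, 0x1040, 0x1369, 0xFF10};
        for (int cp : codePoints) {
            String ref = "&#x" + Integer.toHexString(cp) + ";";
            System.out.printf("U+%04X valid=%b%n", cp, isValid(ref));
        }
    }
}
```

Passing the value as a character reference mirrors the att='&#x1369;' form used in the report and avoids depending on the source file encoding.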

        If the value of the att attribute is the lexical representation of a digit, starting with 0, the validator accepts the XML document as valid.
        The following tested Unicode characters are accepted by the validator:
        U+0C66 TELUGU DIGIT ZERO
        U+0CE6 KANNADA DIGIT ZERO
        U+0D66 MALAYALAM DIGIT ZERO
        U+0E50 THAI DIGIT ZERO
        U+0ED0 LAO DIGIT ZERO
        U+0F20 TIBETAN DIGIT ZERO

        If the document uses one of the following digit representations:
        U+1040 MYANMAR DIGIT ZERO
        U+1369 ETHIOPIC DIGIT ONE
        U+17E0 KHMER DIGIT ZERO
        U+1810 MONGOLIAN DIGIT ZERO
        U+FF10 FULLWIDTH DIGIT ZERO
        U+1049 MYANMAR DIGIT NINE
        U+1371 ETHIOPIC DIGIT NINE
        U+17E9 KHMER DIGIT NINE

        The validator fails with the following error:
        SAX error: file:/devel/analysis/reS17.xml(9,29): cvc-pattern-valid: Value '¿' is not facet-valid with respect to pattern '\d' for type '#AnonType_attRegex'.
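When triaging a report like this, it may help to check which of the listed code points the running JDK classifies as decimal digits. XML Schema defines \d as equivalent to \p{Nd}, so a code point outside general category Nd would be a legitimate rejection rather than a validator defect. The sketch below uses only java.lang.Character; the class name DigitCategoryCheck and the helper isDecimalDigit are hypothetical, and the classification reflects the Unicode version bundled with the JDK in use, which may not be the version the validator's regex engine was built against.

```java
final class DigitCategoryCheck {
    // True if the code point is in general category Nd, which is what \d denotes in XSD.
    static boolean isDecimalDigit(int codePoint) {
        return Character.getType(codePoint) == Character.DECIMAL_DIGIT_NUMBER;
    }

    public static void main(String[] args) {
        int[] reported = {
            // reported as accepted
            0x0C66, 0x0CE6, 0x0D66, 0x0E50, 0x0ED0, 0x0F20,
            // reported as rejected
            0x1040, 0x1369, 0x17E0, 0x1810, 0xFF10, 0x1049, 0x1371, 0x17E9
        };
        for (int cp : reported) {
            System.out.printf("U+%04X %-28s Nd=%b%n",
                    cp, Character.getName(cp), isDecimalDigit(cp));
        }
    }
}
```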

              joehw Joe Wang
              lkuskov Leonid Kuskov