Name: skT45625 Date: 08/16/2000
java version "1.3.0rc2"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.0rc2-Y)
Java HotSpot(TM) Client VM (build 1.3.0rc2-Y, mixed mode)
I've just switched to JAXP 1.0.1 after using tr1 and tr2 for the last year
or more, and I've noticed a problem with the way some entities are expanded.
I have some elements whose text contains entity references such as &lt; and
&gt; in the normal text. I used to get one #TEXT node as a child, with the
entities expanded (to < and >). With JAXP, I get multiple #TEXT nodes with an
EntityRefNode for each of the above entities in between. That is clearly
different behavior from what I'm used to.
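A minimal sketch of how the behavior described above could be checked. It
assumes the DocumentBuilderFactory methods setExpandEntityReferences and
setCoalescing from the current javax.xml.parsers API, which may not be
available in JAXP 1.0.1. The class name, helper names, and sample document
are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original report.
final class EntityExpansionCheck {

    /** Parses the given XML string and returns its root element. */
    static Element parseRoot(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Ask the parser to expand entity references instead of keeping
        // EntityReference nodes in the tree.
        factory.setExpandEntityReferences(true);
        // Ask the parser to merge adjacent text (and CDATA) into one Text node.
        factory.setCoalescing(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement();
    }

    /** Counts the direct children of the element that have the given node type. */
    static int countChildrenOfType(Element element, short nodeType) {
        NodeList children = element.getChildNodes();
        int count = 0;
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == nodeType) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // Sample element whose text uses two predefined entities.
        Element root = parseRoot("<msg>a &lt; b &gt; c</msg>");
        System.out.println("text nodes: "
                + countChildrenOfType(root, Node.TEXT_NODE));
        System.out.println("entity reference nodes: "
                + countChildrenOfType(root, Node.ENTITY_REFERENCE_NODE));
        System.out.println("text content: " + root.getTextContent());
    }
}
```

If the parser treats predefined entities the way the DOM2 text quoted below
describes, the entity reference count should be zero and the text content
should contain the literal < and > characters.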
Supporting documentation:
Here is something from the DOM2 spec...
Interface EntityReference
EntityReference objects may be inserted into the structure model when an
entity reference is in the source document, or when the user wishes to insert
an entity reference. Note that character references and references to
predefined entities are considered to be expanded by the HTML or XML
processor so that characters are represented by their Unicode equivalent
rather than by an entity reference. Moreover, the XML processor may
completely expand references to entities while building the structure model,
instead of providing EntityReference objects. If it does provide such objects,
then for a given EntityReference node, it may be that there is no Entity
node representing the referenced entity. If such an Entity exists, then the
subtree of the EntityReference node is in general a copy of the Entity node
subtree. However, this may not be true when an entity contains an unbound
namespace prefix. In such a case, because the namespace prefix resolution
depends on where the entity reference is, the descendants of the
EntityReference node may be bound to different namespace URIs.
Here is something from the XML 1.0 spec...
4.6 Predefined Entities
Entity and character references can both be used to escape the left angle
bracket, ampersand, and other delimiters. A set of general entities (amp,
lt, gt, apos, quot) is specified for this purpose. Numeric character
references may also be used; they are expanded immediately when recognized
and must be treated as character data, so the numeric character references
"&#60;" and "&#38;" may be used to escape < and & when they occur in
character data. All XML processors must recognize these entities whether
they are declared or not. For interoperability, valid XML documents should
declare these entities, like any others, before using them. If the entities
in question are declared, they must be declared as internal entities whose
replacement text is the single character being escaped or a character reference
to that character, as shown below.
<!ENTITY lt     "&#38;#60;">
<!ENTITY gt     "&#62;">
<!ENTITY amp    "&#38;#38;">
<!ENTITY apos   "&#39;">
<!ENTITY quot   "&#34;">
Note that the < and & characters in the declarations of "lt" and "amp" are
doubly escaped to meet the requirement that entity replacement be well-formed.
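To illustrate the double escaping, here is a small sketch that declares lt
and amp in an internal DTD subset, as in the spec excerpt, and reads back the
expanded text. It assumes a current JAXP implementation with DOM Level 3
getTextContent support; the class name and sample document are invented for
the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original report.
final class DeclaredPredefinedEntities {

    // Internal DTD subset declaring lt and amp with doubly escaped
    // replacement text, following XML 1.0 section 4.6.
    static final String XML =
            "<!DOCTYPE msg [\n"
            + "  <!ENTITY lt  \"&#38;#60;\">\n"
            + "  <!ENTITY amp \"&#38;#38;\">\n"
            + "]>\n"
            + "<msg>1 &lt; 2 &amp; 3</msg>";

    /** Parses the document and returns the text content of the root element. */
    static String parseText(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // The references should come back as the plain characters < and &.
        System.out.println(parseText(XML));
    }
}
```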
(Review ID: 108520)
======================================================================