Summary
Add API and Impl notes to the package description of org.w3c.dom to clarify the Javadoc for get and set methods and the discrepancy between the specification and the implementation of org.w3c.dom.ls.LSSerializer.
Problem
There are two issues to be covered in this change.
The first concerns the Javadoc for the get and set methods of properties. The format and style of these comments in the org.w3c.dom package do not follow the Javadoc standard. Instead of describing the action each method performs, the text defines the field or attribute that the get and set methods operate on, and a single text covers both the get and the set action. Because the two comments are copies of each other, readers may assume that one was copied by mistake and that the intended text is missing.
The second issue concerns the specification of org.w3c.dom.ls.LSSerializer. The specification requires that an LSSerializer output characters or character references depending on the output encoding. This requirement contradicts the XML specification, which defines a character range with no association to the output encoding.
Since the DOM L3 specification and the org.w3c.dom package have not been actively maintained in 16 years, they are unlikely to be updated in the future. A general clarification within the Java SE specification is therefore necessary to document the issues.
Solution
Add the following to the org.w3c.dom package description:
— An API Note explaining the structure of the existing Javadoc.
— An Impl Note explaining the deviation between the LSSerializer specification and JDK implementation.
Specification
Add the following to the org.w3c.dom package description:
API Note:
The documentation comments for the get and set methods within this API are written as property definitions and
are shared between both methods. These methods do not follow the standard Java SE specification format.
Take the Node TextContent property as an example, both getTextContent and setTextContent shared the same
content that defined the TextContent property itself.
Implementation Note:
The JDK implementation of LSSerializer follows the Characters section of the XML Specification in handling
characters output. In particular, the specification defined a character range that excluded the surrogate blocks.
As a result, the JDK LSSerializer writes characters in the surrogate blocks as Character References.
Character 0xf0 0x9f 0x9a 0xa9 (Unicode code point U+1F6A9) for example will be written as &#x1f6a9;.
This behavior is different from what is defined in the class description of LSSerializer. The relevant section is quoted below:
Within the character data of a document (outside of markup), any characters that cannot be represented directly
are replaced with character references... Any characters that cannot be represented directly in the output character
encoding are serialized as numeric character references
The JDK implementation does not follow this definition because it is not consistent with the XML Specification
that defined an explicit character range with no association to the setting of the output character encoding.
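The behavior described in the Implementation Note can be observed with a small program. The following is a minimal sketch, not part of the proposed specification text: the class name SurrogateSerializationDemo, the helper serialize, and the element name root are hypothetical and chosen only for illustration. It places U+1F6A9 in a text node and prints what the LSSerializer of the running JDK produces. No expected output is shown, because whether the character appears directly or as a character reference depends on the JDK version in use.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

// Hypothetical demo class; the names here are illustrative only.
public class SurrogateSerializationDemo {

    // Builds <root>text</root> and serializes it with the JDK's LSSerializer.
    static String serialize(String text) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM parser available", e);
        }

        Element root = doc.createElement("root");
        // setTextContent is one of the property-style set methods
        // discussed in the API Note.
        root.setTextContent(text);
        doc.appendChild(root);

        // Obtain the Load and Save (LS) feature from the DOM implementation.
        DOMImplementationLS ls = (DOMImplementationLS)
                doc.getImplementation().getFeature("LS", "3.0");
        LSSerializer serializer = ls.createLSSerializer();
        return serializer.writeToString(doc);
    }

    public static void main(String[] args) {
        // U+1F6A9 lies outside the Basic Multilingual Plane, so Java
        // represents it as a surrogate pair.
        String flag = new String(Character.toChars(0x1F6A9));
        String xml = serialize(flag);

        System.out.println(xml);
        System.out.println("contains a character reference: " + xml.contains("&#"));
    }
}
```

Running this against different JDK releases is one way to check whether a given implementation writes the character directly or as a numeric character reference.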
Specdiffs are attached. For convenience, the specdiff and webrevs can also be viewed at:
http://cr.openjdk.java.net/~joehw/jdk16/8249643/specdiff_02/org/w3c/dom/package-summary.html
CSR of: JDK-8249643 Clarify DOM documentation (Resolved)
Relates to: JDK-8252984 Remove the implNote in the DOM package description added by JDK-8249643 (Closed)