JDK-8340819

Improve traditional documentation comment parsing rules

    • Type: Enhancement
    • Resolution: Won't Fix
    • Priority: P3
    • None
    • None
    • tools
    • None

      The latest version of the JavaDoc Documentation Comment Specification[1] specifies the parsing of traditional documentation comments as follows:

      "Traditional documentation comments are traditional comments that begin with /**. If any line in such a comment begins with asterisks after any leading whitespace, the leading whitespace and asterisks are removed. Any whitespace appearing after the asterisks is not removed."

      [1]: https://docs.oracle.com/en/java/javase/23/docs/specs/javadoc/doc-comment-spec.html
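      A minimal Java sketch of the rule as quoted, applied to one comment line at a time. It assumes the `/**` and `*/` delimiters have already been removed; the class and method names (`CurrentDocCommentRule`, `stripLine`) are hypothetical and do not reflect the actual javadoc implementation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CurrentDocCommentRule {
    // Leading whitespace followed by one or more asterisks (current rule).
    private static final Pattern LEADER = Pattern.compile("\\s*\\*+");

    /** Removes leading whitespace and asterisks; whatever follows them is kept. */
    static String stripLine(String line) {
        Matcher m = LEADER.matcher(line);
        // lookingAt() only matches at the start of the line; lines without
        // a leading asterisk are returned unchanged.
        return m.lookingAt() ? line.substring(m.end()) : line;
    }

    public static void main(String[] args) {
        // Prints "[ Returns the length.]": the space after '*' survives.
        System.out.println("[" + stripLine(" * Returns the length.") + "]");
    }
}
```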

      Under this rule, a space character following the leading asterisks on a line of a traditional doc comment is kept as part of the parsed comment. This is at odds with the vast majority of JavaDoc comments, whose lines almost always begin with an asterisk followed by a single space that is not intended to be part of the comment text.

      This mismatch usually goes unnoticed because of the way HTML/CSS handles whitespace: in the default inline formatting context, runs of whitespace characters, including line breaks, are collapsed into a single space character.[2][3]

      [2]: https://developer.mozilla.org/en-US/docs/Web/API/Document_Object_Model/Whitespace
      [3]: https://www.w3.org/TR/css-text-3/#white-space-processing
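      A deliberately simplified model of that collapsing shows why the space kept after each asterisk normally makes no visible difference. It ignores most of the CSS Text rules (segment-break transformation, `white-space` values other than `normal`, and so on); the helper `collapse` is hypothetical.

```java
class WhitespaceCollapseSketch {
    /**
     * Simplified model of "white-space: normal": every run of whitespace,
     * line breaks included, becomes one space, and whitespace at the start
     * and end of the block is dropped. Real CSS processing has more rules.
     */
    static String collapse(String text) {
        return text.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        // Text as the current rule yields it (space kept after each '*')...
        String current = "Returns the\n length of this string.";
        // ...and as it would be if that space were removed.
        String proposed = "Returns the\nlength of this string.";
        // Both collapse to "Returns the length of this string."
        System.out.println(collapse(current).equals(collapse(proposed)));
    }
}
```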

      However, the mismatch becomes visible when whitespace is preserved, as it is for text inside a `<pre>` element: there, the space following the '*' character is rendered. Most developers are not aware of this, which is why even in the OpenJDK sources many code samples within `<pre>` tags end with an otherwise empty trailing line containing a single space character. As an example, see the code samples in `java.lang.String`[4] (also see attached screenshots).

      [4]: https://docs.oracle.com/en/java/javase/23/docs/api/java.base/java/lang/String.html
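      The effect can be reproduced with a small sketch. It applies the current rule to a code sample of the kind used in `java.lang.String` (written here for illustration, not copied from the JDK sources), then extracts the `<pre>` content roughly as a browser would, modelling only the HTML rule that a newline directly after the `<pre>` start tag is ignored. All names are hypothetical.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class PreTrailingSpaceDemo {
    // Current rule: leading whitespace plus asterisks are removed, nothing more.
    private static final Pattern LEADER = Pattern.compile("\\s*\\*+");

    static String stripLine(String line) {
        Matcher m = LEADER.matcher(line);
        return m.lookingAt() ? line.substring(m.end()) : line;
    }

    /** Approximate lines displayed for the first <pre> element in the comment. */
    static List<String> preLines(List<String> commentLines) {
        String text = commentLines.stream()
                .map(PreTrailingSpaceDemo::stripLine)
                .collect(Collectors.joining("\n"));
        int open = text.indexOf("<pre>");
        int close = text.indexOf("</pre>", open);
        if (open < 0 || close < 0) {
            throw new IllegalArgumentException("no <pre> element found");
        }
        String body = text.substring(open + "<pre>".length(), close);
        // HTML ignores a single newline immediately following the <pre> start tag.
        if (body.startsWith("\n")) {
            body = body.substring(1);
        }
        // The limit -1 keeps a trailing (possibly empty) line instead of dropping it.
        return List.of(body.split("\n", -1));
    }

    public static void main(String[] args) {
        List<String> comment = List.of(
                " * <blockquote><pre>",
                " *     String str = \"abc\";",
                " * </pre></blockquote>");
        for (String line : preLines(comment)) {
            System.out.println("[" + line + "]");
        }
        // The last line printed is "[ ]": a visible line holding one space.
    }
}
```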

      While fear of breaking existing documentation may have kept JavaDoc developers from changing this aspect of doc comment parsing, there is surprisingly little cause for that fear:

       - Because of the way HTML handles whitespace (see links above), removing a space character at the beginning of a line (i.e. following a line break) does not affect the layout of normal, non-preformatted text.
       - In many uses of preformatted text, a single space character would be removed from every line, which does not change the layout of the lines relative to each other. This is true for the code samples in `java.lang.String` above and for many other uses of the `<pre>` element in JDK doc comments.
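      The second point can be checked with a sketch that compares the indentation of each non-blank line, measured from the least-indented line, before and after dropping one leading space per line. Blank lines are skipped because they have no visible indentation. The helpers are hypothetical, and the check only holds when every non-blank line starts with a space, as lines written in the usual ` * ` style do.

```java
import java.util.List;
import java.util.stream.Collectors;

class RelativeIndentDemo {
    /** Counts the leading space characters of a line. */
    static int indent(String line) {
        int n = 0;
        while (n < line.length() && line.charAt(n) == ' ') {
            n++;
        }
        return n;
    }

    /** Indentation of each non-blank line, measured from the least-indented one. */
    static List<Integer> relativeIndents(List<String> lines) {
        List<String> nonBlank = lines.stream()
                .filter(l -> !l.isBlank())
                .collect(Collectors.toList());
        int min = nonBlank.stream().mapToInt(RelativeIndentDemo::indent).min().orElse(0);
        return nonBlank.stream().map(l -> indent(l) - min).collect(Collectors.toList());
    }

    /** Hypothetical stand-in for the proposed rule: drop one leading space per line. */
    static List<String> dropOneSpace(List<String> lines) {
        return lines.stream()
                .map(l -> l.startsWith(" ") ? l.substring(1) : l)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // <pre> lines as the current rule yields them (each keeps one space).
        List<String> current = List.of(" if (ok) {", "     run();", "", " }");
        List<String> proposed = dropOneSpace(current);
        System.out.println(relativeIndents(current));   // [0, 4, 0]
        System.out.println(relativeIndents(proposed));  // [0, 4, 0]
    }
}
```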

      One of the very few cases in which removing a space character at the beginning of a line can be observed is the combined `<pre><code>` element pair: in this context the initial line break is preserved, which changes the relative indentation of the lines following the opening tags.

      I think that the compatibility risks are minor, and that we should at least consider changing the doc comment parsing rules to remove a single space following the leading asterisks.
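      A minimal sketch of the proposed rule, assuming "a single space" means at most one U+0020 character directly after the asterisks (tabs and any further spaces are kept). Applied to the same kind of `<pre>` sample, the line carrying `</pre>` no longer starts with a space, so the `<pre>` content no longer ends with a line containing a single space. Class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ProposedDocCommentRule {
    // Leading whitespace, one or more asterisks, then at most one space character.
    private static final Pattern LEADER = Pattern.compile("\\s*\\*+ ?");

    static String stripLine(String line) {
        Matcher m = LEADER.matcher(line);
        // Lines without a leading asterisk are returned unchanged.
        return m.lookingAt() ? line.substring(m.end()) : line;
    }

    public static void main(String[] args) {
        String[] comment = {
            " * <blockquote><pre>",
            " *     String str = \"abc\";",
            " * </pre></blockquote>"
        };
        for (String line : comment) {
            // Prints "[<blockquote><pre>]", "[    String str = "abc";]",
            // then "[</pre></blockquote>]" with no leading space.
            System.out.println("[" + stripLine(line) + "]");
        }
    }
}
```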

            Assignee: Unassigned
            Reporter: Hannes Wallnoefer (hannesw)
            Votes: 0
            Watchers: 3
