Proposal: #ModuleNameCharacters (revised)
mark.reinhold at oracle.com
Fri Dec 9 21:45:46 UTC 2016
Issue summary
-------------
#ModuleNameCharacters --- Module names are presently constrained to
be Java identifiers. Some existing module systems allow additional
characters in module names, such as hyphens and slashes. Should this
restriction be lifted or, perhaps, should it somehow be made
layer-specific? [1]
Proposal
--------
Do not change the treatment of module names in source code; they will
remain qualified names. Revise the encoding of module names in compiled
module-declaration class files to lift the current constraints but adopt
new, less onerous constraints that still provide for the future evolution
of the platform. Revise the format of class files to structure module
and package names in a manner consistent with that already used for other
kinds of constrained names.
* * *
Modules are a new construct of the Java programming language in the
present design. In the source language they are hence identified by
qualified names [2] in the same manner as the existing structural
constructs, i.e., packages and classes. As such these names do allow
some unusual characters, though not hyphens or slashes [3].
In the very long term a future version of the language may well support
not just the declaration of modules, and of relationships between them,
but also the expression of operations upon them as is possible in, e.g.,
Standard ML [4], or qualified references in code to a type in some other
named module, or yet some other kind of use that we do not imagine today.
It would hence be unwise at this point to allow module names in source
code to be any different in nature than the other kinds of qualified
names already in the language.
We will therefore retain the present constraints on module names in the
source language and also continue to enforce those constraints in the
`ModuleDescriptor.Builder` API, which is intended to be consistent with
the language. (The `ModuleDescriptor` API will continue to be able to
read class files that contain module names not expressible in the source
language.)
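As a rough illustration of the source-level constraint, the sketch below checks whether a candidate module name is a syntactically valid qualified name. It assumes that `javax.lang.model.SourceVersion.isName` is an adequate stand-in for the language rule; the class and method names are hypothetical, and the actual checks performed by `ModuleDescriptor.Builder` may differ in detail.

```java
import javax.lang.model.SourceVersion;

final class SourceModuleNames {
    /**
     * Returns true if the name is a syntactically valid qualified name,
     * i.e., dot-separated Java identifiers. Hypothetical helper; it only
     * approximates the constraint described in the text.
     */
    static boolean isExpressibleInSource(String name) {
        return SourceVersion.isName(name);
    }

    public static void main(String[] args) {
        for (String n : new String[] {"com.example.widgets", "acme-widgets", "acme/widgets"}) {
            System.out.println(n + " -> " + isExpressibleInSource(n));
        }
    }
}
```

Under this check a name such as `com.example.widgets` is accepted, while names containing a hyphen or a slash are rejected.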
* * *
Module names in compiled module-declaration class files are presently
encoded in the internal form traditionally used for qualified names:
Periods (`.`) are replaced with forward slashes (`/`), and periods,
semicolons (`;`), and left square brackets (`[`) are forbidden [5].
This encoding is inconvenient for other module systems that may
interoperate with JPMS, so we will abandon it for module names despite
the fact that doing so will increase the complexity of any code that
parses class files.
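To make the inconvenience concrete, here is a minimal sketch of the traditional internal-form conversion. The class and method names are hypothetical. Rejecting `/` in the input is an assumption of this sketch: a slash in the original name could not be told apart from a converted period.

```java
final class InternalForm {
    /** Converts a dotted qualified name to the slash-separated internal form. */
    static String toInternalForm(String qualifiedName) {
        for (int i = 0; i < qualifiedName.length(); i++) {
            char c = qualifiedName.charAt(i);
            // ';' and '[' are forbidden by the internal form. '/' is rejected
            // here (assumption) because it would collide with the separator
            // that replaces '.'.
            if (c == ';' || c == '[' || c == '/') {
                throw new IllegalArgumentException(
                    "character not representable in internal form: " + c);
            }
        }
        return qualifiedName.replace('.', '/');
    }

    public static void main(String[] args) {
        System.out.println(toInternalForm("java.base"));
    }
}
```

A name such as `acme/widgets`, which some other module system might consider legal, has no representation under this scheme.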
To allow for the future evolution of the platform we propose a different,
less onerous encoding of module names in class files:
  - If at some future point we find that we need to add structure to
    module names, or combine module names with qualified type names,
    then the `:` character would be a good candidate, even in the
    source language if need be, so we reserve that character now.
  - We presently use `@` in the API to separate module names from
    version strings, where available, so it is prudent to reserve
    that character in module names in class files also, just in case
    we someday decide to introduce compound module identifiers that
    combine module names with version strings.
  - In further support of interoperation we will reserve the universal
    escape character (`\`) and define the sequences `\\`, `\:`, and
    `\@` to stand for `\`, `:`, and `@`, respectively.
  - We will finally, for sanity, forbid any character whose Unicode code
    point is less than 0x20 (` `). (Ideally we'd forbid all Unicode
    non-printing characters, but it's best not to have the JVMS depend
    too deeply upon details of the Unicode specification.)
To sum up: In module names in class files reserve `:` and `@` for future
use; reserve `\` as an escape character and use it to quote itself, `:`,
and `@`; and forbid the non-printing ASCII characters (< 0x20).
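The rules summarized above can be sketched as a small encoder and decoder. This is only an illustration under stated assumptions: the class and method names are hypothetical, and treating an unescaped `:` or `@`, an undefined escape, or a trailing `\` as an error is one reading of "reserved", not something the proposal spells out.

```java
final class ModuleNameCodec {
    /** Encodes a raw module name for storage in a class file. */
    static String encode(String raw) {
        StringBuilder sb = new StringBuilder(raw.length() + 4);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c < 0x20) {
                throw new IllegalArgumentException("control character at index " + i);
            }
            // Quote the escape character itself and the two reserved characters.
            if (c == '\\' || c == ':' || c == '@') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Decodes an encoded module name, rejecting malformed input. */
    static String decode(String encoded) {
        StringBuilder sb = new StringBuilder(encoded.length());
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c < 0x20) {
                throw new IllegalArgumentException("control character at index " + i);
            }
            if (c == ':' || c == '@') {
                // Assumption: reserved characters may appear only when escaped.
                throw new IllegalArgumentException("unescaped reserved character at index " + i);
            }
            if (c == '\\') {
                if (i + 1 >= encoded.length()) {
                    throw new IllegalArgumentException("dangling escape at end of name");
                }
                char next = encoded.charAt(++i);
                if (next != '\\' && next != ':' && next != '@') {
                    throw new IllegalArgumentException("undefined escape at index " + (i - 1));
                }
                sb.append(next);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

With this sketch, a foreign module name such as `acme-widgets/core@1.0` would be stored as `acme-widgets/core\@1.0`, and decoding that string yields the original name.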
* * *
The first version of this proposal [6] claimed that the present design is
consistent with the existing treatment of qualified names in class files.
That is, in fact, not true, since qualified names in class files today
are always wrapped in tagged constant-pool structures rather than simple
`CONSTANT_Utf8_info` structures. Class names, e.g., are wrapped in
`CONSTANT_Class_info` structures, which in turn reference the `Utf8`
structures that represent the actual class names [7].
To address this inconsistency, and particularly in light of the new
encoding of module names described above, we propose to use consistent
kinds of class-file structures for module and package names.
Module names in a compiled module-declaration class file will be encoded
as above and wrapped in tagged `CONSTANT_Module_info` structures:
    CONSTANT_Module_info {
        u1 tag;         // == CONSTANT_Module == 19
        u2 name_index;  // Index of a CONSTANT_Utf8_info
    }
Package names in class files will be encoded in the traditional internal
form and wrapped in tagged `CONSTANT_Package_info` structures:
    CONSTANT_Package_info {
        u1 tag;         // == CONSTANT_Package == 20
        u2 name_index;  // Index of a CONSTANT_Utf8_info
    }
Existing references in the class-file format to module and package names
will be adjusted to refer to these new kinds of tagged structures.
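For parsers, the two structures above share one shape: a one-byte tag followed by a big-endian two-byte index. The following sketch reads such an entry from a byte array. The class and method names are hypothetical, and a real parser would also walk the rest of the constant pool and resolve `name_index` to its `CONSTANT_Utf8_info` entry.

```java
final class NameEntryReader {
    static final int CONSTANT_MODULE = 19;   // tag value given in the proposal
    static final int CONSTANT_PACKAGE = 20;  // tag value given in the proposal

    /**
     * Reads a tagged name entry starting at the given offset and returns
     * its name_index. Class-file integers are big-endian.
     */
    static int readNameIndex(byte[] bytes, int offset, int expectedTag) {
        if (offset < 0 || offset + 2 >= bytes.length) {
            throw new IllegalArgumentException("entry does not fit in the buffer");
        }
        int tag = bytes[offset] & 0xFF;
        if (tag != expectedTag) {
            throw new IllegalArgumentException(
                "expected tag " + expectedTag + " but found " + tag);
        }
        return ((bytes[offset + 1] & 0xFF) << 8) | (bytes[offset + 2] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] entry = {19, 0, 5};
        System.out.println(readNameIndex(entry, 0, CONSTANT_MODULE));
    }
}
```

Checking the tag before trusting the index is what the wrapper buys: a reference that was meant to name a module cannot silently point at a package or class entry.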
[1] http://openjdk.java.net/projects/jigsaw/spec/issues/#ModuleNameCharacters
[2] http://docs.oracle.com/javase/specs/jls/se8/html/jls-6.html#jls-6.2
[3] http://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-3.8
[4] https://en.wikipedia.org/wiki/Standard_ML#Module_system
[5] http://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html#jvms-4.2.1
[6] http://mail.openjdk.java.net/pipermail/jpms-spec-experts/2016-November/000468.html
[7] http://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html#jvms-4.4.1