- Enhancement
- Resolution: Unresolved
- P3
- None
- Fix Understood
Javac uses a class called Context for creating its components. You can think of a context as a map from component keys (Context.Key) to either component instances (the most common case) or component factories (see below); the use of the Context class serves two main purposes:
(1) it allows sharing of component instances across the whole compiler lifetime span
(2) it allows other tools (e.g. javadoc) and clients (e.g. Netbeans) to 'override' certain components with more specific ones
Use case (1) is taken care of by the fact that all component classes have a method like this:
public static Enter instance(Context context) {
    Enter instance = context.get(enterKey);
    if (instance == null)
        instance = new Enter(context);
    return instance;
}
So, if a client calls:
Enter.instance(context)
either an Enter instance is already available in the context, in which case it is immediately returned, or it is not available and has to be created. Typically, a component class depends on several other component classes, so it is very common for the constructor of a component class to call one or more 'instance' methods of other components. This can lead to problems (see below).
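To make the lookup pattern concrete, here is a minimal, self-contained sketch. MiniContext and Widget are hypothetical stand-ins, not javac's real Context and Enter, and the sketch assumes that a component's constructor registers the new instance under its key so that later lookups find it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a context: a map from keys to instances.
class MiniContext {
    static final class Key<T> { }

    private final Map<Key<?>, Object> table = new HashMap<>();

    @SuppressWarnings("unchecked")
    <T> T get(Key<T> key) {
        return (T) table.get(key);
    }

    <T> void put(Key<T> key, T value) {
        if (table.putIfAbsent(key, value) != null)
            throw new AssertionError("duplicate context value");
    }
}

// Hypothetical component following the 'instance' pattern described above.
class Widget {
    static final MiniContext.Key<Widget> widgetKey = new MiniContext.Key<>();

    static Widget instance(MiniContext context) {
        Widget instance = context.get(widgetKey);
        if (instance == null)
            instance = new Widget(context);
        return instance;
    }

    private Widget(MiniContext context) {
        // Assumption: the constructor registers itself so later lookups share it.
        context.put(widgetKey, this);
    }
}

class InstanceDemo {
    public static void main(String[] args) {
        MiniContext context = new MiniContext();
        Widget first = Widget.instance(context);
        Widget second = Widget.instance(context);
        System.out.println(first == second); // prints true: one instance per context
    }
}
```

Each context holds its own instance, so two separate MiniContext objects never share a Widget.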
Use case (2) is taken care of by subclassing plus context factories; as we said above, a context can also map a component key to a component factory. For instance, JavadocTool does the following:
public static JavadocTool make0(Context context) {
    Messager messager = null;
    try {
        // force the use of Javadoc's class finder
        JavadocClassFinder.preRegister(context);
        // force the use of Javadoc's own enter phase
        JavadocEnter.preRegister(context);
        // force the use of Javadoc's own member enter phase
        JavadocMemberEnter.preRegister(context);
        // force the use of Javadoc's own todo phase
        JavadocTodo.preRegister(context);
        // force the use of Messager as a Log
        messager = Messager.instance0(context);
        return new JavadocTool(context);
    } catch (CompletionFailure ex) {
        messager.error(Position.NOPOS, ex.getMessage());
        return null;
    }
}
where, for instance, JavadocEnter's preRegister method is as follows:
public static void preRegister(Context context) {
    context.put(enterKey, new Context.Factory<Enter>() {
        public Enter make(Context c) {
            return new JavadocEnter(c);
        }
    });
}
That is, a new factory for JavadocEnter is added onto the context. The crucial bit here is that the registration of the factory uses the same 'key' as the base Enter class (see above) - this means that whenever a client tries to do something like:
Enter.instance(context)
The body of the instance method will call get(enterKey) on the context, which will find the pre-registered factory and use that to create the Enter object, thus bypassing the 'new Enter(context)' call in the instance method. In other words, what you get back from the 'instance' call is a JavadocEnter instance, and not a plain Enter instance.
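The factory override can be sketched in a self-contained way as follows. FactoryContext, BaseEnter and DocEnter are hypothetical names that mimic the roles of Context, Enter and JavadocEnter; the separate putFactory method is an assumption made for this sketch, not the real javac API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, simplified context: a key maps to an instance or to a factory.
class FactoryContext {
    static final class Key<T> { }

    private final Map<Key<?>, Object> instances = new HashMap<>();
    private final Map<Key<?>, Function<FactoryContext, ?>> factories = new HashMap<>();

    <T> void put(Key<T> key, T value) { instances.put(key, value); }

    <T> void putFactory(Key<T> key, Function<FactoryContext, ? extends T> factory) {
        factories.put(key, factory);
    }

    @SuppressWarnings("unchecked")
    <T> T get(Key<T> key) {
        Object value = instances.get(key);
        if (value == null && factories.containsKey(key)) {
            // The factory's product registers itself in its constructor.
            value = factories.get(key).apply(this);
        }
        return (T) value;
    }
}

class BaseEnter {
    static final FactoryContext.Key<BaseEnter> enterKey = new FactoryContext.Key<>();

    static BaseEnter instance(FactoryContext context) {
        BaseEnter instance = context.get(enterKey);
        if (instance == null)
            instance = new BaseEnter(context);
        return instance;
    }

    protected BaseEnter(FactoryContext context) {
        context.put(enterKey, this);
    }
}

class DocEnter extends BaseEnter {
    // Registers a factory under the SAME key the base class uses.
    static void preRegister(FactoryContext context) {
        context.putFactory(enterKey, DocEnter::new);
    }

    protected DocEnter(FactoryContext context) {
        super(context);
    }
}

class FactoryDemo {
    public static void main(String[] args) {
        FactoryContext context = new FactoryContext();
        DocEnter.preRegister(context);
        BaseEnter enter = BaseEnter.instance(context);
        System.out.println(enter.getClass().getSimpleName()); // prints DocEnter
    }
}
```

Without the preRegister call, the same BaseEnter.instance(context) call falls through to the plain constructor and yields a BaseEnter.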
This context machinery has served us quite well in the past; it is a relatively fuss-free strategy for sharing component instances, and it is flexible enough (through keys and factories) that you can bend it to do almost anything you like - at the end of the day, at its core, it's just a map. However, as the compiler architecture has become more complex, initialization order has started to become an issue. Only recently we had two separate init issues:
https://bugs.openjdk.java.net/browse/JDK-8078261
https://bugs.openjdk.java.net/browse/JDK-8079335
And in the past, at least the following ones were observed:
https://bugs.openjdk.java.net/browse/JDK-8048890
https://bugs.openjdk.java.net/browse/JDK-8071241
In the last one, it's remarkable how 90% of the review time was spent trying to identify a better way to do lazy initialization such that the context machinery would not fall over. In other words, keeping this infrastructure in place is starting to be costly.
What is this dependency problem? It turns out that there are two kinds of dependencies between components: hard dependencies and soft dependencies. We say that A has a hard dependency on B if, in order to initialize A, you need to do something with B (e.g. call one of its members). Example (taken from Attr.java):
allowStaticInterfaceMethods = source.allowStaticInterfaceMethods();
Here, the constructor of Attr is accessing a member of 'source', which is another context class (obtained through an 'instance' call).
On the other hand, A has a soft dependency on B if it only needs B to cache it into a field somewhere - example (again from Attr):
infer = Infer.instance(context);
These cases are obviously very different. Because of the way the context machinery works, what you get back from 'instance' could be a partially initialized instance - consider the following dependency chain:
Types -> Symtab -> ClassFinder -> ClassReader -> Types
Since the constructors of the above components form a cycle, the second time we hit Types.instance(context) we get back the partially initialized instance we created the first time around. Is this a problem? It depends. If ClassReader has only a soft dependency on Types (i.e. it wants to store it in a field), then it's ok. But if it has a hard dependency - i.e. it needs to access some members of Types - then we cannot guarantee that things will work correctly. If ClassReader is lucky, the bits it needs from Types are already initialized and all is well; if it runs out of luck, it could get an NPE as Types tries to access some yet-to-be-initialized fields.
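The difference between the two kinds of dependency inside a cycle can be reproduced with a small sketch. Registry, Alpha and Beta are hypothetical names (a two-component cycle instead of the four-component one above), and the sketch assumes each constructor publishes 'this' before it finishes initializing.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry playing the role of the context.
class Registry {
    final Map<Class<?>, Object> table = new HashMap<>();
}

// Cycle: Alpha -> Beta -> Alpha.
class Alpha {
    final Beta beta;
    final String name;

    static Alpha instance(Registry r) {
        Alpha a = (Alpha) r.table.get(Alpha.class);
        return a != null ? a : new Alpha(r);
    }

    private Alpha(Registry r) {
        r.table.put(Alpha.class, this);  // published before fully initialized
        beta = Beta.instance(r);         // triggers Beta's constructor
        name = "alpha";                  // assigned only after Beta is built
    }
}

class Beta {
    final Alpha alpha;      // soft dependency: the reference is only cached
    final String seenName;  // hard dependency: reads Alpha's state during construction

    static Beta instance(Registry r) {
        Beta b = (Beta) r.table.get(Beta.class);
        return b != null ? b : new Beta(r);
    }

    private Beta(Registry r) {
        r.table.put(Beta.class, this);
        alpha = Alpha.instance(r);  // returns the partially initialized Alpha
        seenName = alpha.name;      // Alpha.name has not been assigned yet
    }
}

class CycleDemo {
    public static void main(String[] args) {
        Alpha a = Alpha.instance(new Registry());
        System.out.println(a.beta.alpha == a);  // prints true: the soft dependency is fine
        System.out.println(a.beta.seenName);    // prints null: the hard dependency saw stale state
        System.out.println(a.name);             // prints alpha
    }
}
```

Caching the reference works, but any value read from the other component during construction reflects its half-built state; calling a method on that null value would be the NPE described above.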
Some analysis has been carried out with a tool (attached) which scans all langtools classes and emits a dot file representing _all_ dependencies (both soft and hard) between javac components. The resulting graphs are also attached - dotted edges stand for soft dependencies, solid edges stand for hard dependencies.
It turns out that, if you take all dependencies together (both soft and hard), there is no topological order of the graph in which initialization is possible. The whole graph is one big strongly connected component: you can get from anywhere to anywhere.
On the other hand, if we focus only on hard dependencies, the story is much more straightforward. Not only is the number of hard dependencies significantly smaller than the number of soft dependencies (resulting in a much saner graph), but the hard dependency graph is also acyclic - meaning we can topo sort it and find an order in which components can be safely instantiated!
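As an illustration of how such an order can be computed, here is a minimal sketch of a topological sort (Kahn's algorithm) over a hard dependency map. It is not the attached tool; InitOrder is a hypothetical name, and the three-component graph in main is illustrative rather than the real javac graph.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class InitOrder {
    // hardDeps maps each component to the components it needs during construction.
    // Assumption: every component appears as a key, possibly with an empty list.
    static List<String> sort(Map<String, List<String>> hardDeps) {
        Map<String, Integer> pending = new LinkedHashMap<>();
        Map<String, List<String>> dependents = new LinkedHashMap<>();
        for (String c : hardDeps.keySet()) {
            pending.put(c, hardDeps.get(c).size());
            dependents.put(c, new ArrayList<>());
        }
        for (Map.Entry<String, List<String>> e : hardDeps.entrySet())
            for (String dep : e.getValue())
                dependents.get(dep).add(e.getKey());

        // Components with no unmet hard dependencies can be created right away.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : pending.entrySet())
            if (e.getValue() == 0) ready.add(e.getKey());

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String c = ready.remove();
            order.add(c);
            for (String d : dependents.get(c))
                if (pending.merge(d, -1, Integer::sum) == 0) ready.add(d);
        }
        // Anything left over sits on a cycle and has no safe init order.
        if (order.size() != hardDeps.size())
            throw new IllegalStateException("cycle among hard dependencies");
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("Options", List.of());
        deps.put("Attr", List.of("Options"));
        deps.put("Enter", List.of("Options"));
        System.out.println(sort(deps)); // prints [Options, Attr, Enter]
    }
}
```

If the soft dependencies were added to the same map, the sort would throw, which matches the observation that the combined graph has no valid order.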
Which leads to a possible solution to the initialization woes. The proposed scheme moves away from Context being just a map and makes Context a factory of components, where a given component is keyed by its class object. You can subclass the context and either return different components for the same key or even add new components, if need be. One key property of this scheme is that a component class only depends on its hard dependencies and the context itself. During initialization the component is allowed to use members of its hard dependencies, but it is not allowed to use the context, e.g. to retrieve other components (if it does so, the initialization subsystem makes no guarantee of correctness). Therefore, all accesses to soft dependencies are done through the context, after initialization. Let's look at this example:
interface Component<C extends Context<C>> { }
abstract class Context<E extends Context<E>> {
    @SuppressWarnings("unchecked")
    abstract <Z extends Component<? extends E>> Z make(Class<Z> clazz);
}
class JavacContext extends Context<JavacContext> {
    Options options;
    Attr attr;
    Enter enter;
    //...
    JavacContext() {
        //init components in topo order
        options = make(Options.class);
        attr = make(Attr.class);
        enter = make(Enter.class);
        //...
    }
    @Override
    @SuppressWarnings("unchecked")
    <Z extends Component<? extends JavacContext>> Z make(Class<Z> clazz) {
        if (clazz.equals(Attr.class)) {
            return (Z) new Attr(options, this);
        } else if (clazz.equals(Options.class)) {
            return (Z) new Options(true);
        } else if (clazz.equals(Enter.class)) {
            return (Z) new Enter(options);
        } else {
            //...
            throw new AssertionError("unexpected class");
        }
    }
}
class Options implements Component<JavacContext> {
    boolean shouldDoFoo;
    public Options(boolean shouldDoFoo) {
        this.shouldDoFoo = shouldDoFoo;
    }
    boolean shouldDoFoo() { return shouldDoFoo; }
}
class Attr implements Component<JavacContext> {
    final boolean shouldDoFoo;
    final JavacContext context;
    Attr(Options options, JavacContext context) {
        //hard dependency
        shouldDoFoo = options.shouldDoFoo();
        this.context = context;
    }
    void enterType() {
        //soft dependency
        context.enter.doSomething();
    }
}
class Enter implements Component<JavacContext> {
    //no soft dependencies on any other components - no need for a context here
    Enter(Options options) {
        //...
    }
    void doSomething() { }
}
As you can see, creating a JavacContext triggers initialization of all javac-related components in the right topo order (the one derived from the hard dependency graph). For instance, Attr is instantiated after Options, as Attr has a hard dependency on Options (same for Enter). Instantiation is done through the magic 'make' factory method, which is essentially a big switch. It's not very elegant, but it's effective - and more importantly, its behavior can be easily redefined, as shown in the example below:
class JavadocContext extends JavacContext {
    @Override
    @SuppressWarnings("unchecked")
    <Z extends Component<? extends JavacContext>> Z make(Class<Z> clazz) {
        if (clazz.equals(Enter.class)) {
            return (Z) new JavadocEnter(options);
        } else {
            //delegate to super context
            return super.make(clazz);
        }
    }
}
class JavadocEnter extends Enter {
    public JavadocEnter(Options options) {
        super(options);
    }
}
Here, we create a context that is a subclass of the original JavacContext - in this case we want to return a slightly different instance for Enter (this example is taken from the real world, where we do something very similar).
I've been playing with this toy model a bit and I see the following advantages:
* it concentrates the initialization mess in a single place (Context constructor)
* it supports most (all?) of the use cases we find in real code
* creating a component is much easier - much less init boilerplate (no context key, no instance method, no list of fields pointing to other components)
* since a component declares its hard dependencies explicitly as constructor parameters, the resulting model is easily verifiable by a static checker (e.g. to prevent the introduction of cycles)
* it lends itself to automation - e.g. with the right set of annotations, one could imagine the body of the JavacContext constructor being written in a generic way (so that if you add a new component, that component is dynamically inserted in the right place in the init pipeline) - but that's an optional step, as I'm well aware that this magic has its pros and cons.
* static typing allows cast-free code in clients - e.g. Attr can always refer to Enter by accessing it through the context, as it knows that a JavacContext will have an 'enter' field.
* the resulting performance model is acceptable - e.g. take Attr above: its context field is final, meaning that the VM knows it will always be bound to a given instance (i.e. a JavacContext) after init, which means all accesses/indirections can be inlined (if hot enough)
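The checkability and automation points can be illustrated with a rough sketch that derives the init order from constructor signatures. It uses plain reflection over constructor parameters rather than annotations, which is a substitution made for brevity; AutoWiring and all component classes below are hypothetical, and each component is assumed to declare exactly one constructor.

```java
import java.lang.reflect.Constructor;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class AutoWiring {
    private final Map<Class<?>, Object> built = new LinkedHashMap<>();
    private final Set<Class<?>> inProgress = new HashSet<>();

    // Builds 'type' after recursively building every constructor parameter,
    // treating each parameter as a hard dependency.
    <T> T make(Class<T> type) {
        Object existing = built.get(type);
        if (existing != null) return type.cast(existing);
        if (!inProgress.add(type))
            throw new IllegalStateException("hard dependency cycle at " + type.getSimpleName());
        try {
            // Assumption: each component declares exactly one constructor.
            Constructor<?> ctor = type.getDeclaredConstructors()[0];
            ctor.setAccessible(true);
            Class<?>[] params = ctor.getParameterTypes();
            Object[] args = new Object[params.length];
            for (int i = 0; i < params.length; i++)
                args[i] = make(params[i]);
            T instance = type.cast(ctor.newInstance(args));
            built.put(type, instance);
            return instance;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + type.getSimpleName(), e);
        } finally {
            inProgress.remove(type);
        }
    }

    // Components in the order in which they finished construction.
    List<Class<?>> initOrder() { return new ArrayList<>(built.keySet()); }
}

// Hypothetical components; constructor parameters are the hard dependencies.
class Settings { Settings() { } }
class Checker { final Settings settings; Checker(Settings s) { settings = s; } }
class Driver { final Checker checker; Driver(Checker c, Settings s) { checker = c; } }

// Hypothetical components with a hard dependency cycle, rejected by make().
class Ping { Ping(Pong p) { } }
class Pong { Pong(Ping p) { } }

class AutoWiringDemo {
    public static void main(String[] args) {
        AutoWiring wiring = new AutoWiring();
        wiring.make(Driver.class);
        for (Class<?> c : wiring.initOrder())
            System.out.println(c.getSimpleName()); // Settings, Checker, Driver (one per line)
    }
}
```

This version checks at run time; a static checker could perform the same walk over constructor signatures at build time, without instantiating anything.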
There are some question marks, obviously:
* does it handle all use cases? Plugins might be less free than they currently are? Do we care?
* clients will have to perform an indirection (through the context) to access components they want to access - is that too much?
* if a new subcontext introduces radically different components, it might have to copy and paste most of the init order from the base constructor, i.e. it's hard to add stuff in the 'middle'.
I think it would be nice to have some exploration work to assess if the above model is feasible.