Fix Version/s: None
Enable Java compilers to use novel code generation strategies (intrinsification) in order to improve the performance of certain Java SE methods.
In modern JVM implementations, Just-In-Time (JIT) compilers do an excellent job of optimizing bytecode at run time. A considerable amount of bytecode is "clerical" in nature -- shuffling data from the stack to the heap and back again -- and can be optimized with techniques such as box elimination and method inlining. However, there are limits to the analysis that a JIT compiler can perform in a reasonable time and space, so it might miss some opportunities for optimization. Unfortunately, the way that method invocations in source code are compiled to bytecode tends to increase the chances of a miss.
For example, consider an invocation of the method String::format. The first argument is a format string such as "%s %d", followed by varargs of any type. A Java compiler generates bytecode that boxes primitive varargs, creates an array, initializes it, and invokes the method. The bytecode of the method's body reverses these steps to obtain the values to interpolate according to the format string. Unfortunately, the method's body is too large to inline, so the JIT compiler can eliminate neither the boxing and unboxing of primitive varargs nor the shuffling of varargs into an array and out again. Worse, the format string is usually a constant expression, yet without inlining it is parsed every time the method's body runs.
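The following hand-written approximation makes that cost concrete by showing what the compiled call does before String::format even starts. The class and method names are made up for illustration. Both calls pass Locale.ROOT, which is not in the text, so that the output of %d does not depend on the default locale.

```java
import java.util.Locale;

class VarargsDesugaring {
    // What the source programmer writes.
    static String sugared(String name, int age) {
        return String.format(Locale.ROOT, "%s %d", name, age);
    }

    // Roughly what the compiler emits for the call above: the primitive is
    // boxed and every argument is copied into a freshly allocated array.
    static String desugared(String name, int age) {
        Object[] varargs = new Object[2];      // varargs array, allocated per call
        varargs[0] = name;
        varargs[1] = Integer.valueOf(age);     // boxing of the primitive argument
        return String.format(Locale.ROOT, "%s %d", varargs);
    }

    public static void main(String[] args) {
        System.out.println(sugared("John", 30));    // John 30
        System.out.println(desugared("John", 30));  // John 30
    }
}
```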
String::format is important because it is a concise and reliable way to implement toString. However, some developers avoid it purely for performance reasons and use more verbose, error-prone mechanisms instead. Once the invocation of String::format is optimized, the most readable and maintainable way to implement toString also becomes the most performant way.
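The trade-off is easiest to see with both styles side by side in a small hypothetical Point class. The String.format version states the shape of the output in one line. The StringBuilder version spells it out piece by piece. Locale.ROOT is passed only to keep the output of %d locale-independent.

```java
import java.util.Locale;

final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Concise: the shape of the output is visible at a glance.
    @Override
    public String toString() {
        return String.format(Locale.ROOT, "Point[x=%d, y=%d]", x, y);
    }

    // The hand-assembled alternative: same result, but it is easy to drop a
    // bracket or separator, and harder to read.
    String toStringWithBuilder() {
        return new StringBuilder("Point[x=").append(x)
                .append(", y=").append(y).append(']').toString();
    }

    public static void main(String[] args) {
        Point p = new Point(3, -4);
        System.out.println(p);                        // Point[x=3, y=-4]
        System.out.println(p.toStringWithBuilder());  // Point[x=3, y=-4]
    }
}
```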
JEP 280 replaced the translation of string concatenation with invokedynamic, resulting in faster bytecode, less allocation churn, and more uniform optimizability. We can apply the same technique to String::format (and closely related methods such as java.util.Formatter::format) by compiling the invocation with an alternate translation strategy. That strategy customizes the bytecode for each specific invocation based on information available at compile time, such as the static types and values of the actual arguments.
Enable JDK developers to (i) tag methods as candidates for intrinsification by a Java compiler, and (ii) for those candidate methods, implement alternate translations of invocations that result in behavior that conforms to the specification of the method.
It is not a goal to allow intrinsification of methods declared outside the core Java SE modules.
Traditionally, a Java compiler translates a method invocation in source code to one of the bytecodes invokevirtual, invokeinterface, invokespecial, or invokestatic. This JEP allows the compiler to use an alternate translation when certain designated methods of the Java SE API are invoked. The use of an alternate translation is called intrinsification, and the invocation is said to be intrinsified.
For the compiler to intrinsify a specific invocation of a given method, all of the following must hold:
- The method opts in to intrinsification at its declaration site, as part of its specification;
- The compiler identifies this invocation as intrinsifiable;
- The compiler knows of an intrinsic processor for the method;
- The intrinsic processor indicates an alternate translation strategy; and
- The compiler generates the bytecode corresponding to the indicated strategy.
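The five conditions can be modeled as a chain of checks that falls back to the traditional translation whenever any of them fails. This is a sketch only. Invocation, IntrinsicProcessor, and translate are hypothetical names, not javac APIs, and translations are represented as strings rather than bytecode.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical model of the decision chain; these types are invented for
// illustration and do not correspond to javac internals.
class IntrinsificationPipeline {
    // What the compiler knows about one call site (simplified).
    record Invocation(String owner, String method, boolean optedIn, boolean intrinsifiable) {}

    // A processor returns an alternate translation, or empty to decline.
    interface IntrinsicProcessor {
        Optional<String> alternateTranslation(Invocation call);
    }

    static String translate(Invocation call, Map<String, IntrinsicProcessor> processors) {
        // Simplified: a real compiler chooses the invoke bytecode per JLS 15.12.3.
        String traditional = "invokestatic " + call.owner() + "." + call.method();
        if (!call.optedIn()) return traditional;           // 1. not an intrinsic candidate
        if (!call.intrinsifiable()) return traditional;    // 2. call site not intrinsifiable
        IntrinsicProcessor processor = processors.get(call.owner() + "::" + call.method());
        if (processor == null) return traditional;         // 3. no known processor
        // 4. the processor may indicate a strategy; 5. emit it, else fall back
        return processor.alternateTranslation(call).orElse(traditional);
    }

    public static void main(String[] args) {
        Map<String, IntrinsicProcessor> processors = Map.of(
                "java.lang.String::format", c -> Optional.of("invokedynamic format"));
        Invocation call = new Invocation("java.lang.String", "format", true, true);
        System.out.println(translate(call, processors)); // invokedynamic format
        System.out.println(translate(call, Map.of()));   // invokestatic java.lang.String.format
    }
}
```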
Opting in to intrinsification
For a method of the Java SE API to opt in to intrinsification, it must be designated as an intrinsic candidate via the annotation @IntrinsicCandidate. A compiler can thus recognize an invocation of such a method as intrinsifiable, and may (but is not required to) delegate the translation decision to an intrinsic processor.
The space of methods that can opt in to intrinsification is restricted, out of an abundance of concern for the broad impact of generating novel bytecode. Only a method exported by the java.base module may be designated as an intrinsic candidate, and only if it is either (i) an instance method in a final class, or (ii) a static method, so that the compiler can be sure of its behavior. If any other method is designated as an intrinsic candidate, the designation is ignored.
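The restriction can be approximated with core reflection. The helper mayBeIntrinsicCandidate below is a hypothetical illustration of the stated rules, not the way a compiler actually performs the check. It reads "exported by java.base" as a public method of a public class in a package that java.base exports. It also requires that the method be either static or declared in a final class.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

class CandidateEligibility {
    // Hypothetical check mirroring the rules above.
    static boolean mayBeIntrinsicCandidate(Method m) {
        Class<?> owner = m.getDeclaringClass();
        Module module = owner.getModule();
        boolean inJavaBase = "java.base".equals(module.getName());
        boolean exported = Modifier.isPublic(m.getModifiers())
                && Modifier.isPublic(owner.getModifiers())
                && module.isExported(owner.getPackageName());
        // Static methods and instance methods of final classes cannot be overridden.
        boolean predictable = Modifier.isStatic(m.getModifiers())
                || Modifier.isFinal(owner.getModifiers());
        return inJavaBase && exported && predictable;
    }

    public static void main(String[] args) throws NoSuchMethodException {
        // static method in java.base
        System.out.println(mayBeIntrinsicCandidate(
                String.class.getMethod("format", String.class, Object[].class)));   // true
        // instance method of a final class
        System.out.println(mayBeIntrinsicCandidate(String.class.getMethod("length"))); // true
        // instance method of a non-final class
        System.out.println(mayBeIntrinsicCandidate(
                java.io.PrintStream.class.getMethod("format", String.class, Object[].class))); // false
    }
}
```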
(It might seem that a final instance method in a non-final class is suitable, but the body of such a method may invoke non-final instance methods in the same class. Those methods may be overridden in a subclass, so the behavior of the final instance method at run time is not sufficiently predictable for intrinsification. Even less predictable is the behavior of a non-final method in a non-final class, which is why java.io.PrintStream::format is not mentioned in this JEP despite its clear similarities with String::format.)
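The hazard described in this parenthetical can be shown with two small invented classes. The method greet is final, yet its result depends on an overridable method. What a call to greet does therefore depends on the run-time class of the receiver.

```java
class Greeter {                                  // non-final class
    // final, so greet itself cannot be overridden...
    final String greet(String who) {
        return salutation() + ", " + who;        // ...but it calls an overridable method
    }

    String salutation() {
        return "Hello";
    }
}

class GruffGreeter extends Greeter {
    @Override
    String salutation() {
        return "Oi";                             // changes what greet() returns
    }
}

class FinalMethodDemo {
    public static void main(String[] args) {
        Greeter g = new GruffGreeter();          // static type Greeter, run-time class GruffGreeter
        System.out.println(g.greet("Ada"));              // Oi, Ada
        System.out.println(new Greeter().greet("Ada"));  // Hello, Ada
    }
}
```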
The annotation type IntrinsicCandidate is part of the Java SE API, and is meta-annotated with @Documented to flag the significance of applying the annotation.
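Only the name IntrinsicCandidate and its @Documented meta-annotation come from the text. The following sketch of the annotation type uses an assumed retention policy, an assumed target, and a made-up Formats class. Formats is not in java.base, so a compiler following the rules above would ignore the annotation on it.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical shape of the annotation type; retention and target are assumptions.
@Documented                          // shows up in the Javadoc of annotated methods
@Retention(RetentionPolicy.CLASS)    // recorded in class files, where a compiler reads it
@Target(ElementType.METHOD)
@interface IntrinsicCandidate {}

final class Formats {
    // A static method, one of the two shapes allowed to opt in
    // (ignored here because Formats is not part of java.base).
    @IntrinsicCandidate
    static String pair(String key, int value) {
        return key + "=" + value;
    }
}
```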
A Java compiler may provide a mechanism for the discovery of intrinsic processors. An intrinsic processor specifies which method or methods it is able to process. If no intrinsic processor for a given method is known to the compiler, then invocations of that method are not intrinsified. For predictability, all intrinsic processors are disabled by default, and may be enabled with the javac command-line option -XDintrinsify=all. If no alternate translation is indicated to the compiler by an intrinsic processor, or if the compiler decides to ignore such an indication, then it must generate bytecode according to JLS 15.12.3.
Generation of alternate bytecode
An intrinsic processor may indicate an alternate translation for a specific invocation of a given method. For example, it may replace the invocation with invokedynamic using a given bootstrap method, with a call to another method, or with a constant load. The compiler may then generate precise bytecode for that translation, rather than the traditional bytecode.
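The kinds of alternate translation listed above can be sketched as a small family of record types. All of the names here are hypothetical, including the placeholder bootstrap name used in main. The describe method renders each choice in javap-like notation and assumes that a replacement method call targets a static method.

```java
import java.util.List;

// Hypothetical vocabulary for the translations an intrinsic processor might
// indicate; none of these types exist in javac.
interface Translation {}

// Keep the traditional invoke* bytecode.
record Traditional() implements Translation {}

// Replace the call with invokedynamic, linked by the named bootstrap method.
record Indy(String bootstrap, String descriptor) implements Translation {}

// Replace the call with a call to another (here: static) method.
record MethodCall(String owner, String name, String descriptor) implements Translation {}

// Replace the call with a load of a precomputed constant.
record ConstantLoad(Object value) implements Translation {}

class TranslationDescriber {
    // Renders each choice in a javap-like notation.
    static String describe(Translation t) {
        if (t instanceof Indy i) {
            return "invokedynamic " + i.bootstrap() + " " + i.descriptor();
        }
        if (t instanceof MethodCall m) {
            return "invokestatic " + m.owner() + "." + m.name() + ":" + m.descriptor();
        }
        if (t instanceof ConstantLoad c) {
            return "ldc " + c.value();
        }
        return "traditional invocation";
    }

    public static void main(String[] args) {
        List<Translation> choices = List.of(
                new Traditional(),
                new Indy("hypotheticalBootstrap", "(Ljava/lang/String;I)Ljava/lang/String;"),
                new MethodCall("java/lang/Integer", "toString", "(I)Ljava/lang/String;"),
                new ConstantLoad("a-b"));
        for (Translation t : choices) {
            System.out.println(describe(t));
        }
    }
}
```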
Let's analyze the benefits of intrinsifying String::format to avoid the boxing overhead, the varargs overhead, and the repeated parsing of a constant format string (the first argument). Consider the following invocation:
```java
String name = ...
int age = ...
String s = String.format("%s: %d", name, age);
```
Traditionally, this results in boxing age to an Integer, allocating a varargs array, storing name and the boxed age into the varargs array, and then parsing and interpreting the format string, all on every invocation. The bytecode is lengthy:
```
 0: ldc           #2      // String John
 2: astore_1
 3: bipush        30
 5: istore_2
 6: ldc           #3      // String %s: %d
 8: iconst_2
 9: anewarray     #4      // class java/lang/Object
12: dup
13: iconst_0
14: aload_1
15: aastore
16: dup
17: iconst_1
18: iload_2
19: invokestatic  #5      // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
22: aastore
23: invokestatic  #6      // Method java/lang/String.format:(Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/String;
26: astore_3
27: return
```
When the format string is constant, which it almost always is, an intrinsic processor can select an alternate translation (note that neither name nor age needs to be a constant variable):
```java
String s = name + ": " + Integer.toString(age);
```
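Whatever translation a processor chooses must produce the same result as the original call. The following sketch compares the two forms on a few inputs, including a null name and Integer.MIN_VALUE. It pins Locale.ROOT because %d localizes digits according to the formatting locale. Under a locale whose digits are not ASCII, String.format and Integer.toString can disagree, which is the kind of subtle incompatibility described under Risks and Assumptions. The class name is illustrative.

```java
import java.util.Locale;

class FormatEquivalence {
    // The original call, with the locale pinned for a reproducible comparison.
    static String original(String name, int age) {
        return String.format(Locale.ROOT, "%s: %d", name, age);
    }

    // The alternate translation from the text; neither argument is a constant.
    static String alternate(String name, int age) {
        return name + ": " + Integer.toString(age);
    }

    public static void main(String[] args) {
        String[] names = {"John", "", null};
        int[] ages = {30, 0, -7, Integer.MIN_VALUE};
        for (String n : names) {
            for (int a : ages) {
                if (!original(n, a).equals(alternate(n, a))) {
                    throw new AssertionError("mismatch for " + n + ", " + a);
                }
            }
        }
        System.out.println("translations agree");
    }
}
```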
Given this translation, the compiler can optimize it to an invokedynamic using the mechanics of JEP 280, resulting in the following bytecode:
```
 0: ldc           #2      // String John
 2: astore_1
 3: bipush        30
 5: istore_2
 6: aload_1
 7: iload_2
 8: invokedynamic #3,  0  // InvokeDynamic #0:format:(Ljava/lang/String;I)Ljava/lang/String;
13: astore_3
14: return
```
As well as the evident simplification, this bytecode runs between 30 and 50 times faster than traditional bytecode.
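A bootstrap method links the invokedynamic call site shown above at run time. As an analogue, the sketch below calls the public JEP 280 bootstrap, StringConcatFactory.makeConcatWithConstants, directly. It uses a recipe equivalent to name + ": " + age and the same (String, int)String type. The text does not say whether the prototype's format call site uses this bootstrap or a format-specific one. The sketch therefore illustrates the mechanism only and is not the actual translation. The class name is illustrative.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatException;
import java.lang.invoke.StringConcatFactory;

class IndyFormatSketch {
    // U+0001 marks where each dynamic argument goes; ": " is a literal.
    private static final String RECIPE = "\u0001: \u0001";

    // Linked once, much as an invokedynamic call site is linked on first execution.
    private static final MethodHandle CONCAT = link();

    private static MethodHandle link() {
        try {
            CallSite site = StringConcatFactory.makeConcatWithConstants(
                    MethodHandles.lookup(),
                    "format",   // call-site name; this bootstrap gives it no meaning
                    MethodType.methodType(String.class, String.class, int.class),
                    RECIPE);
            return site.dynamicInvoker();
        } catch (StringConcatException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static String format(String name, int age) {
        try {
            // Exact (String, int)String invocation: age is passed as a primitive int.
            return (String) CONCAT.invokeExact(name, age);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(format("John", 30));  // John: 30
    }
}
```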
Risks and Assumptions
If not properly implemented, the alternate translation may not be perfectly behaviorally compatible with the specification or original implementation.
Even if properly implemented, an alternate implementation may not properly track changes made to the original implementation in the future.
Even if properly implemented and tracked, the maintenance of intrinsic candidate methods and their alternate translations is made more difficult, since changes may need to be made in two places and must be behaviorally identical.
There is no guarantee that the performance of an alternate implementation will be superior, for every execution of every program on every machine, to the performance that would have been achieved by the original implementation.
(As an example of the difficulties of predicting performance, consider the Objects::hash method. An earlier version of this JEP praised Objects::hash for similar reasons to String::format, namely that Objects::hash is a concise and reliable way to implement hashCode. Objects::hash has a similar varargs signature to String::format, so the bytecode generated for its invocation has the same performance problems as for String::format. However, the semantics of hashing and string formatting are quite different, and experiments showed the performance gains from intrinsifying Objects::hash to be far less than the gains from intrinsifying String::format. The gains were also far more sensitive to the number and values of actual arguments. Consequently, the efforts to intrinsify Objects::hash were discontinued.)
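For comparison, the following sketch shows what a hand-expanded translation of a two-argument Objects::hash call might look like. It relies on the documented behavior that Objects.hash hashes its arguments as if by Arrays.hashCode(Object[]). The class is hypothetical and makes no claim about the performance results mentioned above.

```java
import java.util.Objects;

class HashExpansion {
    // The traditional call: boxes age and allocates an Object[] on each call.
    static int viaObjectsHash(String name, int age) {
        return Objects.hash(name, age);
    }

    // Hand-expanded equivalent of Arrays.hashCode: start at 1 and fold each
    // element as 31 * result + (element's hash code, or 0 for null).
    static int expanded(String name, int age) {
        int result = 1;
        result = 31 * result + Objects.hashCode(name);  // 0 when name is null
        result = 31 * result + Integer.hashCode(age);   // equals the boxed Integer's hashCode
        return result;
    }

    public static void main(String[] args) {
        System.out.println(viaObjectsHash("John", 30) == expanded("John", 30));  // true
    }
}
```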