  1. JDK
  2. JDK-8315737

(DRAFT) Loaded Classes in CDS Archives

Details

    • iklam
    • Feature
    • Open
    • Implementation

    Description

      Summary

      Store classes in the CDS archive in a loaded state (not merely parsed), thereby reducing time spent resolving symbolic class references during startup.

      Goals

      • Enable "live" class data to be adopted from CDS directly into the running application.

      • Simplify future adoption of additional CDS assets (metaspace, Java heap, code cache), by stabilizing addresses within the CDS archive.

      Non-Goals

      • The support of other types of class loaders is not included in the JEP, but could be covered in future work. This JEP supports only the three main ClassLoaders: bootstrap, platform, and system.

      • The behavior described in this JEP is not enabled by default: The user must opt in with a VM flag.

      • This JEP does not develop the full potential of adopting metaspace, Java heap, and code cache assets from CDS; it merely advances this process by making adopted CDS data more directly useful.

      Success Metrics

      • Measurable startup time improvements, due to shifting more class loading work from application startup to build time.

      • Stabilized data pointers enable new Leyden computation-shifting optimizations in later work.

      Motivation

      This JEP is part of Project Leyden, which aims to improve the startup time of Java applications. It provides a stable snapshot (see below) of class data for selected classes used by a Java application. Stable snapshots will provide some improvements in startup time.

      Stable class snapshots will also allow delivery of additional Leyden optimizations (which are currently in prototype only). These follow-on optimizations are expected to double down on existing CDS capabilities to populate the class metadata heap and Java object heap from the CDS archive, using immediately adoptable data stored in the archive. They are also likely to place new kinds of computed states into CDS, such as constant pool resolution results, optimized code (AOT), lambda classes, and many other items currently computed at application startup. The present JEP is a necessary building block for all of these techniques.

      Description

      New markings in the CDS file will instruct the VM to create classes (from CDS memory data in an almost-loaded state) immediately on startup, rather than when the application requests them. Some precomputed resolution states, such as resolved references from subclasses to superclasses, will also be adopted immediately from the CDS file. As is the case today, a few selected classes will be initialized in the CDS archive as well, although this will not be visible to users.

      Background

      The goal of Project Leyden is to improve the startup time, time to peak performance, and footprint of Java programs, by means of selectively shifting and constraining computation.

      In addition to control and configuration information, a CDS archive contains memory-mappable images of data assets ready to adopt into the running VM. Once adopted, these assets function exactly the same as the online results of dynamic class creation. In particular, a CDS archive can contribute memory segments both to HotSpot metaspace (where class and method metadata resides) and also to the Java heap (where Java objects associated with classes reside). A future JEP will allow a CDS archive to contribute code assets, which are mapped into the HotSpot code cache (a heap where JIT output is stored).

      A significant portion of startup time (in almost any Java application) is spent in loading and organizing blocks of class data, and then linking those blocks together into a fully configured application. The existing HotSpot CDS technology shifts some startup computations to application build time, by performing classfile parsing during a training run, then storing the resulting parsed class data as assets in a CDS archive.

      This is a start and there is more to do. Generally speaking, more complex and expensive shifted startup computations confer larger benefits to an application, if they can be shifted to build time, so that the VM can directly "adopt" the archived computation states when the application starts up. This is presently true for parsing of class files, and becomes true for additional phases of class loading and resolution in this JEP.

      (In future JEPs, the VM may adopt additional resolution, initialization, and compilation states stored in CDS, and this JEP is a necessary starting point.)

      Existing CDS technology makes class data structures available to the VM in a provisional (or "pickled") parsed state, which needs additional processing in order to be loaded as a "live" class. Provisional structures are impossible to use directly. The VM adopts provisional class data from CDS when it detects a runtime request to load a suitably matching classfile. But pointers to pickled structures are "dead" addresses in the (mapped) CDS archive. They cannot be immediately referenced (say, by AOT code or in the Java heap) until the classes are actually loaded. Because this currently happens only on application request, the VM cannot link CDS assets together with live pointers. That is, you can't refer to the class directly (by its pointer) until the VM or Java application loads it.

      The problem is that the "dead" address is not directly usable until a permanent decision has been made to adopt the class data block into the VM as a real class object. Before the decision is made, a "dead" address must be treated as unstable and provisional, not reliable. After the decision, the address can be treated as a "live", stable, and reliable part of the VM's online class metadata.

      Current Changes

      This JEP ensures that (selected) classes from the classpath are given "live" addresses immediately on startup. The normal request to load (and perhaps initialize) a class, which is normally an early side effect of the application's main routine, is "short-circuited" and replaced by an even earlier side effect (a shifted computation, in Leyden terms). Class initialization per se will not be short-circuited in this JEP.

      It is as if the class were loaded by platform code, not application code, and this platform code is executed before the application's main routine is entered. We call this very early period the premain phase of execution. From the application's point of view, loading happened during the premain phase, a long time before main was invoked. Thus, no further loading activity delays the execution of main itself. From the user's point of view, the loading happens in a training run which emits a CDS archive.

      From the VM point of view, the training run creates directly usable memory images representing loaded classes, and stores them in the CDS archive for quick adoption, before main runs. And within the context of the Java VM specification, it is as if the class initializer for Object instigates the loading of classes listed for early loading in the CDS archive. This is old behavior, because the VM has always loaded and initialized selected JDK classes on startup, before main is executed. The change in this JEP is that the CDS archive will be allowed to nominate additional classes for early processing during the premain phase, by providing their loaded images in the metaspace area of the CDS archive.

      The specific shiftable computations associated with bootstrapping some class C are as follows:

      • (a) Locating the classfile of C by its name and ClassLoader.

      • (b) Parsing this classfile into an executable representation of C, an InstanceKlass internal to the VM.

      • (c) Adding the representation of C (and of all of its supertypes) to each relevant ClassLoader.

      • (d) Linking some of the constant pool entries of C to other loaded classes, as appropriate.

      As of JDK 21, CDS already supports the shifting of (a) and (b): The CDS archive stores pre-parsed InstanceKlass structures for Java classes encountered at build time. As the Java program executes, each InstanceKlass can be loaded quickly from the CDS archive and added into each relevant ClassLoader. This JEP shifts the later steps as well.
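
      The split between steps that are already shifted and steps this JEP adds can be summarized in a small model. This is only an illustrative sketch: the names BootstrapStep and ShiftedSteps are hypothetical and do not correspond to any HotSpot data structure, and step (d) is treated as still pending at startup because the text describes it as only partially shifted.

```java
import java.util.EnumSet;

/** Hypothetical model of the four bootstrapping steps (a) through (d). */
enum BootstrapStep {
    LOCATE_CLASSFILE,        // (a) find the classfile by name and ClassLoader
    PARSE_TO_INSTANCEKLASS,  // (b) parse it into an InstanceKlass
    ADD_TO_CLASSLOADERS,     // (c) add the class and its supertypes to each relevant ClassLoader
    LINK_CONSTANT_POOL       // (d) link some constant pool entries to other loaded classes
}

final class ShiftedSteps {
    /** Steps shifted to build time by CDS as of JDK 21, per the text above. */
    static final EnumSet<BootstrapStep> JDK_21 =
        EnumSet.of(BootstrapStep.LOCATE_CLASSFILE, BootstrapStep.PARSE_TO_INSTANCEKLASS);

    /** Steps shifted when the archive also stores classes in a loaded state (assumption: (d) not counted). */
    static final EnumSet<BootstrapStep> WITH_LOADED_CLASSES =
        EnumSet.of(BootstrapStep.LOCATE_CLASSFILE, BootstrapStep.PARSE_TO_INSTANCEKLASS,
                   BootstrapStep.ADD_TO_CLASSLOADERS);

    /** Returns the steps that still have to run at application startup. */
    static EnumSet<BootstrapStep> remainingAtStartup(EnumSet<BootstrapStep> shifted) {
        return EnumSet.complementOf(shifted);
    }

    public static void main(String[] args) {
        System.out.println("JDK 21 leaves: " + remainingAtStartup(JDK_21));
        System.out.println("This JEP leaves: " + remainingAtStartup(WITH_LOADED_CLASSES));
    }
}
```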

      Implementation

      This JEP adds a new VM command-line option to be used with the -Xshare:dump option. The flag -XX:+PreloadClasses (BUT SEE FIXME) will instruct the CDS archive to include the results of step (c) as well. CDS may perform step (d) as well, based on special knowledge of classes in java.base, but not configurably by any flag. These steps involve not only parsing but also loading (which means permanent definition to the VM), linking, and preparation. The pre-existing option -XX:SharedClassListFile=... will provide fine control over which classes are loaded in this way. No additional flag is required at application run time.

      (FIXME: Should we change Shared to Stored or CDS. CDS is now Cached Data Store, and CDS can be the common modifier: -XX:+CDSLoadClasses, -XX:CDSClassListFile=.... The "pre-" prefix is overused.)
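
      As a rough sketch of how the options fit together, the following helper assembles the dump-time and run-time command lines as argument lists. The flag -XX:+PreloadClasses is the name proposed in this draft and may change; -Xshare:dump, -XX:SharedClassListFile, and -XX:SharedArchiveFile are existing HotSpot options. The class name ArchiveCommands and all file names passed in are hypothetical.

```java
import java.util.List;

/** Hypothetical helper that builds the command lines described above. */
final class ArchiveCommands {
    /** Dump-time command: creates an archive whose listed classes are stored in a loaded state. */
    static List<String> dumpCommand(String classListFile, String archiveFile, String classPath) {
        return List.of(
            "java",
            "-Xshare:dump",
            "-XX:+PreloadClasses",                      // proposed flag name from this draft
            "-XX:SharedClassListFile=" + classListFile, // selects which classes are archived
            "-XX:SharedArchiveFile=" + archiveFile,
            "-cp", classPath);
    }

    /** Run-time command: only names the archive; no flag specific to this JEP is needed. */
    static List<String> runCommand(String archiveFile, String classPath, String mainClass) {
        return List.of(
            "java",
            "-XX:SharedArchiveFile=" + archiveFile,
            "-cp", classPath,
            mainClass);
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", dumpCommand("app.classlist", "app.jsa", "app.jar")));
        System.out.println(String.join(" ", runCommand("app.jsa", "app.jar", "com.example.Main")));
    }
}
```

      The same class path is passed to both commands because, as noted later in this section, the class path may be appended to but not otherwise changed between dump time and run time.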

      A CDS archive created with -XX:+PreloadClasses will have the following behavior:

      • Classes stored in this archive for early loading will be formatted in their "live" (online) states, directly in the CDS metaspace heap, for faster adoption at startup.

      • Early during startup (before main), such classes will be automatically added to the appropriate ClassLoader. (This class loader will be the boot, platform, or system ClassLoader.)

      • The addresses of these classes will be predictable and stable, as a function of archive base address. For this reason, the CDS archive will be able to cross-link the results of symbolic resolution between early-loaded classes, reducing link time at startup.

      These classes should come from only "known" locations, such as the JDK's modules file, the module path, or the classpath. Note that CDS doesn't store classes that are dynamically generated or loaded from other locations, although it can (in principle) store "hidden classes" (required by lambdas) when they are generated at build time and attached to constant pool states.

      As a result, the following restrictions apply at runtime:

      • The application cannot define an alternative version of a named class using MethodHandles.Lookup::defineClass(). Such calls will result in a LinkageError with an "attempted duplicate class definition" message.

      • The classfile load hook will not be executed for a class which is already loaded at program startup. Therefore, a Java Instrumentation Agent will not be able to transform the bytecodes of such a class. (Redefinition continues to be an option.)

      • A class loaded in CDS cannot be unloaded, and must be made present to the application even if the application (as the result of its dynamic behavior) does not actually ask for that particular class. The classpath may be appended (at the end) but not otherwise changed. These limitations are already present in CDS, and are also inherent to the system class loaders supported by this JEP. JVMTI class redefinition (used by debuggers) should continue to work.
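
      The first restriction mirrors what already happens today when code tries to define a class that its loader has already defined. The sketch below reproduces that existing behavior with an ordinary, already-loaded class; it does not use a CDS archive. The names DuplicateDefineDemo and AlreadyLoaded are hypothetical, and the example assumes the compiled class file is reachable as a resource.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.lang.invoke.MethodHandles;

final class DuplicateDefineDemo {
    /** Hypothetical stand-in for a class that is already loaded when defineClass is called. */
    static final class AlreadyLoaded {}

    /** Tries to define AlreadyLoaded a second time and reports what happened. */
    static String redefineAttempt() {
        // Referencing AlreadyLoaded.class forces the class to be loaded first.
        String resource = "/" + AlreadyLoaded.class.getName().replace('.', '/') + ".class";
        byte[] bytes;
        try (InputStream in = AlreadyLoaded.class.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IllegalStateException("class file not found: " + resource);
            }
            bytes = in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try {
            // Same name, same loader: the VM rejects the second definition.
            MethodHandles.lookup().defineClass(bytes);
            return "defined";
        } catch (LinkageError e) {
            return "LinkageError: " + e.getMessage();
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(redefineAttempt());
    }
}
```

      Under this JEP, a class stored in the archive for early loading would be in the same position as AlreadyLoaded here from the moment main starts.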

      Consequences

      Although this is a simple change, the benefits are deep and go far beyond faster class loading. Internal pointers to eagerly loaded class data, adopted from the CDS archive, will be immediately and unconditionally useful as "live" VM data, without waiting for resolution logic or other checks.

      Once data is "live from the start" in this way, the VM can use it immediately. This is true of all kinds of data assets used by the JVM, such as metadata (e.g., parsed classes) and Java objects (e.g., class mirrors).

      In addition, CDS assets (such as other classes, Java Class mirrors or eventually AOT code) can refer directly to class data by using pointers, instead of via complex provisional symbolic references or relocation records. When they are adopted into the VM, such mutually referential CDS assets are correctly configured with respect to each other immediately and without extra checks or resolution steps.

      At most, a low-level relocation pass may be required, as is typical when mapping dynamically linked data that contains pointers. For CDS, this pass is extremely simple, being driven by a bitmap which shows where pointers occur in the mapped CDS assets.
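
      The idea of a bitmap-driven relocation pass can be shown with a simplified model. This is a sketch only, not HotSpot code: the mapped archive is modeled as an array of 64-bit words, the bitmap as a java.util.BitSet with one bit per word, and the names BitmapRelocation, requestedBase, and actualBase are hypothetical.

```java
import java.util.BitSet;

final class BitmapRelocation {
    /**
     * Adjusts every word marked in pointerMap by the distance between the
     * address the archive was built for and the address it was mapped at.
     */
    static void relocate(long[] words, BitSet pointerMap, long requestedBase, long actualBase) {
        long delta = actualBase - requestedBase;
        if (delta == 0) {
            return; // mapped at the requested address: nothing to patch
        }
        for (int i = pointerMap.nextSetBit(0); i >= 0; i = pointerMap.nextSetBit(i + 1)) {
            words[i] += delta; // only pointer slots are touched; plain data is left alone
        }
    }

    public static void main(String[] args) {
        long requestedBase = 0x8_0000_0000L;
        long actualBase = 0x9_0000_0000L;
        // Words 0 and 2 hold pointers into the archive; word 1 is plain data.
        long[] words = { requestedBase + 0x40, 42L, requestedBase + 0x100 };
        BitSet pointerMap = new BitSet(words.length);
        pointerMap.set(0);
        pointerMap.set(2);

        relocate(words, pointerMap, requestedBase, actualBase);
        for (long w : words) {
            System.out.println(Long.toHexString(w));
        }
    }
}
```

      Because each pointer is a fixed offset from the archive base, a single addition per marked slot is enough; no symbolic lookup is involved.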

      In particular, the stability of class data pointers implies that constant pool entries can (in the future) be put into a resolved state when a CDS archive is created, and the VM does not need to re-resolve these states. This will allow the VM to avoid many dynamic operations normally required when a Java application configures itself at startup.

      Stabilized, less speculative, immediately usable pointers are likely to greatly simplify the management of AOT code, in future JEPs.

      Compatibility Issues

      Advanced features like user-defined class loaders, reflective class definition, and bytecode rewriting will not be helped by this JEP. If they are to be applied to some class C, that class C must also be loaded directly from CDS.

      We believe very few programs need to use MethodHandles.Lookup::defineClass() to redefine classes from the JDK's modules file, the module path, or the classpath. Such applications should not use the -XX:+PreloadClasses option. Hidden classes defined at build time, like those from the lambda metafactory, will be fine.

      If the VM is started with a Java Instrumentation Agent that has the capability of transforming the bytecodes of Java classes, the VM will refuse to use any CDS archive that was created with the -XX:+PreloadClasses option. All Java classes will be incrementally loaded from classfiles. The application will be able to interoperate with the Java Instrumentation Agent, but its startup time will be slower than would be possible when the CDS archive is used.
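
      For readers less familiar with the hook in question, the following is a minimal sketch of a bytecode-transforming hook written against the standard java.lang.instrument API. The class name RecordingTransformer is hypothetical; it only records the names it is offered and never rewrites bytecode. In a real agent it would be registered with Instrumentation.addTransformer from the agent's entry method.

```java
import java.lang.instrument.ClassFileTransformer;
import java.security.ProtectionDomain;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical transformer that records each class offered to the load hook. */
final class RecordingTransformer implements ClassFileTransformer {
    private final List<String> seen = new CopyOnWriteArrayList<>();

    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer) {
        seen.add(className); // internal form, e.g. "com/example/Foo"
        return null;         // null tells the VM that no transformation was performed
    }

    /** Returns a snapshot of the class names seen so far. */
    List<String> seen() {
        return List.copyOf(seen);
    }
}
```

      A class that is already loaded from the archive before main runs would never be passed to transform at load time, which is why the VM falls back to incremental loading when such an agent is present.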

      In the future, we may allow Java Instrumentation Agents to be used when the CDS archive is created. That would allow the bytecode transformation to be shifted to application build time.

      Alternatives

      We could keep the current techniques and try to improve them more conservatively. The problems with current CDS are:

      • At the start of a Java program, only a very small set of classes are loaded. Most of class creation and resolution (steps (c) and (d) above) happens incrementally as the side effect of program execution. Thus, when a class K is required by the program but K is not yet loaded, the program will trap into the VM to load the class into its ClassLoader. Such traps can significantly slow down the application.

      • The incremental loading of classes is problematic for AOT compilation. Compilers inside the VM, such as C1 or C2, can only generate code for classes that are already loaded. If we want to reuse these compilers to generate AOT code, they would need to emit extra guard code to check and load classes on demand. This would require more development effort and result in slow and bloated compiled methods, and a less robust VM.

      • It's possible for the application to load dynamically configured alternative versions of its classes (e.g., using MethodHandles.Lookup::defineClass(), or with bytecode instrumentation). This means AOT-compiled methods would need to check for such changes in the classes.

      • To shift computations done inside Java code, we need the ability to store the results of such computations in the form of Java heap objects. If the application can load alternative versions of its classes, the memory layout of the stored results will no longer be valid.

      In summary, the current model of incremental class loading is not just slow; it also creates many challenges for future Leyden optimizations. While it may be possible to implement future Leyden optimizations with runtime checks to work with incremental class loading, it would be much easier and safer to provide a stable snapshot of classes in their loaded states. That is, we can make the adoption of CDS assets simpler and more robust by shifting step (c) to application build time. This applies to both existing CDS assets (metaspace structures, Java heap data) and future CDS assets (AOT code).

      We could try to shift additional states and assets that refer to loaded classes (such as AOT and heap snapshots), but try to make their adoption less definite and more speculative, depending on whether the classes are loaded or not. This would require many more runtime checks to ensure that classes loaded at runtime are the same as those used during time-shifted operations. However, such checks are likely to be complex and costly, and reduce the quality of the VM.

      Testing

      • We will create new unit test cases that cover specific behavior of the -XX:+PreloadClasses option, for example, how this option interacts with serviceability agents.

      • The -XX:+PreloadClasses option is mostly orthogonal to the existing CDS features. Therefore, we can run existing CDS test cases with this option explicitly enabled. Such test cases should still pass.

      Risks and Assumptions

      We assume that many application programmers (and the frameworks they use) will be content to configure their application class paths at build time, and not reconfigure them (incompatibly) at application deployment time. Early customer conversations suggest this will often work out fine. We also assume that if an application requires radical class reconfiguration at runtime but uses this technology erroneously, the failure modes will make the root cause easy to find and fix, and (moreover) will be rare in practice.

      Dependencies

      This JEP is an evolution of the existing CDS implementation. Future JEPs are likely to depend on it, as it enables whole suites of classes to be loaded all at once, with their interdependencies already resolved.

          People

            iklam Ioi Lam