JDK-8326035: CDS Object Streaming

Details

    • JEP
    • Resolution: Unresolved
    • P3
    • None
    • hotspot
    • Erik Österlund
    • Feature
    • Open
    • gc
    • Implementation
    • hotspot dash dev at openjdk dot org
    • M
    • M

    Description

      Summary

      An object archiving mechanism for Class-Data Sharing (CDS) that is independent of which Garbage Collector (GC) is selected at deployment time.

      Goals

      Currently, the Z Garbage Collector (ZGC) does not support CDS object archiving. This JEP aims to address that. The primary goals of this JEP are:

      • Support CDS object archiving for ZGC (and indeed any other GC)
      • A unified CDS object archiving format and loader

      Secondary goals:

      • Keep GC implementation details and policies separate from the CDS archived object streaming mechanism

      Non-Goals

      It is not a goal at this time to:

      • Remove the existing GC-dependent CDS object archiving mechanism
      • Unify CDS artifacts produced for -XX:+UseCompressedOops with -XX:-UseCompressedOops

      While removing the existing GC-dependent object archiving mechanism would allow disentangling the implementation details of other GCs from CDS object archiving, we will not consider that at this time, as the Leyden project is in its early stages and it is not yet clear what the effect would be.

      Success Metrics

      It should not take significantly longer for the JVM to boot with the new GC-agnostic object loader than with the existing GC-specific object loaders for Serial GC, Parallel GC, and G1 GC. As for ZGC, which did not previously have an archived object loader, it should at least not start slower or perform worse when using archived object streaming.

      Motivation

      Users of ZGC enjoy low GC latencies, but GC is not the only source of latency jitter. In fact, when using ZGC, the biggest source of latency jitter is the early phases of the application. Dealing with warmup issues, and to a lesser extent startup issues, therefore matters to ZGC users, which is why it is important that ZGC gains full support for CDS, including object archiving, going forward.

      The existing object archiving system used in CDS maps memory from an archive file directly into the Java heap. For this approach to work well, the layout in the file has to match, bit by bit, what the GC (and the rest of the JVM) expects to see at runtime. There are three granularities of layout policy that can cause bits not to match, each posing challenges for the current object archiving approach:

      1. Heap layout. The heap layout is a high-level strategy for where in the heap a GC chooses to place objects of a particular size and class.
      2. Field layout. The field layout is concerned with where to store the contents of fields within an object, typically as an offset relative to the object's start address.
      3. Object reference layout. This is the bit encoding strategy for reference fields.

      These three levels of object layout policy can vary significantly between GC implementations and heap sizes, which makes it challenging to share the same archived object format. Having different archived object formats for different GCs might be acceptable when creating a CDS archive for a particular deployment. However, arguably the most widely adopted way of using CDS is through the default CDS archive shipped with the JDK, which makes the JVM start faster by default. In that scenario, it is difficult to predict at JDK build time which GC a user is going to select. We could resort to shipping duplicate object archives for different combinations of GC and compressed-pointer settings, or we could define an object archive format that is completely independent of GC implementation details. This JEP proposes a GC-invariant object archiving mechanism.

      Heap Layout Impact

      Today, CDS object archiving is supported for -XX:+UseSerialGC, -XX:+UseParallelGC, -XX:+UseG1GC and -XX:+UseEpsilonGC, except notably on Windows, which does not support mapping a file into already mapped heap memory. All of these GCs have a contiguous memory layout, meaning that the heap is committed from a particular start address to a particular end address. The GC with the most complex memory layout among these is G1. With G1, the heap is split into multiple "regions". Objects may not cross from one region to another when laid out in memory; each object must be fully contained inside a region. Objects that are larger than half the region size get a special type of region called "humongous", which contains the entire object and occupies multiple contiguous heap regions.

      Because G1 is currently the most constraining GC in terms of heap layout, the archived object format has been built around G1. Objects that are large enough to become "humongous" G1 objects cannot currently be dumped at all, and the JVM carries workarounds for this object size restriction. Such workarounds are acceptable as long as users never have to deal with them, but if object archiving is going to become observable by users, having object size restrictions based on the implementation details of particular GCs seems undesirable.

      Padding objects are inserted at what could be G1 region boundaries, to ensure that objects in the CDS archive never cross a G1 region boundary. This format works for G1, and also for Serial GC and Parallel GC.

      ZGC, in contrast, does not have a contiguous heap layout like the other GCs; its heap layout is discontiguous and region based. This means that ZGC regions can occupy a vast amount of virtual address space. A discontiguous memory layout has the advantage that external fragmentation due to large objects does not have to be paid for with physical memory; the fragmentation tax can be paid with excess virtual address space instead. Another peculiarity is that, unlike other GCs, ZGC distinguishes between three size categories of objects: small, medium, and large. Each region contains objects of only a single size category, and each size category has a different object alignment. The differing object alignments allow ZGC to compress certain GC-internal data structures, but they are a challenge in terms of fitting into the existing CDS archived object format. These are the reasons why ZGC does not yet have CDS object archiving support.

      Field Layout Impact

      For the most part, field layout is computed independently of GC selection. However, if compressed pointers (-XX:+UseCompressedOops) are used, either explicitly or implicitly based on the maximum heap size at dump time, the size of object reference fields changes from 64 bits to 32 bits, which in turn may cause the field layout to be entirely different. This is why two CDS archives are shipped with the JDK: one for deployments using compressed pointers, and one for deployments not using them. The proposed solution does not aim to change this. Solving that problem would be considerably more involved, for several reasons; for example, field offsets may have been exposed to, and relied upon by, Unsafe, reflection, and method/var handles, and the offsets are also embedded in the metadata of the archive. Similarly, -XX:+UseCompressedClassPointers affects the field layout. There is currently no obvious reason not to use compressed class pointers, so we assume they are enabled when archiving objects.

      Object Reference Layout

      For the GCs that currently support object archiving, there are three different encoding variations for pointer compression (-XX:+UseCompressedOops), depending on the object alignment (-XX:ObjectAlignmentInBytes) and heap size (-Xmx). When the heap is small enough to fit in the low 4 GB of the virtual address space, a raw pointer encoding is used. When it does not fit, the low-order bits that are redundant given the selected object alignment can be shifted away. A third variation makes the pointers relative to the heap start rather than the start of the virtual address space.

      The current solution speculates that the particular encoding scheme used at dump time will also be used at deployment time, and patches the pointers if the speculation fails. A fourth variation, naturally, is raw pointers that are not compressed (-XX:-UseCompressedOops).

      When using ZGC, pointers are annotated with "color bits". With -XX:-ZGenerational these bits are stored in the high-order bits (a fifth variation), while with -XX:+ZGenerational they are stored in the low-order bits (a sixth variation).
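
      To make these variations concrete, the following is a minimal sketch (plain C++, not HotSpot code) of the encodings described above; the function names, the shift value, and the color-bit mask are illustrative assumptions only:

          #include <cstdint>

          // Compressed oops, raw encoding: the heap fits in the low 4 GB,
          // so the 32-bit value is simply the address.
          static uint32_t  encode_raw(uintptr_t addr) { return (uint32_t)addr; }
          static uintptr_t decode_raw(uint32_t v)      { return (uintptr_t)v; }

          // Compressed oops, zero-based shifted encoding: drop the bits made
          // redundant by object alignment (-XX:ObjectAlignmentInBytes=8 -> shift 3).
          static const int kShift = 3;
          static uint32_t  encode_shifted(uintptr_t addr) { return (uint32_t)(addr >> kShift); }
          static uintptr_t decode_shifted(uint32_t v)      { return (uintptr_t)v << kShift; }

          // Compressed oops, heap-based encoding: shifted offset relative to the heap base.
          static uint32_t encode_based(uintptr_t addr, uintptr_t heap_base) {
            return (uint32_t)((addr - heap_base) >> kShift);
          }
          static uintptr_t decode_based(uint32_t v, uintptr_t heap_base) {
            return heap_base + ((uintptr_t)v << kShift);
          }

          // Uncompressed oops are plain 64-bit addresses (the fourth variation).

          // ZGC colored pointers keep metadata ("color") bits in the pointer itself,
          // in the high-order bits (-XX:-ZGenerational) or the low-order bits
          // (-XX:+ZGenerational); the color must be masked away before the address
          // can be used.
          static const uintptr_t kLowColorMask = 0xF; // illustrative width only
          static uintptr_t strip_low_color(uintptr_t colored) { return colored & ~kLowColorMask; }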

      Summary

      As described above, there are various factors that can affect the bit pattern of how objects are represented in memory.

      • There are currently six different pointer formats in HotSpot
      • There are several different heap layouts: contiguous, region based, discontiguous
      • Object alignment differs depending on object size for different GCs
      • Object location differs depending on object size for different GCs

      It is inherent that different GC implementation strategies yield rather different layout policies. Therefore, it makes sense to move the current off-line layout decisions for CDS archived objects to on-line decisions made by the GC selected at deployment time.

      Description

      This JEP proposes a new object archiving format and loader that does not depend on which GC is used. It can be explicitly selected at dump time with -XX:+DumpStreamableObjects. When loading objects dumped with this mechanism, the GC owns object placement in the Java heap, and archived objects are allocated, initialized, and linked one by one from a GC-agnostic object stream in the CDS archive. Loading objects in this way is referred to as "object streaming".
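
      For illustration, creating and then using such an archive might look as follows; the archive name, class path, and main class are placeholders, and the exact combination of the new flag with the existing CDS dump workflows is an assumption rather than a finalized interface:

          java -XX:ArchiveClassesAtExit=app.jsa -XX:+DumpStreamableObjects -cp app.jar com.example.Main
          java -XX:SharedArchiveFile=app.jsa -XX:+UseZGC -cp app.jar com.example.Main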

      The archived objects in CDS are structured as a set of roots, each with a graph of objects underneath. The JVM requests root objects, and expects that, at the point a root object is requested, all objects transitively reachable from it have been materialized.

      In a way, the problem of traversing such object graphs efficiently while hiding the effect from the application is in spirit rather similar to the problem of performing tracing GC. Something ZGC has employed with great success is performing GC work concurrently with the application, and this JEP proposes to do precisely that: process the transitive closure of each root concurrently with the application. Loading of roots can therefore be done lazily, and the bulk of the work can be done in an extra CDS thread. Lazy loading is triggered when a JVM subsystem asks for a particular root object from the archived heap, for example when a class is loaded and its initial Class object is requested. The transitive closure of that root object is then materialized.

      Traversal Optimizations

      The roots are traversed with depth-first search (DFS). Since the concurrent CDS thread performs a DFS traversal through all roots in sequence, this traversal order is worth optimizing for. By laying out the object stream of the archive in DFS order from the beginning, the CDS thread can walk linearly through the objects, yielding the same traversal order as if a stack had been used to push and pop references in a proper DFS traversal. Besides the obvious locality improvement and avoiding the stack, this pre-ordering of the objects makes it possible to split the archived objects into three distinct sections:

      1. Objects already transitively materialized by the CDS thread
      2. Objects currently being materialized by the CDS thread
      3. Objects not yet processed nor concurrently accessed by the CDS thread

      This well-defined split allows the CDS thread to perform the bulk of its work without interfering with the bootstrapping thread. When the bootstrapping thread lazily loads a root that falls in the not-yet-materialized section, an explicit DFS traversal is started for that particular root. During this traversal, most of the work can be done independently of the concurrent materialization by the CDS thread; only when encountering objects in section 2 is any synchronization needed, which happens quite rarely in practice. When encountering concurrently materializing objects, we wait for the CDS thread to finish materializing them; the CDS thread uses an optimized traversal and will finish faster than the lazy materialization could. The section boundaries are shifted like a wavefront, atomically under a lock, but the bulk of the work is done outside of the lock.

      What this pre-ordering buys is fast iterative traversal by the CDS thread, while allowing laziness and concurrency with a minimal amount of coordination. In this way the CDS thread removes the bulk of the work of materializing Java objects from the critical bootstrapping thread.
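
      The following is a minimal sketch (plain C++, not HotSpot code) of this coordination, under the assumption that objects are identified by their DFS object index; the type and function names are invented for illustration:

          #include <cstdint>
          #include <mutex>
          #include <condition_variable>

          // Because objects are laid out in DFS order, the three sections above are
          // three consecutive index ranges that the CDS thread shifts forward.
          struct StreamingState {
            std::mutex lock;
            std::condition_variable batch_done;
            uint32_t materialized_end = 1; // indices below this are fully materialized
            uint32_t in_progress_end  = 1; // [materialized_end, in_progress_end) is being worked on
          };

          enum class Action { UseExisting, MaterializeYourself };

          // Bootstrapping thread: decide how to handle object index `idx` during a
          // lazy DFS traversal of a root.
          Action classify(StreamingState& s, uint32_t idx) {
            std::unique_lock<std::mutex> g(s.lock);
            if (idx < s.materialized_end) {
              return Action::UseExisting;
            }
            if (idx < s.in_progress_end) {
              // Rare case: the CDS thread is materializing this object right now;
              // wait for it to finish the current batch.
              s.batch_done.wait(g, [&] { return idx < s.materialized_end; });
              return Action::UseExisting;
            }
            return Action::MaterializeYourself; // untouched section; no coordination needed
          }

          // CDS thread: shift the wavefront forward under the lock after a batch;
          // the bulk of the materialization work happens outside the lock.
          void advance(StreamingState& s, uint32_t new_materialized_end, uint32_t new_in_progress_end) {
            std::lock_guard<std::mutex> g(s.lock);
            s.materialized_end = new_materialized_end;
            s.in_progress_end  = new_in_progress_end;
            s.batch_done.notify_all();
          }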

      Object Reference Format

      The object format of the archived heap is similar in its payload to a normal object. The only GC-specific part of the object layout is the object reference layout. Therefore, object references are encoded as DFS indices, which map to the position of the object in the buffer, as the objects are laid out in DFS order. This number is referred to as the "object index" in the archive. Object indices start at 1 for the first object, and the number 0 conveniently represents null. The object index is the core identifier of an object in this approach. These indices lend themselves perfectly to optimized table lookups, as a table can be implemented as a simple array. There is one such table mapping object indices to materialized Java heap objects, and another mapping object indices to the buffer offsets of the corresponding archived objects.
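
      A minimal sketch (plain C++, not HotSpot code) of these two tables and of reference resolution, with invented names, might look like this:

          #include <cstdint>
          #include <cstddef>
          #include <vector>

          // References in the archive are stored as DFS object indices; index 0 is null.
          struct ArchiveTables {
            std::vector<size_t> buffer_offset; // object index -> offset of the archived object
            std::vector<void*>  heap_object;   // object index -> materialized heap object
                                               // (slot 0 stays nullptr, representing null)
          };

          // Resolving an archived reference field is a plain array lookup.
          inline void* resolve(const ArchiveTables& t, uint32_t object_index) {
            return t.heap_object[object_index]; // index 0 naturally yields nullptr
          }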

      Early vs Late Object Materialization

      When loaders start materializing objects, the JVM has not yet reached the point in its bootstrapping sequence where GC is allowed. During this early phase, we apply more aggressive optimizations that exploit the fact that there is no GC to coordinate with yet.

      The table mapping object indices to Java objects contains raw object addresses before GC is allowed; as GC is enabled during bootstrapping, all raw addresses are replaced by handles stored as standard global roots of the JVM. All handles are handed back from the CDS thread when materialization has finished. The switch from raw addresses to handles happens under a lock, while no iteration or tracing is allowed. This allows early materialization to execute faster. The table also serves a dual purpose: it keeps track of all visited objects across all DFS traversals.

      Initialization is also performed in a faster way during early object materialization. In particular, before GC is allowed, we perform a raw memory copy of the archived object into the Java heap, followed by linking of its object references. The assumption is that before any GC activity is allowed, there is no need to worry about concurrent GC threads scanning the memory and being surprised to find objects whose references are momentarily invalid. Once GC is enabled, we revert to a more careful approach that uses a pre-computed bitmap to find where the object references are, copies only the non-reference data with raw memory copying, and sets the references separately with the appropriate GC barriers to cope with GC activity.
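
      The following is a minimal sketch (plain C++, not HotSpot code) of the two initialization paths, assuming the archived payload has the same field layout as the runtime object, stores a 32-bit object index in each reference slot, and lists reference offsets in ascending order; the names, the reference field size, and the barrier stand-in are assumptions:

          #include <cstdint>
          #include <cstddef>
          #include <cstring>
          #include <vector>

          struct ArchivedObject {
            const char*         payload;     // raw bytes in the archive buffer
            size_t              size;        // object size in bytes
            std::vector<size_t> oop_offsets; // reference field offsets (from a precomputed
                                             // bitmap), assumed sorted ascending
          };

          static const size_t kRefFieldSize = sizeof(void*); // assumes uncompressed oops

          // Stand-in for a GC-barrier-aware reference store; a real implementation
          // would go through the selected GC's store barrier.
          static void store_reference_with_barrier(void* obj, size_t off, void* value) {
            *(void**)((char*)obj + off) = value;
          }

          // Early path: GC is not allowed yet, so no other thread can observe the
          // object; a raw copy followed by in-place patching of references is safe.
          void initialize_before_gc_enabled(void* heap_obj, const ArchivedObject& a,
                                            void* (*resolve_ref)(uint32_t)) {
            std::memcpy(heap_obj, a.payload, a.size);
            for (size_t off : a.oop_offsets) {
              uint32_t idx = *(const uint32_t*)(a.payload + off); // archived object index
              *(void**)((char*)heap_obj + off) = resolve_ref(idx);
            }
          }

          // Late path: GC may be active, so copy only the non-reference bytes with
          // raw copying and store each reference through the barrier.
          void initialize_after_gc_enabled(void* heap_obj, const ArchivedObject& a,
                                           void* (*resolve_ref)(uint32_t)) {
            size_t prev = 0;
            for (size_t off : a.oop_offsets) {
              std::memcpy((char*)heap_obj + prev, a.payload + prev, off - prev);
              uint32_t idx = *(const uint32_t*)(a.payload + off);
              store_reference_with_barrier(heap_obj, off, resolve_ref(idx));
              prev = off + kRefFieldSize;
            }
            std::memcpy((char*)heap_obj + prev, a.payload + prev, a.size - prev);
          }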

      As for the background loading performed by the CDS thread, materializing too much too early and running out of memory would result in a JVM error and shut the process down. To deal with this, the CDS thread asks the GC for a budget of bytes it is allowed to allocate before GC is allowed. If this threshold is reached, the CDS thread has to wait until bootstrapping has reached the point where GC is allowed before it continues materializing objects.
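
      A minimal sketch (plain C++, not HotSpot code) of such a budget, with invented names, could look as follows:

          #include <cstddef>
          #include <mutex>
          #include <condition_variable>

          // Before GC is allowed, the background CDS thread may only allocate up to
          // a budget handed out by the GC; beyond that it must wait for GC to be enabled.
          class EarlyAllocationBudget {
            std::mutex lock;
            std::condition_variable gc_allowed_cv;
            size_t remaining;        // bytes the CDS thread may still allocate early
            bool gc_allowed = false; // flipped once bootstrapping enables GC
          public:
            explicit EarlyAllocationBudget(size_t budget) : remaining(budget) {}

            // Returns once it is safe to allocate `bytes`: either the early budget
            // still covers it, or GC has been enabled and normal allocation applies.
            void reserve(size_t bytes) {
              std::unique_lock<std::mutex> g(lock);
              if (!gc_allowed && bytes <= remaining) { remaining -= bytes; return; }
              gc_allowed_cv.wait(g, [&] { return gc_allowed; });
            }

            // Called by the bootstrapping thread once GC becomes allowed.
            void enable_gc() {
              std::lock_guard<std::mutex> g(lock);
              gc_allowed = true;
              gc_allowed_cv.notify_all();
            }
          };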

      When bootstrapping reaches the point where GC is allowed, we resume materializing the objects that did not fit in the budget. Before the application is allowed to run, we force materialization of any remaining objects that the CDS thread has not yet materialized, so that the running program does not encounter surprising OutOfMemoryErrors caused by object materialization.

      Object Linking

      The table mapping object indices to Java heap objects is filled in when an object is allocated. Materializing an object involves allocating it, initializing it, and linking it to other objects. Since linking an object requires that the objects reachable through its reference fields are at least allocated, the iterative traversal of the CDS thread first allocates all of the objects in the zone being worked on, that is, all not-yet-materialized objects transitively reachable from the currently processed root. When all objects of the current batch have been allocated, initialization and linking are performed in a second pass. The lazy tracing materialization links objects when popping an entry from the DFS stack; in that context, the address of the field referencing the object about to be materialized can be computed, and that field can be updated to point at the materialized object.
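
      The two-pass structure used by the CDS thread could be sketched as follows (plain C++, not HotSpot code; the callbacks and the batch representation are invented for illustration):

          #include <cstdint>

          struct Batch {
            uint32_t first_index; // first DFS object index in the batch
            uint32_t last_index;  // last DFS object index in the batch
          };

          void materialize_batch(const Batch& b,
                                 void* (*allocate)(uint32_t idx),
                                 void  (*init_and_link)(uint32_t idx)) {
            // Pass 1: allocation only; fills the index -> heap object table.
            for (uint32_t i = b.first_index; i <= b.last_index; i++) {
              allocate(i);
            }
            // Pass 2: initialization and linking; every reference target is now
            // guaranteed to be at least allocated, so fields can be patched directly.
            for (uint32_t i = b.first_index; i <= b.last_index; i++) {
              init_and_link(i);
            }
          }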

      One interesting benefit of object-level linking is that we can better deal with the off-line snapshot of objects being mixed with on-line objects from the deployment run. For example, the current direct-mapping-based object loader dumps the entire String Table. This requires an array that keeps track of all string objects from the string table; it is sometimes encoded as an array of arrays, because the one-dimensional array might become large enough to be a humongous object for G1, which is not supported. What dumping the string table buys us is keeping track of a boolean identity property of certain string objects: whether or not they were interned. In the streaming approach, we do not need to dump the entire string table. Instead, strings in the archive that were interned have a bit set in a bitmap, representing this identity property. When linking interned strings, we intern the string dynamically, which may yield a link to an off-line archived object or to an on-line interned string from the deploy-time JVM.
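
      A minimal sketch (plain C++, not HotSpot code) of interned-string linking, using an ordinary hash map as a stand-in for the JVM's string table and invented names throughout, might look like this:

          #include <cstdint>
          #include <string>
          #include <unordered_map>
          #include <vector>

          struct StringLinker {
            std::vector<bool> interned_bit;                      // indexed by object index
            std::unordered_map<std::string, void*> string_table; // stand-in for the JVM table

            // Returns the object the referencing field should be linked to.
            void* link_string(uint32_t object_index, void* materialized,
                              const std::string& value) {
              if (!interned_bit[object_index]) {
                return materialized;       // ordinary string, use the archived copy as-is
              }
              auto it = string_table.find(value);
              if (it != string_table.end()) {
                return it->second;         // an equal string was already interned on-line
              }
              string_table.emplace(value, materialized);
              return materialized;         // the archived string becomes the canonical one
            }
          };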

      Scalability

      The streaming approach processes objects one by one, rather than mapping memory from a file straight into the Java heap. It is worth discussing the scalability implications of that.

      In a cold start, there is a cost per byte anyway, so the implications are similar for both solutions. If we want fast cold starts, we should probably strive to keep the archive small. Since the streaming approach embraces that there will be work per object, and aims instead at offloading that cost from critical bootstrapping, it hides the per-byte cost of a cold start better. It also lends itself more naturally to compression, since decompression can similarly be offloaded. Cold starts should ultimately benefit from a smaller artefact size.

      As for warm starts, it is worth noting that the size of heap archives does not grow linearly with the number of classes, as many resolved references refer to already captured strings for various common types, and so on. This currently makes it difficult to produce large heap archives in the first place. Because of this, as applications grow larger, other overheads are in general much larger than object streaming.

      Should this eventually change, and processing large archives become important, the approach has been designed to allow parallelization in the future. That would at least allow deployments with available CPU resources to process the objects faster, if concurrency alone is insufficient. As for CPU-constrained environments running large applications, the default would currently pick the mapping solution; determining whether the GC-agnostic solution works well enough in such situations is outside the scope of this JEP.

      Alternatives

      When implementing support for ZGC, it is not strictly necessary to build a GC-agnostic solution. One possible solution would be to double down on GC-specific logic and build a dedicated ZGC object loader that lays out objects with the heap layout and pointer layout expected by ZGC. This has some notable disadvantages:

      • The default CDS archives shipped with the JDK would have to contain duplicate information for the extra object archive required by ZGC, inflating the size of the JDK unnecessarily compared to a GC-agnostic solution.
      • Development of ZGC would be slowed down and complicated by entangling GC implementation details with how objects are dumped to the CDS archive.

      As for the advantages of doubling down on ZGC-specific object dumping logic, they are unclear. Presumably, the main advantage would be starting the JVM faster. However, current experiments indicate that the streaming loader is very efficient without introducing any ZGC-specific knowledge.

      As for GC-agnostic object archiving, different approaches have been considered. Most of them involved materializing all objects at once, very early, without any laziness. This led to trouble when running with very small heap sizes: GCs would want to perform a collection once a significant part of the heap had been allocated, but the JVM is not yet in a state where it can perform GCs that early. Allowing laziness therefore makes the mechanism more GC-agnostic.

      Testing

      A large number of CDS tests, including tests for object archiving, have already been written. They will be adapted to regularly test with ZGC and the new object streaming approach.

      Risks and Assumptions

      Since the bulk of the work of linking at object granularity runs in an extra CDS thread, there is an assumption that it is acceptable for both the bootstrapping thread and the CDS thread to run at the same time. Some constrained cloud environments might not be willing to give the JVM that extra core, even for a short period of time, which would result in slightly delayed startup. Having said that, using a concurrent GC such as ZGC in such a constrained environment is generally not going to work very well either. It is also expected that cold startup time is more interesting than hot startup time in such environments. The streaming approach lends itself naturally to applying compression to the object payload, since the off-line and on-line object formats are designed to be different, which is likely beneficial for such environments.

      There is another risk: memory footprint. The existing heap archiving solution maps the archived objects straight into the Java heap, while the streaming approach loads the heap archive into a temporary location in memory and materializes objects from there into the Java heap. Therefore, during bootstrapping, the footprint of the archived heap is higher due to this duplication. However, when plotting typical memory usage over time, the usage during bootstrapping is typically far below the eventual memory footprint of the running application. Hence, there will only be a footprint regression if the application never needs more memory (Java heap, native memory, code cache, etc.) than the size of the archived objects, which seems rather unlikely.
