JEP 450
Roman Kennke
Feature, Open, Implementation, P4, Resolution: Unresolved
Summary
Reduce the size of object headers in the HotSpot JVM from between 96 and 128 bits down to 64 bits on 64-bit architectures. This will reduce heap size, improve deployment density, and increase data locality.
Goals
When enabled, this feature
- Must reduce the object header size to 64 bits (8 bytes) on the target 64-bit platforms (x64 and AArch64),
- Should reduce object sizes and memory footprint on realistic workloads,
- Should not introduce more than 5% throughput or latency overheads on the target 64-bit platforms, and only in infrequent cases, and
- Should not introduce measurable throughput or latency overheads on non-target 64-bit platforms.
When disabled, this feature
- Must retain the original object header layout and object sizes on all platforms, and
- Should not introduce measurable throughput or latency overheads on any platform.
This experimental feature will have a broad impact on real-world applications. The code might have inefficiencies, bugs, and unanticipated non-bug behaviors. This feature must therefore be disabled by default and enabled only by explicit user request. We intend to enable it by default in later releases and eventually remove the code for legacy object headers altogether.
Non-Goals
It is not a goal to
- Reduce the object header size below 64 bits on 64-bit platforms,
- Reduce the object header size on non-target 64-bit platforms,
- Change the object header size on 32-bit platforms, since they are already 64 bits, or
- Change the encoding of object content (i.e., fields and array elements) or array metadata (i.e., array length).
Motivation
An object stored in the heap has metadata, which the HotSpot JVM stores in the object's header. The size of the header is constant; it is independent of object type, array shape, and content. In the 64-bit HotSpot JVM, object headers occupy between 96 bits (12 bytes) and 128 bits (16 bytes), depending on how the JVM is configured.
Objects in Java programs tend to be small. Experiments conducted as part of Project Lilliput show that many workloads have average object sizes of 256 to 512 bits (32 to 64 bytes). This implies that more than 20% of live data can be taken by object headers alone. Thus even a small reduction in object header size could yield a significantly smaller footprint, better data locality, and reduced GC pressure. Early adopters of Project Lilliput who have tried it with real-world applications confirm that live data is typically reduced by 10%–20%.
Description
Compact object headers is an experimental feature and is therefore disabled by default. It can be enabled with -XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders.
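To confirm that the flag took effect, the running JVM can be asked for the option's value. The following is a minimal sketch, assuming a HotSpot JVM that exposes `com.sun.management.HotSpotDiagnosticMXBean`; the class name `CompactHeadersCheck` is made up for illustration. On a JVM that does not know the option, the lookup throws `IllegalArgumentException`, which the sketch reports as "unknown".

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Illustrative helper, not part of the JEP.
final class CompactHeadersCheck {
    /** Returns "true", "false", or "unknown" if this JVM has no such option. */
    static String compactHeadersSetting() {
        try {
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return "unknown";
            }
            return bean.getVMOption("UseCompactObjectHeaders").getValue();
        } catch (IllegalArgumentException e) {
            // Thrown when the running JVM does not recognize the option.
            return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println("UseCompactObjectHeaders = " + compactHeadersSetting());
    }
}
```

Running it with `java -XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders CompactHeadersCheck.java` should report the value the JVM actually applied, which may differ from the request if the JVM disabled the feature for one of the reasons described later.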
Current object headers
In the HotSpot JVM, object headers support many different features:
- Garbage collection — Storing forwarding pointers and tracking object ages;
- Type system — Identifying an object's class, which is used for method invocation, reflection, type checks, etc.;
- Locking — Storing information about associated light-weight and heavy-weight locks; and
- Hash codes — Storing an object's stable identity hash code, once computed.
The current object header layout is split into a mark word and a class word. The mark word comes first, has the size of a machine address, and contains:
Mark Word (normal):
64 39 8 3 0
[.......................HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH.AAAA.TT]
(Unused) (Hash Code) (GC Age)(Tag)
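The field boundaries in the diagram (hash code starting at bit 8, GC age at bit 3, tag in bits 1-0) translate into plain shift-and-mask operations. The sketch below only models the diagram; `LegacyMarkWord` and its method names are illustrative and are not HotSpot's own identifiers.

```java
// Illustrative model of the legacy mark word layout shown above.
final class LegacyMarkWord {
    static final int AGE_SHIFT = 3, HASH_SHIFT = 8;
    static final long TAG_MASK = 0b11L, AGE_MASK = 0xFL, HASH_MASK = (1L << 31) - 1;

    static int tag(long mark)  { return (int) (mark & TAG_MASK); }
    static int age(long mark)  { return (int) ((mark >>> AGE_SHIFT) & AGE_MASK); }
    static int hash(long mark) { return (int) ((mark >>> HASH_SHIFT) & HASH_MASK); }

    /** Packs the three fields into one 64-bit word; out-of-range bits are masked off. */
    static long make(int hash, int age, int tag) {
        return ((hash & HASH_MASK) << HASH_SHIFT)
             | ((age & AGE_MASK) << AGE_SHIFT)
             | (tag & TAG_MASK);
    }

    public static void main(String[] args) {
        long mark = make(0x1234ABCD, 5, 0b01);
        System.out.printf("hash=%x age=%d tag=%d%n", hash(mark), age(mark), tag(mark));
    }
}
```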
In some situations, the mark word is overwritten with a tagged pointer to a separate data structure:
Mark Word (overwritten):
64 2 0
[ppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppTT]
(Native Pointer) (Tag)
When this is done, the tag bits describe the type of pointer stored in the header. If necessary, the original mark word is preserved (displaced) in the data structure to which this pointer refers, and the fields of the original header, i.e., the hash code and age bits, are accessed by dereferencing the pointer to get to the displaced header.
The class word comes after the mark word. It takes one of two shapes, depending on whether compressed class pointers are enabled:
Class Word (uncompressed):
64 0
[cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc]
(Class Pointer)
Class Word (compressed):
32 0
[CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC]
(Compressed Class Pointer)
The class word is never overwritten, which means that an object's type information is always available, so no additional steps are required to check a type or invoke a method. Most importantly, the parts of the runtime that need that type information do not have to cooperate with the locking, hashing, and GC subsystems, which can change the mark word.
Compact object headers
For compact object headers, we remove the division between the mark and class words by subsuming the class pointer, in compressed form, into the mark word:
Header (compact):
64 42 11 7 3 0
[CCCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHVVVVAAAASTT]
C = Compressed Class Pointer (bits 63-42)
H = Hash Code (bits 41-11)
V = Valhalla-reserved bits (bits 10-7)
A = GC Age (bits 6-3)
S = Self Forwarded Tag (bit 2)
T = Tag (bits 1-0)
Locking operations no longer overwrite the mark word with a tagged pointer, thus preserving the compressed class pointer. GC forwarding operations become more complex in order to preserve direct access to the compressed class pointer, requiring a new tag bit, as discussed below. The size of the hash code does not change. We reserve four bits for future use by Project Valhalla.
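As a rough illustration of the compact layout, the fields can be packed into and extracted from a single 64-bit word using the boundaries from the diagram (42, 11, 7, 3, and 0). This is a sketch under those assumptions; `CompactHeader` and its members are hypothetical names, not HotSpot code.

```java
// Illustrative model of the 64-bit compact header layout.
final class CompactHeader {
    static final int SELF_FWD_SHIFT = 2, AGE_SHIFT = 3, VALHALLA_SHIFT = 7;
    static final int HASH_SHIFT = 11, CLASS_SHIFT = 42;
    static final long TAG_MASK = 0b11L, AGE_MASK = 0xFL, VALHALLA_MASK = 0xFL;
    static final long HASH_MASK = (1L << 31) - 1, CLASS_MASK = (1L << 22) - 1;

    static int narrowClass(long h)       { return (int) ((h >>> CLASS_SHIFT) & CLASS_MASK); }
    static int hash(long h)              { return (int) ((h >>> HASH_SHIFT) & HASH_MASK); }
    static int valhallaBits(long h)      { return (int) ((h >>> VALHALLA_SHIFT) & VALHALLA_MASK); }
    static int age(long h)               { return (int) ((h >>> AGE_SHIFT) & AGE_MASK); }
    static boolean selfForwarded(long h) { return ((h >>> SELF_FWD_SHIFT) & 1L) != 0; }
    static int tag(long h)               { return (int) (h & TAG_MASK); }

    /** Packs the fields into one header word; the reserved and self-forwarded bits stay zero. */
    static long make(int narrowClass, int hash, int age, int tag) {
        return ((narrowClass & CLASS_MASK) << CLASS_SHIFT)
             | ((hash & HASH_MASK) << HASH_SHIFT)
             | ((age & AGE_MASK) << AGE_SHIFT)
             | (tag & TAG_MASK);
    }

    public static void main(String[] args) {
        long h = make(0x2ABCD, 0x1234567, 3, 0b01);
        System.out.printf("class=%x hash=%x age=%d tag=%d%n",
                          narrowClass(h), hash(h), age(h), tag(h));
    }
}
```

Because every field has its own bit range, changing the tag or age never disturbs the class pointer in the upper 22 bits.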
Compressed class pointers
Today's compressed class pointers encode a 64-bit pointer into 32 bits. They are enabled by default, but can be disabled via -XX:-UseCompressedClassPointers. The only reason to disable them, however, would be for an application that loads more than about four million classes; we have yet to see such an application.
Compact object headers require compressed class pointers to be enabled and, moreover, reduce the size of compressed class pointers from 32 bits to 22 bits by changing the compressed class pointer encoding.
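A 22-bit field can hold 2^22 = 4,194,304 distinct values. The text above does not describe the new encoding, so the following is only a generic base-plus-shift sketch of how a class address could be squeezed into 22 bits. The `base` and `shift` parameters and the class `NarrowClassPointer` are assumptions made for illustration, not the encoding HotSpot uses.

```java
// Hypothetical base-plus-shift encoding into a 22-bit narrow class pointer.
final class NarrowClassPointer {
    static final int NARROW_BITS = 22;
    static final long MAX_VALUES = 1L << NARROW_BITS; // 4,194,304 distinct values

    final long base;  // assumed start address of the class metadata region
    final int shift;  // assumed alignment of class metadata, as a power of two

    NarrowClassPointer(long base, int shift) {
        this.base = base;
        this.shift = shift;
    }

    /** Encodes an aligned address inside the region as a 22-bit value. */
    int encode(long address) {
        long offset = address - base;
        if (offset < 0 || (offset & ((1L << shift) - 1)) != 0) {
            throw new IllegalArgumentException("address below base or misaligned");
        }
        long narrow = offset >>> shift;
        if (narrow >= MAX_VALUES) {
            throw new IllegalArgumentException("address does not fit in 22 bits");
        }
        return (int) narrow;
    }

    /** Reverses encode(): some simple arithmetic recovers the full address. */
    long decode(int narrow) {
        return base + ((long) narrow << shift);
    }
}
```

With these made-up parameters, a larger `shift` widens the addressable region at the cost of coarser alignment for class metadata.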
Locking
The HotSpot JVM's object-locking subsystem has two levels.
- Lightweight locking is used when the locked object's monitor is uncontended, no thread control methods (wait(), notify(), etc.) are called, and no JNI locking is used. In such cases, HotSpot atomically flips the tag bits in the object header from 01 (unlocked) to 00 (lightweight-locked). No additional data structures are required, and no other header bits are used.
- Monitor locking is used when the locked object's monitor is contended, thread control methods are used, or lightweight locking is otherwise inadequate. To indicate this state, HotSpot atomically flips the tag bits in the object header from 01 (unlocked) or 00 (lightweight-locked) to 10 (monitor-locked). Monitor locking creates a new data structure to represent the object's monitor but, as with lightweight locking, does not use any other header bits.
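The tag transitions described above can be modeled as a compare-and-set on the header word that changes only the two tag bits. This is a simplified sketch: it uses an `AtomicLong` in place of a real object header, makes a single attempt without retrying, and ignores lock ownership and the monitor data structure entirely. `LockTagDemo` is a made-up name.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative model of the tag-bit transitions; not HotSpot's locking code.
final class LockTagDemo {
    static final long TAG_MASK = 0b11L;
    static final long UNLOCKED = 0b01L, LIGHTWEIGHT_LOCKED = 0b00L, MONITOR_LOCKED = 0b10L;

    /**
     * Atomically replaces the tag bits if they currently equal expectedTag.
     * All other header bits (class pointer, hash code, age) are left untouched.
     */
    static boolean flipTag(AtomicLong header, long expectedTag, long newTag) {
        long old = header.get();
        if ((old & TAG_MASK) != expectedTag) {
            return false;
        }
        return header.compareAndSet(old, (old & ~TAG_MASK) | newTag);
    }

    public static void main(String[] args) {
        AtomicLong header = new AtomicLong((0x2AL << 42) | UNLOCKED);
        System.out.println(flipTag(header, UNLOCKED, LIGHTWEIGHT_LOCKED));
        System.out.println(flipTag(header, LIGHTWEIGHT_LOCKED, MONITOR_LOCKED));
    }
}
```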
HotSpot also supports the legacy stack-locking mechanism. This spiritual predecessor to lightweight locking associates the locked object with the locking thread by copying the object header to the thread's stack and overwriting the object header with the pointer to the header copy. This is problematic for compact object headers because it overwrites the object header and thus loses crucial type information. Therefore, compact object headers are not compatible with legacy locking. If the JVM is configured to run with both legacy locking and compact object headers then compact object headers are disabled.
GC forwarding
Garbage collectors that relocate objects do so in two steps: First they copy an object and record the mapping between its old and new copies (i.e., forwarding), then they use this mapping to update references to the old copy in either the entire heap or just a particular generation.
Of the current HotSpot GCs, only ZGC uses a separate forwarding table to record forwardings. All the other GCs record forwarding information by overwriting the header of the old copy with the location of the new copy. There are two distinct scenarios that involve headers.
Copying phases copy objects to an empty space. The forwarding pointer to each new copy is stored in the header of the old copy. The original object header is preserved in the new copy. Code that reads the object header from the old copy follows the forwarding pointer to the new copy.
If copying an object to its new location fails, the GCs install a forwarding pointer to the object itself, thus making the object self-forwarded. With compact object headers, this would overwrite the type information. To address this, we indicate that an object is self-forwarded by setting the third bit of the object header rather than by overwriting the entire header.
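Marking an object as self-forwarded then amounts to setting one bit, the third bit of the header (bit 2 when counting from zero), and leaving the rest of the word alone. A minimal sketch, with illustrative names:

```java
// Illustrative model of the self-forwarded bit; not HotSpot code.
final class SelfForwarding {
    static final long SELF_FWD_BIT = 1L << 2; // third bit of the header

    /** Sets the self-forwarded bit; class pointer, hash, age, and tag are preserved. */
    static long markSelfForwarded(long header) {
        return header | SELF_FWD_BIT;
    }

    static boolean isSelfForwarded(long header) {
        return (header & SELF_FWD_BIT) != 0;
    }
}
```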
Sliding phases relocate objects by sliding them down to lower addresses within the same space. This is typically done when heap memory is exhausted and not enough space is left for copying objects. When that happens, a last-ditch effort is made to do a full collection using a sliding collection, which works in four phases:
1. Mark: Determine the set of live objects.
2. Compute addresses: Walk over all live objects and compute their new locations, i.e., where they would be placed one after another. Record those locations as forwardings in the object headers.
3. Update references: Walk over all live objects and update all object references to point to the new locations.
4. Copy: Actually copy all live objects to their new locations.
Step 2 destroys the original headers. This is also a problem for the current implementation: If the header is interesting, that is, it has an installed identity hash code, locking information, etc., then we need to preserve it. The current GCs do that by storing these headers in a side table and restoring them after a GC. This works well because there are usually only a few objects with interesting headers. With compact object headers, every object comes with an interesting header because now that header contains the crucial class information. Storing a large number of preserved headers would consume a significant amount of native heap.
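The four phases can be illustrated with a toy heap. The sketch below is a deliberately simplified model: the heap is a map from address to object, references are plain integer addresses, and the forwarding address lives in a separate field rather than in the object header, so it sidesteps the header-preservation problem just described. All names are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.TreeMap;

// Toy model of a four-phase sliding collection; not how HotSpot represents a heap.
final class SlidingCompaction {
    static final class Obj {
        final int size;      // object size in words
        final int[] refs;    // addresses of referenced objects
        boolean marked;
        int forwarding = -1; // new address, computed in phase 2

        Obj(int size, int... refs) {
            this.size = size;
            this.refs = refs;
        }
    }

    /** Compacts the heap (address to object, in address order); roots are updated in place. */
    static TreeMap<Integer, Obj> compact(TreeMap<Integer, Obj> heap, int[] roots) {
        // Phase 1: mark everything reachable from the roots.
        Deque<Integer> pending = new ArrayDeque<>();
        for (int root : roots) {
            pending.push(root);
        }
        while (!pending.isEmpty()) {
            Obj o = heap.get(pending.pop());
            if (o.marked) {
                continue;
            }
            o.marked = true;
            for (int ref : o.refs) {
                pending.push(ref);
            }
        }
        // Phase 2: compute new addresses by placing live objects one after another.
        int next = 0;
        for (Obj o : heap.values()) {
            if (o.marked) {
                o.forwarding = next;
                next += o.size;
            }
        }
        // Phase 3: update references (and roots) to the new addresses.
        for (Obj o : heap.values()) {
            if (o.marked) {
                for (int i = 0; i < o.refs.length; i++) {
                    o.refs[i] = heap.get(o.refs[i]).forwarding;
                }
            }
        }
        for (int i = 0; i < roots.length; i++) {
            roots[i] = heap.get(roots[i]).forwarding;
        }
        // Phase 4: move the live objects to their new locations.
        TreeMap<Integer, Obj> compacted = new TreeMap<>();
        for (Obj o : heap.values()) {
            if (o.marked) {
                o.marked = false;
                compacted.put(o.forwarding, o);
            }
        }
        return compacted;
    }
}
```

In a real collector the forwarding address from phase 2 is written into the header itself, which is exactly why the original header contents need to be preserved or encoded around.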
To overcome this problem, we use a simple encoding of the forwarding pointer which can address up to 8TB of heap in the lower 42 bits of the object header. Compact object headers are currently not compatible with larger heaps when collectors other than ZGC are used. If the JVM is configured to use a heap larger than 8TB and does not use ZGC then compact object headers are disabled.
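The 8TB figure is consistent with one plausible split of the 42 bits: if 2 bits remain tag bits and objects are aligned to 8-byte words, the remaining 40 bits can index 2^40 words, i.e., 2^43 bytes or 8TB. The text does not spell out the encoding, so the sketch below is an assumption-laden illustration: the 40+2 split, the `0b11` forwarded tag, the heap-base-relative offset, and the name `SlidingForwarding` are all hypothetical.

```java
// Hypothetical encoding of a forwarding pointer in the low 42 header bits.
final class SlidingForwarding {
    static final int FWD_BITS = 42, TAG_BITS = 2, WORD_SHIFT = 3; // assumed 8-byte words
    static final long FWD_MASK = (1L << FWD_BITS) - 1;
    static final long FORWARDED_TAG = 0b11L;                      // assumed tag value
    static final long MAX_HEAP_BYTES = 1L << (FWD_BITS - TAG_BITS + WORD_SHIFT); // 2^43

    /** Stores the new location in the low 42 bits, keeping the upper 22 class bits. */
    static long forwardTo(long header, long heapBase, long newAddress) {
        long offset = newAddress - heapBase;
        if (offset < 0 || offset >= MAX_HEAP_BYTES || (offset & 7) != 0) {
            throw new IllegalArgumentException("address outside encodable range or misaligned");
        }
        long encoded = ((offset >>> WORD_SHIFT) << TAG_BITS) | FORWARDED_TAG;
        return (header & ~FWD_MASK) | encoded;
    }

    /** Recovers the new location from a forwarded header. */
    static long forwardee(long header, long heapBase) {
        return heapBase + (((header & FWD_MASK) >>> TAG_BITS) << WORD_SHIFT);
    }
}
```

Note that the hash code, age, and tag in the low 42 bits are overwritten in this sketch; only the class pointer survives, which is the part every object needs.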
GC walking
Garbage collectors frequently walk the heap by scanning objects linearly. This requires determining the size of each object, which requires access to each object's class pointer.
When the class pointer is encoded in the header, some simple arithmetic is required to decode it. The cost of doing this is low compared to the cost of the memory accesses involved in a GC walk. No additional implementation work is needed here since the GCs already access class pointers via a common VM interface.
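A linear walk with compact headers can be sketched as follows. The model is intentionally minimal: the heap is an array of 64-bit words, each object starts with a compact header, and a hypothetical table maps each narrow class value to a fixed object size in words. Real object sizes also depend on array lengths, which this sketch ignores.

```java
// Toy model of a linear heap walk; not HotSpot code.
final class HeapWalk {
    static final int CLASS_SHIFT = 42; // class pointer occupies the upper 22 bits

    /**
     * Walks objects laid out back to back in heap[0..top) and returns how many were found.
     * sizeInWordsByClass is a made-up stand-in for looking up the size in class metadata.
     */
    static int countObjects(long[] heap, int top, int[] sizeInWordsByClass) {
        int count = 0;
        int cursor = 0;
        while (cursor < top) {
            // Decode the class from the header with a single shift.
            int narrowClass = (int) (heap[cursor] >>> CLASS_SHIFT);
            int size = sizeInWordsByClass[narrowClass];
            if (size <= 0) {
                throw new IllegalStateException("unknown class at word " + cursor);
            }
            cursor += size; // skip to the next object's header
            count++;
        }
        return count;
    }
}
```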
Alternatives
Continue to maintain 32-bit platforms — The mark and class words in object headers are sized as machine pointers, so headers on 32-bit platforms are already 64 bits. However, the difficulty of maintaining the 32-bit ports, coupled with the industry move from 32-bit environments, makes this alternative impractical in the long term.
Implement 32-bit object headers — With more effort, we could implement 32-bit headers. This would likely involve implementing on-demand side storage for identity hash codes. That is our ultimate goal, but initial explorations show that it will require much more work. This proposal captures an important milestone that brings substantial improvements that we can deliver with low risk as we work further toward 32-bit headers.
Testing
Changing the header layout of Java heap objects touches many HotSpot JVM subsystems: the runtime, all garbage collectors, all just-in-time compilers, the interpreters, the serviceability agent, and the architecture-specific code for all supported platforms. Such massive changes warrant massive testing.
Compact object headers will be tested by:
- Tier 1–4 tests, and possibly more testing tiers by vendors which have them;
- The SPECjvm, SPECjbb, DaCapo, and Renaissance benchmark suites to test both correctness and performance;
- JCStress, to test the new locking implementation; and
- Some real-world workloads.
All of these tests will be executed with the feature turned on and off, with multiple combinations of GCs and JIT compilers, and on several hardware targets.
We will also deliver a new set of tests that measure the size of various objects, e.g., plain objects, primitive type arrays, reference arrays, and their headers.
The ultimate test for performance and correctness will be real-world workloads once this experimental feature is delivered.
Risks and Assumptions
Future runtime features need object header bits — This proposal leaves no spare bits in the header for future features that might need such bits. We mitigate this risk organizationally by discussing object header needs with other major JDK projects, such as Project Valhalla. We mitigate this risk technically by assuming that identity hash codes and compressed class pointers can be shrunk even further to make bits available should future runtime features need them.
Implementation bugs in feature code — The usual risk for an intrusive feature such as this is bugs in the implementation. While issues in the header layout might be visible immediately with most tests, subtleties in the new locking and GC forwarding protocols may expose bugs only rarely. We mitigate this risk with careful reviews by component owners and by running many tests with the feature enabled. This risk does not affect the product so long as the feature remains experimental and off by default.
Implementation bugs in legacy code — We try to avoid changing legacy code paths, but some refactorings necessarily touch shared code. This exposes the risk of bugs even when the feature is disabled. In addition to careful reviews and testing, we mitigate this risk by coding defensively and trying to avoid modifying shared code paths, even if it requires more work in feature code paths.
Performance issues in feature code — The more complex protocols for compact object headers may introduce performance issues when the feature is enabled. We mitigate this risk by running major benchmarks and understanding the feature's impact on their performance. There are performance costs for indirectly accessing the class pointer, using the alternative stack locking scheme, and employing the alternative GC sliding forwarding machinery. This risk does not affect the product so long as the feature remains experimental and off by default.
Performance issues in legacy code — There is a minor risk that refactoring the legacy code paths will affect performance in unexpected ways. We mitigate this risk by minimizing the changes to the legacy code paths and showing that the performance of major workloads is not substantially affected.
Compressed class pointers support — Compressed class pointers are not supported by JVMCI on x64. We mitigate the immediate risk by disabling compact object headers when JVMCI is enabled. The long-term risk is that compact headers are never implemented in JVMCI, which would forever block removing the legacy header implementation. We assign only a minor probability to this risk since other JIT compilers support compact object headers without intrusive changes.
Compressed class pointers encoding — As stated above, the current implementation of compressed class pointers is limited to about four million classes. Presently, users can work around this limitation by disabling compressed class pointers, but if we remove the legacy header implementation, then that will no longer be possible. We mitigate the immediate risk by providing compact object headers as an experimental feature; in the long term, we intend to work toward more efficient compressed class pointer encoding schemes.
Changing low-level interfaces — Some components that manipulate object headers directly, notably the Graal compiler as the major user of JVMCI, will have to implement the new header layout. We mitigate the current risk by identifying these components and disabling the feature when those components are in use. Before the feature graduates from experimental status, those components will need to be upgraded.
Soft project failure — There is a minor risk that the feature has irreconcilable functional regressions compared to the legacy implementation, e.g., limiting the number of representable classes. A related risk is that while the feature provides significant performance improvements on its own, it comes with significant functional limitations, which might lead to an argument for keeping both the new and legacy header implementations forever. Given that the goal of this work is to replace the legacy header implementation eventually, we consider this a soft project failure. We mitigate this risk by carefully examining current limitations, planning future work to eliminate them, and looking to early adopters' reports to identify other risks before we invest too much effort.
Hard project failure — While very unlikely, it may turn out that compact object headers do not yield tangible real-world improvements or that the achievable improvements do not justify their additional complexity. We mitigate this minor risk by gating the new code paths as experimental, thus keeping a path open to removing the feature in a future release should the need arise.
duplicates
- JDK-8198331 [Lilliput] Remove mark word from objects (Closed)
- JDK-8198332 [Lilliput] Remove klass word from objects (Closed)
relates to
- JDK-8334299 Deprecate LockingMode option, along with LM_LEGACY and LM_MONITOR (Resolved)
- JDK-8305895 Implement JEP 450: Compact Object Headers (Experimental) (Resolved)
- JDK-8334496 Deprecate LockingMode option, along with LM_LEGACY and LM_MONITOR (Closed)