Type: Enhancement
Resolution: Won't Fix
Priority: P4
Fix Version/s: tbd
Affects Version/s: 21
Component/s: hotspot
Labels:
- metaspace

Subcomponent:
runtime

We might store resolved Utf8 strings not as 8-byte Symbol* words but as 4-byte SymbolRef words.

This would require something like a globally reserved area for storing symbol data, of size up to 4 billion times some grain size (such as 16). We do something like this already for compressed oops and/or compressed classes.

It seems likely that all symbols ever used by any one JVM instance will fit into 64 gigabytes, even with some fragmentation overhead.
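The arithmetic behind the 64-gigabyte figure, and the encode/decode step it implies, can be sketched as follows. This is only an illustration under stated assumptions: a grain size of 16 bytes, a single contiguous reserved area, and hypothetical names (`SymbolRef`, `encode`, `decode`, and the constants) that do not exist in HotSpot. It mirrors the base-plus-shift scheme used for compressed oops rather than any actual design for symbols.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names throughout; none of these are HotSpot APIs.
using SymbolRef = std::uint32_t;  // 4-byte compressed reference

constexpr unsigned kGrainShift = 4;  // assumed grain size of 16 bytes
constexpr std::uint64_t kGrainSize = std::uint64_t{1} << kGrainShift;

// 2^32 grains of 16 bytes each give 64 GiB of addressable symbol space.
constexpr std::uint64_t kMaxAreaBytes = (std::uint64_t{1} << 32) * kGrainSize;
static_assert(kMaxAreaBytes == (std::uint64_t{64} << 30), "expected 64 GiB");

// Encode a byte offset within the reserved area as a 4-byte reference.
inline SymbolRef encode_offset(std::uint64_t offset) {
  assert(offset % kGrainSize == 0 && "symbol must start on a grain boundary");
  assert(offset < kMaxAreaBytes && "offset lies outside the reserved area");
  return static_cast<SymbolRef>(offset >> kGrainShift);
}

// Decode a 4-byte reference back to a byte offset.
inline std::uint64_t decode_offset(SymbolRef ref) {
  return static_cast<std::uint64_t>(ref) << kGrainShift;
}

// Pointer-based variants, relative to the base of the reserved area.
// The caller must pass a pointer at or above the base.
inline SymbolRef encode(const char* base, const char* sym) {
  return encode_offset(static_cast<std::uint64_t>(sym - base));
}

inline const char* decode(const char* base, SymbolRef ref) {
  return base + decode_offset(ref);
}
```

A larger grain size would extend the addressable range at the cost of more padding per symbol; the sketch does not reserve a value for a null reference, which a real design would probably need.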

Moving from an 8-byte to a 4-byte representation for symbols, as stored in metadata, might also reduce footprint.

Background:

An unresolved `CONSTANT_Utf8` is represented very compactly as a two-byte `u2` index into a contextually defined constant pool. HotSpot metadata is organized to try to keep this representation where possible.

When a symbol is resolved, it is stored as a C++ pointer to a compactly organized record: a header containing a length and a (saturatable) reference count, immediately followed by the Utf8 bytes. This is reasonably compact.
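The record shape described above can be sketched roughly as follows. The field widths, the saturation value, and all names (`SymbolRecord`, `make_symbol`, and so on) are assumptions made for illustration; the real HotSpot `Symbol` class packs its header differently and is allocated by the VM, not by `malloc`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical stand-in for a resolved symbol record: a small header
// immediately followed by the Utf8 bytes.
struct SymbolRecord {
  std::uint16_t length;    // number of Utf8 bytes that follow the header
  std::uint16_t refcount;  // saturating reference count

  char* body() { return reinterpret_cast<char*>(this + 1); }
  const char* body() const { return reinterpret_cast<const char*>(this + 1); }
};

// Once the count reaches this value it sticks, and the record is
// treated as permanent.
constexpr std::uint16_t kRefcountSaturated = 0xFFFF;

// Allocate header and bytes in one block; returns nullptr on failure.
inline SymbolRecord* make_symbol(const char* utf8, std::uint16_t len) {
  void* mem = std::malloc(sizeof(SymbolRecord) + len);
  if (mem == nullptr) return nullptr;
  SymbolRecord* s = new (mem) SymbolRecord{len, 1};
  std::memcpy(s->body(), utf8, len);
  return s;
}

inline void increment_refcount(SymbolRecord* s) {
  if (s->refcount != kRefcountSaturated) ++s->refcount;
}

// Returns true when the count drops to zero and the record may be freed.
inline bool decrement_refcount(SymbolRecord* s) {
  if (s->refcount == kRefcountSaturated) return false;  // saturated: sticky
  assert(s->refcount > 0);
  return --s->refcount == 0;
}
```

With this assumed layout, a 16-byte name such as `java/lang/Object` occupies 20 bytes in total, which is the kind of compactness the paragraph above refers to.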

(The compactness could possibly be improved in the case of method signatures which repeat class names. Such schemes have been evaluated in the past. They have been difficult to implement. Perhaps something can be done about this in the future. For example, it would be simpler and almost as effective to store common prefixes of symbols, so each symbol would be broken into two physical parts, one of which was shareable. That is an RFE for a different day.)

For places where we have to store resolved symbols, such as the constant pools themselves, it may be helpful to store them in 4 bytes instead of 8 bytes.

Even in places where, today, we store symbols in unresolved 2-byte indexes (e.g., methods), it may be profitable to expand them to 4-byte resolved references, simply to reduce the dynamic overhead of decoding.

There is probably no reason to use compressed symbol pointers during "live" processing (in a C++ stack frame). The 8-byte type SymbolHandle is the right choice there.

This RFE is tentative, because we already have good coverage from SymbolHandle for "live" cases and from contextually defined u2 indexes for "at rest" cases, with limited use of Symbol* "at rest" in constant pools to link everything together.

However, if we have tables in HotSpot that make heavy use of C++ Symbol* pointers to represent resolved symbols, it may be worth the effort of using compressed symbol references in those tables. The dictionaries proposed in ~~JDK-8301007~~ are an example of such tables. Class loader constraints are another example.
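As a rough illustration of the footprint effect on such tables, the sketch below compares a table entry that holds two symbols as full pointers with one that holds them as 4-byte references. The entry shapes and names (`WideEntry`, `CompactEntry`, `table_bytes`) are hypothetical and are not taken from any HotSpot table; real entries carry other fields that would dilute the saving.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Symbol;                    // opaque stand-in for the HotSpot type
using SymbolRef = std::uint32_t;  // hypothetical 4-byte compressed reference

// Entry holding two resolved symbols as full C++ pointers.
struct WideEntry {
  const Symbol* name;
  const Symbol* signature;
};

// The same entry holding compressed references instead.
struct CompactEntry {
  SymbolRef name;
  SymbolRef signature;
};

static_assert(sizeof(WideEntry) == 2 * sizeof(void*), "two pointers");
static_assert(sizeof(CompactEntry) == 8, "two 4-byte references");

// Bytes needed for the symbol fields of a table with n entries.
template <typename Entry>
constexpr std::size_t table_bytes(std::size_t n) {
  return n * sizeof(Entry);
}
```

On a 64-bit platform the symbol fields of the compact entry take half the space of the wide one; whether that matters in practice depends on how many such entries a table holds.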

relates to

JDK-8301007 [lworld] Handle mismatches of the preload attribute in the calling convention

Resolved

JDK-8303095 [lworld] migration support via Q-folding linkage rules

Closed

Assignee: Unassigned

Reporter: John Rose

Votes: 0

Watchers: 7

Created: 2023-02-24 10:34

Updated: 2024-01-17 11:22

Resolved: 2024-01-17 11:22
