Type: JEP
Resolution: Delivered
Priority: P2
Fix Version/s: 9
Component/s: core-libs
Labels:

Author: Brent Christian
JEP Type: Feature
Exposure: Open
Subcomponent: java.lang
Scope: Implementation
Discussion: core dash libs dash dev at openjdk dot java dot net
Effort: L
Duration: XL
JEP Number: 254

Summary

Adopt a more space-efficient internal representation for strings.

Goals

Improve the space efficiency of the String class and related classes while maintaining performance in most scenarios and preserving full compatibility for all related Java and native interfaces.

Non-Goals

It is not a goal to use alternate encodings such as UTF-8 in the internal representation of strings. A subsequent JEP may explore that approach.

Motivation

The current implementation of the String class stores characters in a char array, using two bytes (sixteen bits) for each character. Data gathered from many different applications indicates that strings are a major component of heap usage and, moreover, that most String objects contain only Latin-1 characters. Such characters require only one byte of storage, hence half of the space in the internal char arrays of such String objects is going unused.
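The arithmetic behind this observation can be sketched as follows. This is only an illustration, not the measurement methodology behind the data cited above: the class `Latin1Survey` and its methods are hypothetical names, the sample strings are made up, and object and array headers are ignored so that only character payload is counted.

```java
import java.util.List;

// Hypothetical helper, not part of the JDK: estimates character payload
// under a two-byte-per-char layout versus a layout that stores
// Latin-1-only strings in one byte per character.
final class Latin1Survey {

    /** True if every UTF-16 code unit of s fits in one byte (0x00 to 0xFF). */
    static boolean isLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    /** Payload bytes when every string uses two bytes per char. */
    static long utf16Bytes(List<String> strings) {
        long total = 0;
        for (String s : strings) {
            total += 2L * s.length();
        }
        return total;
    }

    /** Payload bytes when Latin-1-only strings use one byte per char. */
    static long compactBytes(List<String> strings) {
        long total = 0;
        for (String s : strings) {
            total += isLatin1(s) ? s.length() : 2L * s.length();
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up sample: two Latin-1 strings and one containing U+03C0.
        List<String> sample = List.of("hello", "caf\u00e9", "\u03c0 = 3.14");
        System.out.println(utf16Bytes(sample) + " -> " + compactBytes(sample));
        // prints "34 -> 25"
    }
}
```

In this sample the two Latin-1 strings shrink by half, while the string containing a non-Latin-1 character keeps its two-byte layout.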

Description

We propose to change the internal representation of the String class from a UTF-16 char array to a byte array plus an encoding-flag field. The new String class will store characters encoded either as ISO-8859-1/Latin-1 (one byte per character), or as UTF-16 (two bytes per character), based upon the contents of the string. The encoding flag will indicate which encoding is used.
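A minimal sketch of the byte-array-plus-flag idea is shown below. It is not the JDK implementation: `CompactText` is a hypothetical class, the constant values are chosen for illustration, and the big-endian layout of the two-byte form is an arbitrary assumption made to keep the example simple.

```java
// Hypothetical illustration of a byte[] payload plus an encoding flag.
final class CompactText {
    static final byte LATIN1 = 0; // one byte per character (assumed value)
    static final byte UTF16 = 1;  // two bytes per character (assumed value)

    private final byte[] value;
    private final byte coder;

    CompactText(String s) {
        int n = s.length();
        boolean latin1 = true;
        for (int i = 0; i < n; i++) {
            if (s.charAt(i) > 0xFF) {
                latin1 = false;
                break;
            }
        }
        if (latin1) {
            byte[] bytes = new byte[n];
            for (int i = 0; i < n; i++) {
                bytes[i] = (byte) s.charAt(i);
            }
            value = bytes;
            coder = LATIN1;
        } else {
            byte[] bytes = new byte[2 * n];
            for (int i = 0; i < n; i++) {
                char c = s.charAt(i);
                bytes[2 * i] = (byte) (c >>> 8); // high byte first (assumption)
                bytes[2 * i + 1] = (byte) c;     // low byte second
            }
            value = bytes;
            coder = UTF16;
        }
    }

    byte coder() {
        return coder;
    }

    /** Number of bytes actually stored for the characters. */
    int storageBytes() {
        return value.length;
    }

    int length() {
        return coder == LATIN1 ? value.length : value.length >> 1;
    }

    char charAt(int index) {
        if (index < 0 || index >= length()) {
            throw new StringIndexOutOfBoundsException(index);
        }
        if (coder == LATIN1) {
            return (char) (value[index] & 0xFF);
        }
        int hi = value[2 * index] & 0xFF;
        int lo = value[2 * index + 1] & 0xFF;
        return (char) ((hi << 8) | lo);
    }
}
```

Every accessor branches on the flag, which is one plausible source of the run-time cost that the Risks and Assumptions section weighs against the memory savings.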

String-related classes such as AbstractStringBuilder, StringBuilder, and StringBuffer will be updated to use the same representation, as will the HotSpot VM's intrinsic string operations.

This is purely an implementation change, with no changes to existing public interfaces. There are no plans to add any new public APIs or other interfaces.
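From the caller's side, the intent is that existing code observes the same results whichever internal encoding a given string happens to use. The sketch below uses only long-standing public `String` methods; the class name `ApiUnchangedDemo` and its `describe` helper are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: the char-based view of a string is the same for a
// Latin-1-only string and for one that contains a non-Latin-1 character.
final class ApiUnchangedDemo {

    /** Summarizes length, first char value, and UTF-16BE encoded size. */
    static String describe(String s) {
        return s.length()
                + ":" + (int) s.charAt(0)
                + ":" + s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        System.out.println(describe("abc"));       // prints "3:97:6"
        System.out.println(describe("\u03b1bc"));  // prints "3:945:6"
    }
}
```

Both strings report three chars and encode to six UTF-16 bytes on request, even though only the second would need two bytes per character internally.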

The prototyping work done to date confirms the expected reduction in memory footprint, substantial reductions of GC activity, and minor performance regressions in some corner cases.

For further detail, see:

Alternatives

We tried a "compressed strings" feature in JDK 6 update releases, enabled by an -XX flag. When enabled, String.value was changed to an Object reference and would point either to a byte array, for strings containing only 7-bit US-ASCII characters, or else a char array. This implementation was not open-sourced, so it was difficult to maintain and keep in sync with the mainline JDK source. It has since been removed.

Testing

Thorough compatibility and regression testing will be essential for a change to such a fundamental part of the platform.

We will also need to confirm that we have fulfilled the performance goals of this project, which includes analyzing the memory savings. Performance testing should cover a broad range of workloads, from focused microbenchmarks to large-scale server workloads.

We will encourage the entire Java community to perform early testing with this change in order to identify any remaining issues.

Risks and Assumptions

Optimizing character storage for memory may well come with a trade-off in terms of run-time performance. We expect that this will be offset by reduced GC activity and that we will be able to maintain the throughput of typical server benchmarks. If not, we will investigate optimizations that can strike an acceptable balance between memory saving and run-time performance.

Other recent projects have already reduced the heap space used by strings, in particular JEP 192: String Deduplication in G1. Even with duplicates eliminated, the remaining string data can be made to consume less space if encoded more efficiently. We are assuming that this project will still provide a benefit commensurate with the effort required.

is blocked by

JDK-8064810	JEP-JDK-8054307: Performance plan for More memory-efficient internal representation for Strings	Resolved

relates to

JDK-8146547	String copy intrinsics should zero array in case of tightly coupled allocation	Open
JDK-8155608	String intrinsic range checks are not strict enough	Resolved
JDK-8196995	java.lang.Character should not state UTF-16 encoding is used for strings	Closed
JDK-8162716	Doc tasks for JEP 254: Compact Strings	Resolved
JDK-8134758	Final String field values should be trusted as stable	Resolved
JDK-8144693	Intrinsify StringCoding.hasNegatives() on SPARC	Resolved
JDK-8363925	Remove unused sun.nio.cs.ArrayEncoder::encode	Resolved
JDK-8046182	JEP 192: String Deduplication in G1	Closed
JDK-8279833	Loop optimization issue in String.encodeUTF8_UTF16	Resolved
JDK-8143553	StringBuffer.getByte(byte[], int, byte) should be package private (not protected)	Resolved
JDK-8144212	JDK 9 b93 breaks Apache Lucene due to compact strings	Resolved
JDK-8143219	AArch64 broken by 8141132: JEP 254: Compact Strings	Resolved
JDK-8140390	Char stores/loads accessing byte arrays must be marked as unmatched	Closed
JDK-8141443	jdk/test/java/util/regex/RegExTest.java fails: No match found	Closed
JDK-8142303	C2 compilation fails with "bad AD file"	Closed
JDK-8164612	NoSuchMethodException when method name contains NULL or Latin-1 supplement character	Closed
JDK-8144691	JEP 254: Compact Strings: endiannes mismatch in Java source code and intrinsic	Closed
JDK-6826329	(str) Fastpath for new String(bytes..) and String#getBytes(..) for ASCII + ISO-8859-1	Open
JDK-8173585	Intrinsify StringLatin1.indexOf(char)	Resolved
JDK-8184943	AARCH64: Intrinsify hasNegatives	Resolved
JDK-6941938	Improve array equals intrinsic on SPARC	Resolved
JDK-8231717	Improve performance of charset decoding when charset is always compactable	Resolved
JDK-8059092	JEP 250: Store Interned Strings in CDS Archives	Closed
JDK-8085796	JEP 280: Indify String Concatenation	Closed
JDK-8139132	CompactStrings intrinsics should use ArrayCopyNode	Closed
JDK-8156861	AArch64: JEP 254: Partially-implemented intrinsics	Closed


1.	Basic string intrinsics for x86	Resolved	Tobias Hartmann
2.	Adapt C2's string concatenation optimization	Resolved	Tobias Hartmann
3.	Basic string intrinsics for Sparc	Resolved	Tobias Hartmann
4.	Improve performance of string compression on Sparc	Resolved	Tobias Hartmann
5.	Improve performance of string inflation on Sparc	Resolved	Tobias Hartmann
6.	String.coder should be final	Resolved	Aleksey Shipilev
7.	Figure out the best code shape for a kill switch	Resolved	Aleksey Shipilev
8.	StringCoding need to be update/optimized for compact string implementation	Resolved	Xueming Shen
9.	Investigate performance regressions on Sparc	Resolved	Tobias Hartmann
10.	C1 and C2 intrinsics for StringUTF16.(get|set)Char	Resolved	Tobias Hartmann
11.	String.charAt blows the MaxInlineSize limit, penalizes C1	Closed	Unassigned
12.	StringUTF16.(get|set)Char intrinsic should use scaled operand	Resolved	Aleksey Shipilev
13.	StringUTF16 should check for the maximum length	Resolved	Xueming Shen
14.	CompactStrings flag handling without extending the JVM interface	Resolved	Aleksey Shipilev
15.	Replace common copying loops with arraycopy/copyOf/copyOfRange	Closed	Aleksey Shipilev
16.	Backout runtime checks in intrinsics before integration	Resolved	Tobias Hartmann
17.	Remove StringCharIntrinsics flag after JDK-8138651 is fixed	Resolved	Aleksey Shipilev
18.	Integration	Resolved	Tobias Hartmann
19.	Release Note: JEP 254: Compact Strings	Closed	Xueming Shen

Assignee: Xueming Shen
Reporter: Brent Christian
Owner: Xueming Shen
Reviewed By: Aleksey Shipilev, Brian Goetz, Charlie Hunt (Inactive)
Endorsed By: Brian Goetz
Votes: 0
Watchers: 25

Due: 2015-12-02
Created: 2014-08-04 14:54
Updated: 2025-07-23 03:59
Resolved: 2016-04-06 19:34
Integration Due: 2015-11-25
