Type: Enhancement
Resolution: Fixed
Priority: P4
Fix Version/s: 17
Affects Version/s: None
Component/s: hotspot
Labels: zgc
Subcomponent: gc
Resolved In Build: b26

ZGC uses ConcGCThreads GC threads in each GC cycle for concurrent operations such as marking, non-ref processing, and relocation. Currently, the default value for ConcGCThreads is ~12.5% of total #CPU. This relatively conservative default avoids "stealing" too much CPU from mutators, which is especially important for latency-sensitive benchmarks.

However, for some benchmarks (or some phases of benchmarks), 12.5% is not enough for GC to keep up with mutators, and allocation stalls are observed. Naively increasing the default value (to 25%, for instance) solves the allocation stall problem but also causes large regressions on some latency-sensitive benchmarks (observed in our testing). Therefore, instead of using a static number of GC threads for every GC cycle, we dynamically select the #GC threads used in each GC cycle, which permits fewer GC threads for good latency and more GC threads to match a higher allocation rate.

With this change, the default ConcGCThreads will be 25% of total #CPU, and we select #GC threads (a value in the range of [1, ConcGCThreads]) to use for each GC cycle, based on various metrics (GC cycle duration, free space left, allocation rate, etc).

This feature is enabled by default and can be turned off using `-XX:-UseDynamicNumberOfGCThreads`.

--------------------------------------------------
# Implementation Overview

Dynamic GC thread selection needs to decide the #workers to use and when to initiate a GC cycle. We first cover the basic metrics it relies on.

## Metrics

### 1. gc duration

A GC cycle mostly consists of parallel phases (using multiple GC threads) connected by short serial phases (using a single thread). We track these two parts (`per_worker_time` in parallel phases and `serial_time`) separately to better model how GC duration reacts to a change in the #threads used.

### 2. allocation rate

Periodically sample the #bytes allocated by mutators and keep a history of a certain number of samples, from which the average and standard deviation are computed. These are then used as an estimate of the future allocation rate.

alloc_rate = avg + sd * sd_factor, where sd_factor is ~3.3
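
The estimate above can be sketched in Java as follows. This is only an illustration, not the HotSpot code: the class name `AllocRateEstimator`, the fixed-size window, and the use of the population standard deviation are all assumptions made for the example. Only the formula `avg + sd * sd_factor` comes from the description.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper (not part of HotSpot): keeps a bounded history of
// allocation-rate samples and derives a padded estimate from them.
final class AllocRateEstimator {
    private final Deque<Double> samples = new ArrayDeque<>();
    private final int capacity;     // assumed history length
    private final double sdFactor;  // ~3.3 according to the description

    AllocRateEstimator(int capacity, double sdFactor) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
        this.sdFactor = sdFactor;
    }

    // Record one sample (bytes per second), dropping the oldest when full.
    void addSample(double bytesPerSecond) {
        if (samples.size() == capacity) {
            samples.removeFirst();
        }
        samples.addLast(bytesPerSecond);
    }

    // alloc_rate = avg + sd * sd_factor
    double estimate() {
        if (samples.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (double s : samples) {
            sum += s;
        }
        double avg = sum / samples.size();

        double squaredDiffs = 0.0;
        for (double s : samples) {
            squaredDiffs += (s - avg) * (s - avg);
        }
        // Assumption: population standard deviation over the window.
        double sd = Math.sqrt(squaredDiffs / samples.size());

        return avg + sd * sdFactor;
    }
}
```

Padding the average by a multiple of the standard deviation makes the estimate pessimistic when the allocation rate fluctuates, so a bursty mutator is assumed to allocate faster than its recent average.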

### 3. time_till_oom

The estimated time until the heap runs out of free space, derived from #free_bytes and `alloc_rate`.
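
The description does not spell out the derivation. One plausible reading, shown here purely as an assumption, is a plain division of the free bytes by the estimated allocation rate. The class and method names below are hypothetical.

```java
// Hypothetical helper (not part of HotSpot): one possible way to derive
// time_till_oom from the two inputs named in the description.
final class TimeTillOom {
    private TimeTillOom() {}

    // Returns the estimated seconds until free space is exhausted.
    // Assumption: time_till_oom = free_bytes / alloc_rate.
    static double seconds(long freeBytes, double allocRateBytesPerSecond) {
        if (allocRateBytesPerSecond <= 0.0) {
            // No allocation observed: free space is never exhausted.
            return Double.POSITIVE_INFINITY;
        }
        return freeBytes / allocRateBytesPerSecond;
    }
}
```

For example, with 1000 free bytes and an estimated rate of 100 bytes per second, this sketch yields 10 seconds.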

## Main Algorithm

The algorithm assumes that two consecutive GC cycles consume a similar amount of total GC CPU time, i.e. `serial_time + per_worker_time * #workers`.

Therefore, if a GC cycle were started right now, the number of workers needed to avoid OOM should be at least:

#workers = ceil((time_till_oom - serial_time) / per_worker_time)

Then we use #workers to calculate the actual GC duration and check whether we indeed need to start a GC. A GC cycle is initiated when `gc_duration >= time_till_oom`, where `gc_duration = serial_time + per_worker_time * #workers`.

relates to

JDK-8271064 ZGC several jvm08 perf regressions after JDK-8268372

Closed

links to

Commit openjdk/jdk/dd34a4c2

Review openjdk/jdk/4410

Assignee: Albert Yang

Reporter: Albert Yang

Votes: 0

Watchers: 4

Created: 2021-06-08 01:34

Updated: 2025-01-29 08:42

Resolved: 2021-06-09 03:38
