Type: Enhancement
Resolution: Unresolved
Priority: P4
Affects Version/s: 8, 11, 17, 21, 22
Seen this in the logs for our application. A reasonable expectation from `-XmxNg -XmsNg -XX:+AlwaysPreTouch` is that the memory is pre-touched once at startup, when the heap is initially allocated. But the logs show pre-touches still happening throughout the application lifetime.
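For context, pre-touching means writing to every OS page of a reserved range once, so the kernel commits the backing memory up front instead of lazily on first access. A minimal plain-Java sketch of the idea (this is not HotSpot code, and the 4 KB page size is an assumption):
```
import java.nio.ByteBuffer;

public class PreTouchSketch {
    // Assumed OS page size; the real VM queries the actual value from the OS.
    static final int PAGE_SIZE = 4096;

    // Touch one byte per page so the whole range is committed up front.
    static void preTouch(ByteBuffer region) {
        for (int offset = 0; offset < region.capacity(); offset += PAGE_SIZE) {
            region.put(offset, (byte) 0);
        }
    }

    public static void main(String[] args) {
        ByteBuffer region = ByteBuffer.allocateDirect(64 * 1024 * 1024); // 64 MB
        preTouch(region);
        System.out.println("Pre-touched " + region.capacity() + " bytes");
    }
}
```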
The simplest reproducer is:
```
import java.util.concurrent.ThreadLocalRandom;

public class Retain {
    static final int SIZE = 4000;
    static final Object[] RETAIN = new Object[50_000];

    public static void main(String... args) throws Throwable {
        // Populate the array once so a sizeable live set is retained.
        for (int c = 0; c < RETAIN.length; c++) {
            RETAIN[c] = new int[SIZE];
        }
        // Keep replacing random slots forever to drive continuous GC activity.
        while (true) {
            for (int c = 0; c < RETAIN.length; c++) {
                RETAIN[ThreadLocalRandom.current().nextInt(RETAIN.length)] = new int[SIZE];
            }
        }
    }
}
```
```
% build/macosx-aarch64-server-release/images/jdk/bin/java -Xms4224m -Xmx4224m -XX:+AlwaysPreTouch -XX:+UseParallelGC -Xlog:gc -Xlog:gc+heap=debug Retain.java | grep PreTouch
...
[8.451s][debug][gc,heap] GC(307) Running ParallelGC PreTouch tail with 1 workers for 1 work units pre-touching 1572864B.
[8.451s][debug][gc,heap] GC(307) Running ParallelGC PreTouch tail with 1 workers for 1 work units pre-touching 524288B.
[8.504s][debug][gc,heap] GC(309) Running ParallelGC PreTouch tail with 1 workers for 1 work units pre-touching 524288B.
[8.504s][debug][gc,heap] GC(309) Running ParallelGC PreTouch tail with 1 workers for 1 work units pre-touching 524288B.
[8.557s][debug][gc,heap] GC(311) Running ParallelGC PreTouch head with 1 workers for 1 work units pre-touching 1048576B.
[8.584s][debug][gc,heap] GC(312) Running ParallelGC PreTouch head with 1 workers for 1 work units pre-touching 524288B.
[8.611s][debug][gc,heap] GC(313) Running ParallelGC PreTouch head with 1 workers for 1 work units pre-touching 1048576B.
[8.639s][debug][gc,heap] GC(314) Running ParallelGC PreTouch head with 1 workers for 1 work units pre-touching 524288B.
[8.707s][debug][gc,heap] GC(317) Running ParallelGC PreTouch tail with 1 workers for 1 work units pre-touching 524288B.
[8.759s][debug][gc,heap] GC(319) Running ParallelGC PreTouch tail with 1 workers for 1 work units pre-touching 1048576B.
[8.759s][debug][gc,heap] GC(319) Running ParallelGC PreTouch tail with 1 workers for 1 work units pre-touching 524288B.
[8.812s][debug][gc,heap] GC(321) Running ParallelGC PreTouch tail with 1 workers for 1 work units pre-touching 524288B.
```
As far as I can tell, this happens when we resize spaces. With `-Xlog:gc+ergo=trace`, you can spot this:
```
[1.767s][trace][gc,ergo] GC(5) PSYoungGen::resize_spaces(requested_eden_size: 1107296256, requested_survivor_size: 491782144)
[1.767s][trace][gc,ergo] GC(5) eden: [0x00000007a8000000..0x00000007ea000000) 1107296256
[1.767s][trace][gc,ergo] GC(5) from: [0x00000007f5000000..0x0000000800000000) 184549376
[1.767s][trace][gc,ergo] GC(5) to: [0x00000007ea000000..0x00000007f5000000) 184549376
[1.767s][trace][gc,ergo] GC(5) Eden, to, from:
[1.767s][trace][gc,ergo] GC(5) [eden_start .. eden_end): [0x00000007a8000000 .. 0x00000007c5600000) 492830720
[1.767s][trace][gc,ergo] GC(5) [ to_start .. to_end): [0x00000007c5600000 .. 0x00000007e2b00000) 491782144
[1.767s][trace][gc,ergo] GC(5) [from_start .. from_end): [0x00000007f5000000 .. 0x0000000800000000) 184549376
[1.767s][debug][gc,heap] GC(5) Running ParallelGC PreTouch head with 1 workers for 1 work units pre-touching 491782144B.
```
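My reading of the logs above (an interpretation, not the actual HotSpot implementation) is that when a resize moves a space boundary, the pieces of the new space extent that were not covered by the old extent (a "head" before the old start and a "tail" past the old end) are treated as newly committed and pre-touched again, even though the same heap reservation was already touched at startup. A conceptual Java sketch of that delta computation, with illustrative addresses taken from the GC(5) log:
```
public class ResizePreTouchSketch {
    // Half-open address range [start, end), sizes in bytes.
    record Range(long start, long end) { }

    // Report the head/tail pieces of newExtent not covered by oldExtent;
    // these are the ranges the GC appears to pre-touch again after a resize.
    static void preTouchDelta(Range oldExtent, Range newExtent) {
        if (newExtent.start() < oldExtent.start()) {
            long head = Math.min(oldExtent.start(), newExtent.end()) - newExtent.start();
            System.out.println("PreTouch head: " + head + "B");
        }
        if (newExtent.end() > oldExtent.end()) {
            long tail = newExtent.end() - Math.max(oldExtent.end(), newExtent.start());
            System.out.println("PreTouch tail: " + tail + "B");
        }
    }

    public static void main(String[] args) {
        // Illustrative example: the "to" space end moves up by 1 MB, so only that
        // 1 MB delta is reported, matching the ~0.5-1.5 MB head/tail pre-touches
        // seen in the log above.
        preTouchDelta(new Range(0x7c5600000L, 0x7e2b00000L),
                      new Range(0x7c5600000L, 0x7e2c00000L));
    }
}
```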
This seems to be long-standing behavior in Parallel GC. But it got worse with JDK-8252221, which unconditionally offloads these pre-touches to a separate thread, incurring extra latency. JDK-8312023 deals with that regression separately, leaving this issue to figure out whether we can avoid pre-touching on space boundary moves completely.
Relates to: JDK-8312023 "Parallel pretouch should shortcut when only 1 thread is needed" (Open)