Type: Enhancement
Resolution: Duplicate
Priority: P4
None
While investigating the throughput gap between G1 and ParallelGC for refworkload wls_webapp_atomic JSP_HTTPS_POST_3 runs with jdk8b115, I noticed the following:
G1 lacks parallelism: G1 CPU utilization is about 50%, while ParallelGC's is about 60%, and ParallelGC achieves higher throughput. One reason could be that G1 allocated 734,518 objects outside TLABs, while ParallelGC allocated only 19, over a 20 s JFR recording. ParallelGC also allocates far fewer TLABs than G1. A G1 developer mentioned that when G1 allocates a new TLAB, the memory is not aligned properly.
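To make the "objects allocated outside TLABs" effect concrete, here is a toy model in Java. It is a simplification loosely modeled on the usual TLAB fast-path/slow-path structure, not HotSpot code: the class name `TlabModel`, the fixed TLAB size, the refill-waste rule and the workload are all hypothetical. Each allocation first tries a bump-pointer fast path; if the object does not fit, the thread either retires the TLAB and takes a new one (when the leftover space is small enough to waste) or allocates the object in shared heap space.

```java
/**
 * Toy model of TLAB allocation (hypothetical simplification, not HotSpot code).
 * Counts how many TLABs are handed out and how many objects end up
 * being allocated outside any TLAB.
 */
public final class TlabModel {
    private final long tlabSize;          // hypothetical fixed TLAB size, in bytes
    private final long refillWasteLimit;  // max leftover bytes we are willing to discard
    private long top = 0;                 // bump pointer within the current TLAB
    private long end = 0;                 // end of the current TLAB (0 = no TLAB yet)
    long tlabRefills = 0;                 // number of TLABs handed out
    long outsideTlab = 0;                 // objects allocated in shared space

    TlabModel(long tlabSize, long refillWasteLimit) {
        this.tlabSize = tlabSize;
        this.refillWasteLimit = refillWasteLimit;
    }

    void allocate(long size) {
        // Fast path: bump the pointer if the object fits in the current TLAB.
        if (end - top >= size) {
            top += size;
            return;
        }
        long free = end - top;
        // Slow path 1: too much space left to discard, or the object is larger
        // than any TLAB: allocate outside the TLAB and keep the current one.
        if (free > refillWasteLimit || size > tlabSize) {
            outsideTlab++;
            return;
        }
        // Slow path 2: retire the current TLAB, take a fresh one, allocate in it.
        tlabRefills++;
        top = size;
        end = tlabSize;
    }

    static TlabModel run(long tlabSize, long refillWasteLimit, long[] sizes) {
        TlabModel m = new TlabModel(tlabSize, refillWasteLimit);
        for (long s : sizes) {
            m.allocate(s);
        }
        return m;
    }

    public static void main(String[] args) {
        // Hypothetical workload: mostly 64-byte objects, every tenth one 5,000 bytes.
        long[] sizes = new long[100_000];
        for (int i = 0; i < sizes.length; i++) {
            sizes[i] = (i % 10 == 9) ? 5_000 : 64;
        }
        for (long tlab : new long[] {8 * 1024, 1024 * 1024}) {
            TlabModel m = run(tlab, tlab / 64, sizes);
            System.out.printf("TLAB %,d bytes: %,d refills, %,d outside-TLAB objects%n",
                    tlab, m.tlabRefills, m.outsideTlab);
        }
    }
}
```

For this particular workload, the 8 KB TLAB needs many more refills and sends some of the 5,000-byte objects to shared space, while the 1 MB TLAB sends none there. That mirrors, in miniature, the direction of the G1 vs. ParallelGC numbers, but it is not a model of either collector.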
I tried to enlarge G1 TLABs with -XX:G1HeapRegionSize=8m -XX:TLABSize=4m. That raised CPU utilization to ~57% and throughput to 47161.176, so it helps. However, the average G1 TLAB size was still only 1.6M, so I tried to increase it further with -XX:G1HeapRegionSize=16m -XX:TLABSize=8m; that did not improve throughput any further.
Here is the summary:

GC and parameters                              Avg TLAB size  TLAB allocations  Total TLAB memory  Objects allocated outside TLAB  Throughput
parallelgc                                     4 MB           11,696            51.67 GB           19                              50187.008
g1                                             85 KB          333,507           27.04 GB           734,518                         46022.863
g1 -XX:G1HeapRegionSize=4m -XX:TLABSize=4m     1.63 MB        30,726            49.01 GB           7,058                           47161.176
g1 -XX:G1HeapRegionSize=16m -XX:TLABSize=4m    3.65 MB        13,828            49.26 GB           1,967                           46556.9
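The average TLAB size reported for the G1 rows appears to be total TLAB memory divided by the number of TLABs. The Java sketch below checks that relationship, assuming "G" in the table means GiB (2^30 bytes), and also computes how far each G1 configuration's throughput falls short of ParallelGC. The class and method names are hypothetical.

```java
/**
 * Cross-checks the G1 rows of the summary table (hypothetical helper).
 * Assumes the "total TLAB memory" column is in GiB.
 */
public final class TlabTableCheck {
    static final double KIB = 1L << 10;
    static final double MIB = 1L << 20;
    static final double GIB = 1L << 30;

    // Average TLAB size in bytes = total bytes handed out in TLABs / TLAB count.
    static double avgTlabBytes(double totalGiB, long tlabCount) {
        return totalGiB * GIB / tlabCount;
    }

    // Percentage by which `throughput` falls short of `baseline`.
    static double gapPercent(double baseline, double throughput) {
        return (baseline - throughput) / baseline * 100.0;
    }

    public static void main(String[] args) {
        double parallel = 50187.008;
        // Rows: default g1, g1 with 4m regions / 4m TLABSize, g1 with 16m regions / 4m TLABSize.
        String[] names = {"g1", "g1 4m/4m", "g1 16m/4m"};
        double[] totalGiB = {27.04, 49.01, 49.26};
        long[] tlabs = {333_507, 30_726, 13_828};
        double[] throughput = {46022.863, 47161.176, 46556.9};
        for (int i = 0; i < names.length; i++) {
            double avg = avgTlabBytes(totalGiB[i], tlabs[i]);
            System.out.printf("%-10s avg TLAB %.1f KiB (%.2f MiB), %.1f%% below parallelgc%n",
                    names[i], avg / KIB, avg / MIB, gapPercent(parallel, throughput[i]));
        }
    }
}
```

With these inputs the G1 averages come out at roughly 85 KiB, 1.63 MiB and 3.65 MiB, matching the "Avg TLAB size" column, and default G1 trails ParallelGC by about 8.3%.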
We need to investigate two things:
1. The memory alignment of new TLABs that G1 allocates from the AllocRegion (new_tlab).
2. TLAB size. Currently the upper bound on TLAB size is the humongous object threshold, which is 50% of the region size. This may be too strict.
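The cap described in item 2 can be sketched as a simple clamp. This Java snippet illustrates only the rule as stated in this report (TLAB size limited to 50% of the region size); it is not HotSpot code, the names are hypothetical, and the real VM additionally applies minimum sizes, alignment and adaptive resizing.

```java
/**
 * Illustration of the TLAB cap described in item 2 (hypothetical helper,
 * not HotSpot code): a TLAB may not exceed the humongous threshold.
 */
public final class G1TlabCap {
    static final long MB = 1024L * 1024L;

    // Humongous threshold as described in the report: 50% of the region size.
    static long humongousThreshold(long regionSize) {
        return regionSize / 2;
    }

    // Hypothetical: the requested TLAB size clamped to the humongous threshold.
    static long effectiveMaxTlab(long requestedTlabSize, long regionSize) {
        return Math.min(requestedTlabSize, humongousThreshold(regionSize));
    }

    public static void main(String[] args) {
        // {G1HeapRegionSize, TLABSize} pairs mentioned in this report.
        long[][] configs = {
            {4 * MB, 4 * MB}, {8 * MB, 4 * MB}, {16 * MB, 4 * MB}, {16 * MB, 8 * MB}
        };
        for (long[] c : configs) {
            System.out.printf("G1HeapRegionSize=%dm TLABSize=%dm -> TLAB cap %dm%n",
                    c[0] / MB, c[1] / MB, effectiveMaxTlab(c[1], c[0]) / MB);
        }
    }
}
```

Under this rule, -XX:TLABSize=4m with 4m regions is capped at 2m, so a 4m TLAB only becomes possible with 8m or larger regions.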
Duplicates:
- JDK-8030177 G1: Enable TLAB resizing (Resolved)
- JDK-8028252 Make PrintTLAB work in g1 (Closed)