-
Enhancement
-
Resolution: Unresolved
-
P4
-
None
This is a long-standing known problem, but maybe bringing it up again might give it some additional attention:
In some recent SPECjbb2015 runs, during the hbIR phase (but the issue is generic enough to possibly reproduce everywhere), when running with just a max heap size, but THP and AlwaysPreTouch, it can happen that some pauses show excessive unusual durations, e.g.
[310.661s][gc ] GC(161) Pause Young (Mixed) (G1 Evacuation Pause) 6793M->1851M(8064M) 59.162ms
[310.661s][gc,cpu ] GC(161) User=1.85s Sys=0.01s Real=0.05s
[311.324s][gc,start ] GC(162) Pause Young (Concurrent Start) (G1 Evacuation Pause)
[311.324s][gc,task ] GC(162) Using 32 workers of 32 for evacuation
[311.784s][gc,mmu ] GC(162) MMU target violated: 201.0ms (200.0ms/201.0ms)
[311.784s][gc,phases ] GC(162) Pre Evacuate Collection Set: 0.44ms
[311.784s][gc,phases ] GC(162) Merge Heap Roots: 0.91ms
[311.784s][gc,phases ] GC(162) Evacuate Collection Set: 42.19ms
[311.784s][gc,phases ] GC(162) Post Evacuate Collection Set: 416.58ms
[311.784s][gc,phases ] GC(162) Other: 0.18ms
[311.784s][gc,heap ] GC(162) Eden regions: 282->0(23)
[311.784s][gc,heap ] GC(162) Survivor regions: 20->28(38)
[311.785s][gc,heap ] GC(162) Old regions: 88->88
[311.785s][gc,heap ] GC(162) Humongous regions: 11->11
[311.785s][gc,metaspace ] GC(162) Metaspace: 28548K(28864K)->28548K(28864K) NonClass: 26148K(26304K)->26148K(26304K) Class: 2400K(2560K)->2400K(2560K)
[311.785s][gc ] GC(162) Pause Young (Concurrent Start) (G1 Evacuation Pause) 6363M->1973M(16320M) 460.447ms
[311.785s][gc ] GC(163) Concurrent Mark Cycle
[311.785s][gc,marking ] GC(163) Concurrent Scan Root Regions
[311.785s][gc,cpu ] GC(162) User=1.22s Sys=12.74s Real=0.46s
Notice the
- high sys time (12.74s)
- expansion of the heap during that GC (8064M->16320M)
- high Post Evacuate Collection Set time (416.58ms)
which all indicates a problem with THP and/or AlwaysPreTouch.
Note that in this case the immediate heap expansion in the pause might not have even been necessary: heap used after GC is 1974M, previous committed is 8064M). I.e., maybe the expansion could have been deferred to outside the pause in this case.
(And there is another question why G1 would expand from 2g heap used, having 8g committed, to 16g in the first place, but the logs are insufficient here)
In some recent SPECjbb2015 runs, during the hbIR phase (but the issue is generic enough to possibly reproduce everywhere), when running with just a max heap size, but THP and AlwaysPreTouch, it can happen that some pauses show excessive unusual durations, e.g.
[310.661s][gc ] GC(161) Pause Young (Mixed) (G1 Evacuation Pause) 6793M->1851M(8064M) 59.162ms
[310.661s][gc,cpu ] GC(161) User=1.85s Sys=0.01s Real=0.05s
[311.324s][gc,start ] GC(162) Pause Young (Concurrent Start) (G1 Evacuation Pause)
[311.324s][gc,task ] GC(162) Using 32 workers of 32 for evacuation
[311.784s][gc,mmu ] GC(162) MMU target violated: 201.0ms (200.0ms/201.0ms)
[311.784s][gc,phases ] GC(162) Pre Evacuate Collection Set: 0.44ms
[311.784s][gc,phases ] GC(162) Merge Heap Roots: 0.91ms
[311.784s][gc,phases ] GC(162) Evacuate Collection Set: 42.19ms
[311.784s][gc,phases ] GC(162) Post Evacuate Collection Set: 416.58ms
[311.784s][gc,phases ] GC(162) Other: 0.18ms
[311.784s][gc,heap ] GC(162) Eden regions: 282->0(23)
[311.784s][gc,heap ] GC(162) Survivor regions: 20->28(38)
[311.785s][gc,heap ] GC(162) Old regions: 88->88
[311.785s][gc,heap ] GC(162) Humongous regions: 11->11
[311.785s][gc,metaspace ] GC(162) Metaspace: 28548K(28864K)->28548K(28864K) NonClass: 26148K(26304K)->26148K(26304K) Class: 2400K(2560K)->2400K(2560K)
[311.785s][gc ] GC(162) Pause Young (Concurrent Start) (G1 Evacuation Pause) 6363M->1973M(16320M) 460.447ms
[311.785s][gc ] GC(163) Concurrent Mark Cycle
[311.785s][gc,marking ] GC(163) Concurrent Scan Root Regions
[311.785s][gc,cpu ] GC(162) User=1.22s Sys=12.74s Real=0.46s
Notice the
- high sys time (12.74s)
- expansion of the heap during that GC (8064M->16320M)
- high Post Evacuate Collection Set time (416.58ms)
which all indicates a problem with THP and/or AlwaysPreTouch.
Note that in this case the immediate heap expansion in the pause might not have even been necessary: heap used after GC is 1974M, previous committed is 8064M). I.e., maybe the expansion could have been deferred to outside the pause in this case.
(And there is another question why G1 would expand from 2g heap used, having 8g committed, to 16g in the first place, but the logs are insufficient here)