Type: Enhancement
Resolution: Unresolved
Priority: P4
Affects Version/s: 8, 9
CPU: generic
OS: linux
In PSR CRM Fuse, we observed that scanning the class loader data graph (CLDG) can take a long time: around 20 ms out of an average total pause time of 25 ms. This is a scalability bottleneck, because all other GC worker threads must wait for that one thread (high termination time) before the "parallel" phase of the GC can complete.
The cause is that scanning the CLDG for roots is currently a single work item, which is assigned to one thread during external root scanning (Ext Root Scan) in the parallel phase of a G1 GC pause.
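To illustrate the difference between one coarse work item and finer-grained claiming, here is a minimal, hypothetical C++ sketch (not HotSpot's implementation). It assumes the graph is a singly linked list of nodes, each with an atomic "claimed" flag; every worker walks the list and scans only the nodes it wins, so the work spreads across threads instead of landing on one. The names `Node`, `try_claim`, `scan_roots`, and `parallel_scan` are invented for illustration, and root scanning is simulated by adding up a per-node count.

```cpp
#include <atomic>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for one entry of the class loader data graph.
struct Node {
  std::atomic<bool> claimed{false};
  int roots = 0;        // simulated number of roots held by this node
  Node* next = nullptr;
};

// Atomically claim a node; exactly one worker succeeds per node.
bool try_claim(Node* n) {
  bool expected = false;
  return n->claimed.compare_exchange_strong(expected, true);
}

// Each worker walks the whole list but only "scans" the nodes it claims.
void scan_roots(Node* head, std::atomic<long>& scanned, long& my_work) {
  for (Node* n = head; n != nullptr; n = n->next) {
    if (try_claim(n)) {
      my_work += n->roots;  // simulated root processing
      scanned.fetch_add(n->roots, std::memory_order_relaxed);
    }
  }
}

// Runs `workers` threads over the list and returns the work done by each.
std::vector<long> parallel_scan(Node* head, int workers, std::atomic<long>& scanned) {
  std::vector<long> work(workers, 0);
  std::vector<std::thread> threads;
  for (int i = 0; i < workers; ++i) {
    // Each thread writes only to its own element of `work`.
    threads.emplace_back(scan_roots, head, std::ref(scanned), std::ref(work[i]));
  }
  for (auto& t : threads) t.join();
  return work;
}
```

Per-node claiming only balances the load when the work is spread over many nodes; a single class loader with a very large number of classes would still be scanned by one thread, so finer splitting within a node might also be needed.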
The issue can also be reproduced with GCBench. For example, running with -Xms4g -Xmx4g -XX:MaxMetaspaceSize=256m and -DGCBench.fillnonheap=20 (load classes until metaspace is 20% full) shows imbalanced 'Ext Root Scan' times: the average is under 2 ms, but the maximum is 8.5 ms. The following snippet is from a log at level=finest, produced by a build with additional log output:
[SH_PS_SystemDictionary_oops_do: 0.0 0.0 1.7 0.0 0.0 0.0 0.0 0.0
Min: 0.0, Avg: 0.2, Max: 1.7, Diff: 1.7, Sum: 1.7]
[SH_PS_ClassLoaderDataGraph_oops_do: 0.0 0.0 0.0 0.0 0.0 10.7 0.0 0.0
Min: 0.0, Avg: 1.3, Max: 10.7, Diff: 10.7, Sum: 10.7]
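The summary line under each phase can be recomputed from the per-worker columns, which makes the imbalance explicit. The following C++ sketch assumes that each column is one GC worker's time in milliseconds, that Avg is taken over all workers, and that Diff is Max minus Min; `PhaseStats` and `summarize` are hypothetical names. With the ClassLoaderDataGraph row above, one worker spent 10.7 ms and the other seven spent 0.0 ms, so the average over 8 workers is 10.7 / 8 ≈ 1.34 ms, printed as 1.3.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Summary of one phase across GC worker threads, in milliseconds.
struct PhaseStats {
  double min, avg, max, diff, sum;
};

// Assumes a non-empty vector of per-worker times.
PhaseStats summarize(const std::vector<double>& per_worker_ms) {
  auto [lo, hi] = std::minmax_element(per_worker_ms.begin(), per_worker_ms.end());
  double sum = std::accumulate(per_worker_ms.begin(), per_worker_ms.end(), 0.0);
  return PhaseStats{*lo, sum / per_worker_ms.size(), *hi, *hi - *lo, sum};
}
```

When Max equals Sum, a single worker did all of the phase's work, which is the pattern expected when the CLDG is scanned as one work item.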