-
Enhancement
-
Resolution: Fixed
-
P4
-
None
-
None
-
generic
-
generic
The current gc logging reports the following data for concurrent marking remembered set scan with 12 concurrent GC threads with an Extremem workload with 54G heap, 6GB old-gen, 16MB region size, about 3.5 GB/s allocation rate:
```
[18.494s][info][gc,stats ] Concurrent Scan Remembered Set 125 us, parallelism: 5.91x
[38.948s][info][gc,stats ] Concurrent Scan Remembered Set 134 us, parallelism: 6.84x
[64.500s][info][gc,stats ] Concurrent Scan Remembered Set 51138 us, parallelism: 9.26x
[85.000s][info][gc,stats ] Concurrent Scan Remembered Set 155557 us, parallelism: 11.07x
[93.297s][info][gc,stats ] Concurrent Scan Remembered Set 191,065 us, parallelism: 10.68x
[127.132s][info][gc,stats ] Concurrent Scan Remembered Set 3,833,052 us, parallelism: 2.62x
[157.342s][info][gc,stats ] Concurrent Scan Remembered Set 4,134,449 us, parallelism: 1.49x
[183.292s][info][gc,stats ] Concurrent Scan Remembered Set 3819210 us, parallelism: 4.16x
[206.930s][info][gc,stats ] Concurrent Scan Remembered Set 4009429 us, parallelism: 3.40x
[228.736s][info][gc,stats ] Concurrent Scan Remembered Set 4401983 us, parallelism: 3.11x
[252.832s][info][gc,stats ] Concurrent Scan Remembered Set 4232867 us, parallelism: 3.20x
[266.107s][info][gc,stats ] Concurrent Scan Remembered Set 4906311 us, parallelism: 2.95x
[279.401s][info][gc,stats ] Concurrent Scan Remembered Set 4851601 us, parallelism: 3.09x
[292.902s][info][gc,stats ] Concurrent Scan Remembered Set 4889204 us, parallelism: 3.21x
[306.222s][info][gc,stats ] Concurrent Scan Remembered Set 4835401 us, parallelism: 3.09x
[322.358s][info][gc,stats ] Concurrent Scan Remembered Set 4768106 us, parallelism: 3.31x
[342.780s][info][gc,stats ] Concurrent Scan Remembered Set 4496549 us, parallelism: 3.48x
[363.408s][info][gc,stats ] Concurrent Scan Remembered Set 4448616 us, parallelism: 3.52x
[383.974s][info][gc,stats ] Concurrent Scan Remembered Set 4658215 us, parallelism: 3.57x
[404.554s][info][gc,stats ] Concurrent Scan Remembered Set 4439153 us, parallelism: 3.48x
[425.220s][info][gc,stats ] Concurrent Scan Remembered Set 4590934 us, parallelism: 3.52x
[445.960s][info][gc,stats ] Concurrent Scan Remembered Set 4442831 us, parallelism: 3.75x
[466.552s][info][gc,stats ] Concurrent Scan Remembered Set 4403618 us, parallelism: 3.54x
[486.973s][info][gc,stats ] Concurrent Scan Remembered Set 4400160 us, parallelism: 3.51x
[507.630s][info][gc,stats ] Concurrent Scan Remembered Set 4614590 us, parallelism: 3.63x
[528.228s][info][gc,stats ] Concurrent Scan Remembered Set 4527226 us, parallelism: 3.64x
[548.736s][info][gc,stats ] Concurrent Scan Remembered Set 4402859 us, parallelism: 3.62x
[569.280s][info][gc,stats ] Concurrent Scan Remembered Set 4452668 us, parallelism: 3.53x
[589.838s][info][gc,stats ] Concurrent Scan Remembered Set 4581006 us, parallelism: 3.63x
[610.353s][info][gc,stats ] Concurrent Scan Remembered Set 4283345 us, parallelism: 3.68x
[631.006s][info][gc,stats ] Concurrent Scan Remembered Set 4572265 us, parallelism: 3.58x
[651.421s][info][gc,stats ] Concurrent Scan Remembered Set 4413763 us, parallelism: 3.75x
[671.999s][info][gc,stats ] Concurrent Scan Remembered Set 4518760 us, parallelism: 3.52x
[692.388s][info][gc,stats ] Concurrent Scan Remembered Set 4560564 us, parallelism: 3.55x
[713.063s][info][gc,stats ] Concurrent Scan Remembered Set 4500967 us, parallelism: 3.83x
[733.542s][info][gc,stats ] Concurrent Scan Remembered Set 4532324 us, parallelism: 3.62x
[754.078s][info][gc,stats ] Concurrent Scan Remembered Set 4552902 us, parallelism: 3.60x
[774.559s][info][gc,stats ] Concurrent Scan Remembered Set 4498620 us, parallelism: 3.43x
[795.135s][info][gc,stats ] Concurrent Scan Remembered Set 4476302 us, parallelism: 3.57x
[815.647s][info][gc,stats ] Concurrent Scan Remembered Set 4384415 us, parallelism: 3.73x
[835.955s][info][gc,stats ] Concurrent Scan Remembered Set 4576489 us, parallelism: 3.67x
[856.553s][info][gc,stats ] Concurrent Scan Remembered Set 4320136 us, parallelism: 3.58x
[877.296s][info][gc,stats ] Concurrent Scan Remembered Set 4469043 us, parallelism: 3.72x
[897.805s][info][gc,stats ] Concurrent Scan Remembered Set 4576364 us, parallelism: 3.56x
[918.657s][info][gc,stats ] Concurrent Scan Remembered Set 4491751 us, parallelism: 3.72x
[939.294s][info][gc,stats ] Concurrent Scan Remembered Set 4347367 us, parallelism: 3.47x
[960.017s][info][gc,stats ] Concurrent Scan Remembered Set 4520656 us, parallelism: 3.63x
[980.735s][info][gc,stats ] Concurrent Scan Remembered Set 4558384 us, parallelism: 3.66x
[1001.302s][info][gc,stats ] Concurrent Scan Remembered Set 4644784 us, parallelism: 3.70x
[1021.741s][info][gc,stats ] Concurrent Scan Remembered Set 4428444 us, parallelism: 3.69x
[1042.174s][info][gc,stats ] Concurrent Scan Remembered Set 4557124 us, parallelism: 3.56x
[1062.694s][info][gc,stats ] Concurrent Scan Remembered Set 4558656 us, parallelism: 3.61x
[1083.133s][info][gc,stats ] Concurrent Scan Remembered Set 4389198 us, parallelism: 3.63x
[1103.690s][info][gc,stats ] Concurrent Scan Remembered Set 4483523 us, parallelism: 3.52x
[1124.271s][info][gc,stats ] Concurrent Scan Remembered Set 4507127 us, parallelism: 3.62x
[1144.644s][info][gc,stats ] Concurrent Scan Remembered Set 4569821 us, parallelism: 3.52x
[1165.096s][info][gc,stats ] Concurrent Scan Remembered Set 4582929 us, parallelism: 3.52x
[1185.806s][info][gc,stats ] Concurrent Scan Remembered Set 4600048 us, parallelism: 3.62x
[1206.335s][info][gc,stats ] Concurrent Scan Remembered Set 4442244 us, parallelism: 3.52x
[1226.950s][info][gc,stats ] Concurrent Scan Remembered Set 4419254 us, parallelism: 3.43x
[1247.517s][info][gc,stats ] Concurrent Scan Remembered Set 4627612 us, parallelism: 3.62x
[1268.102s][info][gc,stats ] Concurrent Scan Remembered Set 4344430 us, parallelism: 3.70x
[1288.764s][info][gc,stats ] Concurrent Scan Remembered Set 4588627 us, parallelism: 3.51x
[1309.281s][info][gc,stats ] Concurrent Scan Remembered Set 4504287 us, parallelism: 3.67x
[1329.873s][info][gc,stats ] Concurrent Scan Remembered Set 4555205 us, parallelism: 3.59x
[1343.234s][info][gc,stats ] Concurrent Scan Remembered Set 4526610 us, parallelism: 3.61x
```
Zoom in on one of these:
```
[1288.764s][info][gc,stats ] Concurrent Scan Remembered Set 4588627 us, parallelism: 3.51x
[1288.764s][info][gc,stats ] RS: <total> 16122507 us
[1288.764s][info][gc,stats ] RS: Scan Clusters 16122507 us, workers (us): 4588502, 995339, 960692, 966753, 96268\
2, 955170, 940895, 948608, 942101, 946239, 987230, 1928297,
```
The problem with this lack of parallelism is that the concurrent time to perform remembered set scanning is 4.5 seconds instead of less than 1 second. During the extra 3.5 seconds, while the application is allocating at 3.5 GB/s, we exhaust the available allocation pool and trigger stop-the-world degenerated or full GC.
One reason for the poor load balancing is because the minimum unit of labor between concurrent GC threads is one entire heap region. If one core is working on a region that happens to have a large number of "interesting pointers" after other cores have completed the scanning of their regions with small numbers of interesting pointers, the other cores will sit idle.
This plan for this performance enhancement is to:
1. Add support for scanning of subregions.
2. Assign each worker thread a certain number of full regions to scan
3. As these assignments are completed (at different times by different workers because each region has different density of interesting pointers), make the next N assignments for half-regions.
4. Then make assignments for quarter regions.
5. And so on, until we reach a desired minimum assignment size.
6. As part of this effort, when scanning humongous arrays of reference, skip over memory ranges with cards that are not DIRTY.
```
[18.494s][info][gc,stats ] Concurrent Scan Remembered Set 125 us, parallelism: 5.91x
[38.948s][info][gc,stats ] Concurrent Scan Remembered Set 134 us, parallelism: 6.84x
[64.500s][info][gc,stats ] Concurrent Scan Remembered Set 51138 us, parallelism: 9.26x
[85.000s][info][gc,stats ] Concurrent Scan Remembered Set 155557 us, parallelism: 11.07x
[93.297s][info][gc,stats ] Concurrent Scan Remembered Set 191,065 us, parallelism: 10.68x
[127.132s][info][gc,stats ] Concurrent Scan Remembered Set 3,833,052 us, parallelism: 2.62x
[157.342s][info][gc,stats ] Concurrent Scan Remembered Set 4,134,449 us, parallelism: 1.49x
[183.292s][info][gc,stats ] Concurrent Scan Remembered Set 3819210 us, parallelism: 4.16x
[206.930s][info][gc,stats ] Concurrent Scan Remembered Set 4009429 us, parallelism: 3.40x
[228.736s][info][gc,stats ] Concurrent Scan Remembered Set 4401983 us, parallelism: 3.11x
[252.832s][info][gc,stats ] Concurrent Scan Remembered Set 4232867 us, parallelism: 3.20x
[266.107s][info][gc,stats ] Concurrent Scan Remembered Set 4906311 us, parallelism: 2.95x
[279.401s][info][gc,stats ] Concurrent Scan Remembered Set 4851601 us, parallelism: 3.09x
[292.902s][info][gc,stats ] Concurrent Scan Remembered Set 4889204 us, parallelism: 3.21x
[306.222s][info][gc,stats ] Concurrent Scan Remembered Set 4835401 us, parallelism: 3.09x
[322.358s][info][gc,stats ] Concurrent Scan Remembered Set 4768106 us, parallelism: 3.31x
[342.780s][info][gc,stats ] Concurrent Scan Remembered Set 4496549 us, parallelism: 3.48x
[363.408s][info][gc,stats ] Concurrent Scan Remembered Set 4448616 us, parallelism: 3.52x
[383.974s][info][gc,stats ] Concurrent Scan Remembered Set 4658215 us, parallelism: 3.57x
[404.554s][info][gc,stats ] Concurrent Scan Remembered Set 4439153 us, parallelism: 3.48x
[425.220s][info][gc,stats ] Concurrent Scan Remembered Set 4590934 us, parallelism: 3.52x
[445.960s][info][gc,stats ] Concurrent Scan Remembered Set 4442831 us, parallelism: 3.75x
[466.552s][info][gc,stats ] Concurrent Scan Remembered Set 4403618 us, parallelism: 3.54x
[486.973s][info][gc,stats ] Concurrent Scan Remembered Set 4400160 us, parallelism: 3.51x
[507.630s][info][gc,stats ] Concurrent Scan Remembered Set 4614590 us, parallelism: 3.63x
[528.228s][info][gc,stats ] Concurrent Scan Remembered Set 4527226 us, parallelism: 3.64x
[548.736s][info][gc,stats ] Concurrent Scan Remembered Set 4402859 us, parallelism: 3.62x
[569.280s][info][gc,stats ] Concurrent Scan Remembered Set 4452668 us, parallelism: 3.53x
[589.838s][info][gc,stats ] Concurrent Scan Remembered Set 4581006 us, parallelism: 3.63x
[610.353s][info][gc,stats ] Concurrent Scan Remembered Set 4283345 us, parallelism: 3.68x
[631.006s][info][gc,stats ] Concurrent Scan Remembered Set 4572265 us, parallelism: 3.58x
[651.421s][info][gc,stats ] Concurrent Scan Remembered Set 4413763 us, parallelism: 3.75x
[671.999s][info][gc,stats ] Concurrent Scan Remembered Set 4518760 us, parallelism: 3.52x
[692.388s][info][gc,stats ] Concurrent Scan Remembered Set 4560564 us, parallelism: 3.55x
[713.063s][info][gc,stats ] Concurrent Scan Remembered Set 4500967 us, parallelism: 3.83x
[733.542s][info][gc,stats ] Concurrent Scan Remembered Set 4532324 us, parallelism: 3.62x
[754.078s][info][gc,stats ] Concurrent Scan Remembered Set 4552902 us, parallelism: 3.60x
[774.559s][info][gc,stats ] Concurrent Scan Remembered Set 4498620 us, parallelism: 3.43x
[795.135s][info][gc,stats ] Concurrent Scan Remembered Set 4476302 us, parallelism: 3.57x
[815.647s][info][gc,stats ] Concurrent Scan Remembered Set 4384415 us, parallelism: 3.73x
[835.955s][info][gc,stats ] Concurrent Scan Remembered Set 4576489 us, parallelism: 3.67x
[856.553s][info][gc,stats ] Concurrent Scan Remembered Set 4320136 us, parallelism: 3.58x
[877.296s][info][gc,stats ] Concurrent Scan Remembered Set 4469043 us, parallelism: 3.72x
[897.805s][info][gc,stats ] Concurrent Scan Remembered Set 4576364 us, parallelism: 3.56x
[918.657s][info][gc,stats ] Concurrent Scan Remembered Set 4491751 us, parallelism: 3.72x
[939.294s][info][gc,stats ] Concurrent Scan Remembered Set 4347367 us, parallelism: 3.47x
[960.017s][info][gc,stats ] Concurrent Scan Remembered Set 4520656 us, parallelism: 3.63x
[980.735s][info][gc,stats ] Concurrent Scan Remembered Set 4558384 us, parallelism: 3.66x
[1001.302s][info][gc,stats ] Concurrent Scan Remembered Set 4644784 us, parallelism: 3.70x
[1021.741s][info][gc,stats ] Concurrent Scan Remembered Set 4428444 us, parallelism: 3.69x
[1042.174s][info][gc,stats ] Concurrent Scan Remembered Set 4557124 us, parallelism: 3.56x
[1062.694s][info][gc,stats ] Concurrent Scan Remembered Set 4558656 us, parallelism: 3.61x
[1083.133s][info][gc,stats ] Concurrent Scan Remembered Set 4389198 us, parallelism: 3.63x
[1103.690s][info][gc,stats ] Concurrent Scan Remembered Set 4483523 us, parallelism: 3.52x
[1124.271s][info][gc,stats ] Concurrent Scan Remembered Set 4507127 us, parallelism: 3.62x
[1144.644s][info][gc,stats ] Concurrent Scan Remembered Set 4569821 us, parallelism: 3.52x
[1165.096s][info][gc,stats ] Concurrent Scan Remembered Set 4582929 us, parallelism: 3.52x
[1185.806s][info][gc,stats ] Concurrent Scan Remembered Set 4600048 us, parallelism: 3.62x
[1206.335s][info][gc,stats ] Concurrent Scan Remembered Set 4442244 us, parallelism: 3.52x
[1226.950s][info][gc,stats ] Concurrent Scan Remembered Set 4419254 us, parallelism: 3.43x
[1247.517s][info][gc,stats ] Concurrent Scan Remembered Set 4627612 us, parallelism: 3.62x
[1268.102s][info][gc,stats ] Concurrent Scan Remembered Set 4344430 us, parallelism: 3.70x
[1288.764s][info][gc,stats ] Concurrent Scan Remembered Set 4588627 us, parallelism: 3.51x
[1309.281s][info][gc,stats ] Concurrent Scan Remembered Set 4504287 us, parallelism: 3.67x
[1329.873s][info][gc,stats ] Concurrent Scan Remembered Set 4555205 us, parallelism: 3.59x
[1343.234s][info][gc,stats ] Concurrent Scan Remembered Set 4526610 us, parallelism: 3.61x
```
Zoom in on one of these:
```
[1288.764s][info][gc,stats ] Concurrent Scan Remembered Set 4588627 us, parallelism: 3.51x
[1288.764s][info][gc,stats ] RS: <total> 16122507 us
[1288.764s][info][gc,stats ] RS: Scan Clusters 16122507 us, workers (us): 4588502, 995339, 960692, 966753, 96268\
2, 955170, 940895, 948608, 942101, 946239, 987230, 1928297,
```
The problem with this lack of parallelism is that the concurrent time to perform remembered set scanning is 4.5 seconds instead of less than 1 second. During the extra 3.5 seconds, while the application is allocating at 3.5 GB/s, we exhaust the available allocation pool and trigger stop-the-world degenerated or full GC.
One reason for the poor load balancing is because the minimum unit of labor between concurrent GC threads is one entire heap region. If one core is working on a region that happens to have a large number of "interesting pointers" after other cores have completed the scanning of their regions with small numbers of interesting pointers, the other cores will sit idle.
This plan for this performance enhancement is to:
1. Add support for scanning of subregions.
2. Assign each worker thread a certain number of full regions to scan
3. As these assignments are completed (at different times by different workers because each region has different density of interesting pointers), make the next N assignments for half-regions.
4. Then make assignments for quarter regions.
5. And so on, until we reach a desired minimum assignment size.
6. As part of this effort, when scanning humongous arrays of reference, skip over memory ranges with cards that are not DIRTY.