Thread.dump_to_file takes disproportionally much time with increasing of amount of ThreadContainers, even if ThreadContainers are empty.
Here is the reproducer test (see attached) creating empty ThreadContainers and sequentially running JCMD with "Thread.print", "Thread.dump_to_file", "Thread.dump_to_file -format=json". Amount of threads is not changed.
The results are below. The first column, N, is amount of the created thread containers. Other columns are time spent for JCMD run, in milliseconds.
Time spending for "Thread.print" (2nd column) is not changed.
Time spending for "Thread.dump_to_file" (3rd and 4th columns) starts increasing significantly when amount of thread containers is greater than 2000. Increasing amount of thread containers twice increases the time spent x4-x5 times.
$ java -XX:+UseZGC Test
N, Thread.print, Thread.dump_to_file, Thread.dump_to_file -json
1, 352, 292, 190
2, 179, 172, 171
4, 181, 174, 175
8, 179, 176, 175
16, 174, 177, 178
32, 177, 177, 176
64, 178, 173, 191
128, 172, 173, 185
256, 169, 177, 182
512, 183, 194, 183
1024, 175, 205, 200
2048, 178, 246, 252
4096, 181, 550, 596
8192, 180, 2412, 2425
16384, 145, 10651, 10835
32768, 188, 48748, 39925
65536, 197, 262159, 266275
The issue can be easily reproduced with "-XX:+UseZGC" - for some reasons ZGC doesn't collect unused ScheduledThreadPoolExecutor objects while running the test. In case of the default GC, the issue is reproducible, but time is not increasing continuosly, because GC has a chance to collect and free unused objects. But anyway time spent for "Thread.dump_to_file" is much higher than "Thread.print".
$ java Test
N, Thread.print, Thread.dump_to_file, Thread.dump_to_file -json
1, 327, 253, 201
2, 178, 181, 180
4, 179, 180, 193
8, 143, 144, 141
16, 145, 139, 141
32, 146, 175, 173
64, 167, 174, 181
128, 168, 185, 182
256, 177, 186, 189
512, 177, 186, 185
1024, 175, 189, 215
2048, 174, 225, 248
4096, 178, 377, 438
8192, 194, 1696, 1744
16384, 230, 1710, 185
32768, 172, 9808, 9592
65536, 189, 2655, 212
At the first sight it looks like most time is spent for iterating containers in ThreadContainer.collect().
Is it really needed to iterate through thread container to dump container's info to file (even for empty containers)? Changing this behaviour may significantly improve "Thread.dump_to_file" performance.
Doesn't it look strange that increasing amount of thread containers twice increases time x5?
And additional related question, is it correct that ZGC doesn't collect 60_000 unused ScheduledThreadPoolExecutor objects? This can be "fixed" by using ZCollectionInterval option, however I'm wondering if it is OK for ZGC.
Here is the reproducer test (see attached) creating empty ThreadContainers and sequentially running JCMD with "Thread.print", "Thread.dump_to_file", "Thread.dump_to_file -format=json". Amount of threads is not changed.
The results are below. The first column, N, is amount of the created thread containers. Other columns are time spent for JCMD run, in milliseconds.
Time spending for "Thread.print" (2nd column) is not changed.
Time spending for "Thread.dump_to_file" (3rd and 4th columns) starts increasing significantly when amount of thread containers is greater than 2000. Increasing amount of thread containers twice increases the time spent x4-x5 times.
$ java -XX:+UseZGC Test
N, Thread.print, Thread.dump_to_file, Thread.dump_to_file -json
1, 352, 292, 190
2, 179, 172, 171
4, 181, 174, 175
8, 179, 176, 175
16, 174, 177, 178
32, 177, 177, 176
64, 178, 173, 191
128, 172, 173, 185
256, 169, 177, 182
512, 183, 194, 183
1024, 175, 205, 200
2048, 178, 246, 252
4096, 181, 550, 596
8192, 180, 2412, 2425
16384, 145, 10651, 10835
32768, 188, 48748, 39925
65536, 197, 262159, 266275
The issue can be easily reproduced with "-XX:+UseZGC" - for some reasons ZGC doesn't collect unused ScheduledThreadPoolExecutor objects while running the test. In case of the default GC, the issue is reproducible, but time is not increasing continuosly, because GC has a chance to collect and free unused objects. But anyway time spent for "Thread.dump_to_file" is much higher than "Thread.print".
$ java Test
N, Thread.print, Thread.dump_to_file, Thread.dump_to_file -json
1, 327, 253, 201
2, 178, 181, 180
4, 179, 180, 193
8, 143, 144, 141
16, 145, 139, 141
32, 146, 175, 173
64, 167, 174, 181
128, 168, 185, 182
256, 177, 186, 189
512, 177, 186, 185
1024, 175, 189, 215
2048, 174, 225, 248
4096, 178, 377, 438
8192, 194, 1696, 1744
16384, 230, 1710, 185
32768, 172, 9808, 9592
65536, 189, 2655, 212
At the first sight it looks like most time is spent for iterating containers in ThreadContainer.collect().
Is it really needed to iterate through thread container to dump container's info to file (even for empty containers)? Changing this behaviour may significantly improve "Thread.dump_to_file" performance.
Doesn't it look strange that increasing amount of thread containers twice increases time x5?
And additional related question, is it correct that ZGC doesn't collect 60_000 unused ScheduledThreadPoolExecutor objects? This can be "fixed" by using ZCollectionInterval option, however I'm wondering if it is OK for ZGC.