Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-8321178

jcmd Thread.dump_to_file takes increasing time when STPE instances are left unreferenced without closing

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Unresolved
    • Icon: P4 P4
    • tbd
    • None
    • core-svc

      Thread.dump_to_file takes disproportionally much time with increasing of amount of ThreadContainers, even if ThreadContainers are empty.

      Here is the reproducer test (see attached) creating empty ThreadContainers and sequentially running JCMD with "Thread.print", "Thread.dump_to_file", "Thread.dump_to_file -format=json". Amount of threads is not changed.

      The results are below. The first column, N, is amount of the created thread containers. Other columns are time spent for JCMD run, in milliseconds.
      Time spending for "Thread.print" (2nd column) is not changed.
      Time spending for "Thread.dump_to_file" (3rd and 4th columns) starts increasing significantly when amount of thread containers is greater than 2000. Increasing amount of thread containers twice increases the time spent x4-x5 times.

      $ java -XX:+UseZGC Test
      N, Thread.print, Thread.dump_to_file, Thread.dump_to_file -json
           1, 352, 292, 190
           2, 179, 172, 171
           4, 181, 174, 175
           8, 179, 176, 175
          16, 174, 177, 178
          32, 177, 177, 176
          64, 178, 173, 191
         128, 172, 173, 185
         256, 169, 177, 182
         512, 183, 194, 183
        1024, 175, 205, 200
        2048, 178, 246, 252
        4096, 181, 550, 596
        8192, 180, 2412, 2425
       16384, 145, 10651, 10835
       32768, 188, 48748, 39925
       65536, 197, 262159, 266275

      The issue can be easily reproduced with "-XX:+UseZGC" - for some reasons ZGC doesn't collect unused ScheduledThreadPoolExecutor objects while running the test. In case of the default GC, the issue is reproducible, but time is not increasing continuosly, because GC has a chance to collect and free unused objects. But anyway time spent for "Thread.dump_to_file" is much higher than "Thread.print".

      $ java Test
      N, Thread.print, Thread.dump_to_file, Thread.dump_to_file -json
           1, 327, 253, 201
           2, 178, 181, 180
           4, 179, 180, 193
           8, 143, 144, 141
          16, 145, 139, 141
          32, 146, 175, 173
          64, 167, 174, 181
         128, 168, 185, 182
         256, 177, 186, 189
         512, 177, 186, 185
        1024, 175, 189, 215
        2048, 174, 225, 248
        4096, 178, 377, 438
        8192, 194, 1696, 1744
       16384, 230, 1710, 185
       32768, 172, 9808, 9592
       65536, 189, 2655, 212

      At the first sight it looks like most time is spent for iterating containers in ThreadContainer.collect().

      Is it really needed to iterate through thread container to dump container's info to file (even for empty containers)? Changing this behaviour may significantly improve "Thread.dump_to_file" performance.

      Doesn't it look strange that increasing amount of thread containers twice increases time x5?

      And additional related question, is it correct that ZGC doesn't collect 60_000 unused ScheduledThreadPoolExecutor objects? This can be "fixed" by using ZCollectionInterval option, however I'm wondering if it is OK for ZGC.

            alanb Alan Bateman
            rmarchenko Roman Marchenko
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

              Created:
              Updated: