JFR Leak Profiler: path-to-gc-root very slow for large object arrays


    Component: jfr
    Status: In Review

      I see a steep performance degradation with the JFR Leak Profiler under the following conditions:

      1) We have large object arrays on the heap

      2) We start with BFS, but at some point we fall back to DFS. This happens when the BFS edge queue becomes too large (see https://github.com/tstuefe/jdk/blob/1bbbce75c5e68429c2a32519eb3c36d964dcdf57/src/hotspot/share/jfr/leakprofiler/chains/bfsClosure.cpp#L142-L144), which in turn is more likely with large object arrays. A toy sketch of this fallback follows the list.
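      For intuition, here is a minimal toy of that fallback on a small integer graph. It is a sketch of the mechanism only; the names, the graph, and the capacity are made up and are not the HotSpot code:

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class BfsDfsFallbackSketch {
    // Toy graph: node i has edges to i+1 and i+2, up to N-1.
    static final int N = 16;
    // Stand-in for the capped BFS edge queue (1/20 of the heap,
    // 32 MB minimum, in the real code).
    static final int QUEUE_CAPACITY = 4;
    static final boolean[] seen = new boolean[N];

    public static void main(String[] args) {
        Queue<Integer> queue = new ArrayDeque<>();
        seen[0] = true;
        queue.add(0);
        while (!queue.isEmpty()) {
            int node = queue.poll();
            for (int child : new int[] { node + 1, node + 2 }) {
                if (child >= N || seen[child]) continue;
                seen[child] = true;
                if (queue.size() >= QUEUE_CAPACITY) {
                    dfs(child); // queue full: drop to DFS for this edge
                } else {
                    queue.add(child); // normal breadth-first processing
                }
            }
        }
    }

    static void dfs(int node) {
        for (int child : new int[] { node + 1, node + 2 }) {
            if (child >= N || seen[child]) continue;
            seen[child] = true;
            dfs(child); // depth-first continuation from the overflowing edge
        }
    }
}
```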

      Note that both BFS and DFS are, by themselves, very fast: when one enforces DFS (e.g., via the SkipBFS parameter; see the JFR WhiteBox API), all is well. If BFS completes without the DFS fallback, all is well, too. The problem only occurs when the two are mixed.

      The problem is more likely with large object arrays on small heaps, since the size of the BFS edge queue is tied to 1/20 of the heap size. It can be provoked more easily by shrinking the BFS edge queue in the code, since the VM otherwise prevents it from going below 32 MB (see https://github.com/tstuefe/jdk/blob/1bbbce75c5e68429c2a32519eb3c36d964dcdf57/src/hotspot/share/jfr/leakprofiler/chains/pathToGcRootsOperation.cpp#L58).
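      A minimal sketch of that sizing rule as described above (constant and method names here are illustrative, not the HotSpot identifiers):

```java
public class EdgeQueueSizing {
    static final long MIN_EDGE_QUEUE_BYTES = 32L * 1024 * 1024; // 32 MB floor

    // 1/20 of the heap, but never below the 32 MB floor.
    static long edgeQueueBytes(long heapBytes) {
        return Math.max(heapBytes / 20, MIN_EDGE_QUEUE_BYTES);
    }

    public static void main(String[] args) {
        // A 512 MB heap yields 25.6 MB by the 1/20 rule, so the 32 MB
        // floor applies; only heaps above 640 MB grow the queue further.
        System.out.println(edgeQueueBytes(512L * 1024 * 1024)); // 33554432
    }
}
```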

      In my tests (a modified variant of `TestJcmdDumpPathToGCRoots.java`), with the BFS edge queue artificially reduced to 4 MB, I could reproduce the problem with object arrays of 200k - 300k elements, with the following results (a sketch of the reproducer's shape follows the list):

      - 200,000 elements: BFS only, all is well (~5 seconds)
      - 230,000 elements: BFS/DFS mixed (~20 seconds)
      - 240,000 elements: BFS/DFS mixed (~90 seconds)
      - 250,000 elements: BFS/DFS mixed (~200 seconds)
      - 300,000 elements: BFS/DFS mixed (~650 seconds)
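
      For reference, the shape of such a reproducer. This is not the actual modified test, just a minimal sketch; the class name and per-element payload are made up, and the jcmd invocation in the comment is the standard command, not the test harness:

```java
public class LargeArrayLeak {
    // Keep the array strongly reachable so the leak profiler must walk it
    // when computing paths to GC roots.
    static Object[] leak;

    public static void main(String[] args) throws Exception {
        int n = Integer.parseInt(args[0]); // e.g. 200_000 .. 300_000
        leak = new Object[n];
        for (int i = 0; i < n; i++) {
            leak[i] = new byte[16]; // distinct small object per slot
        }
        // With a recording running (e.g. -XX:StartFlightRecording), dump via
        //   jcmd <pid> JFR.dump path-to-gc-roots=true filename=dump.jfr
        // and time the dump as n grows.
        Thread.sleep(Long.MAX_VALUE); // keep the process alive for jcmd
    }
}
```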


      Examining the problem more closely, I see that we iterate over the large object array over and over again:

      - The BFS search starts iterating over the object array. It finds that its edge queue is too small and drops down to DFS for every single array element. That in itself is not the problem.

      - It then continues to process edges and re-processes the array over and over again, passing its oop down to `DFSClosure::find_leaks_from_edge` for every element inside the array. If I understand the code correctly, this is because the edges for the object array elements contain back-references to the edge of their holder object, which is the array.

      The effect is that, for an array of N elements, we process that array N times instead of just once, so the run time is O(N^2) in the array size.
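
      A toy model of the resulting access pattern (hedged: this mimics only the shape of the traversal, not the actual closures):

```java
public class QuadraticRescan {
    public static void main(String[] args) {
        int n = 20_000; // kept small; the measurements above used 200k-300k
        long touches = 0;
        // For each element's edge, the DFS is handed the holder (the array)
        // and ends up walking all n slots again: n elements x n slots.
        for (int element = 0; element < n; element++) {
            for (int slot = 0; slot < n; slot++) {
                touches++; // stands in for visiting one array slot
            }
        }
        // 400,000,000 touches for n = 20,000; multiplying n by 4 multiplies
        // the work by 16, matching the super-linear times measured above.
        System.out.println(touches);
    }
}
```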

      I believe the reason for this is a missing mark check at the border between BFS and DFS.
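
      A hedged sketch of that idea, in Java for illustration only (the real change would live in the C++ closures; the names here are hypothetical):

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class MarkCheckSketch {
    // Identity-based visited set standing in for the profiler's mark bits.
    static final Set<Object> marked =
            Collections.newSetFromMap(new IdentityHashMap<>());

    // Hypothetical BFS->DFS handoff: check the mark before descending, so
    // a holder (e.g. the large array) is fully scanned at most once.
    static void findLeaksFromEdge(Object holder) {
        if (!marked.add(holder)) {
            return; // already processed once; skip the repeated full re-scan
        }
        // ... descend depth-first into holder's references ...
    }
}
```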

            Assignee: Thomas Stuefe
            Reporter: Thomas Stuefe
            Votes: 0
            Watchers: 3
