Today the shortest path back to a GC root is found using a breadth-first-search (BFS) algorithm.
This works well for most Java heaps, but not all of them. If the BFS queue uses too much native memory, a depth-first-search (DFS) algorithm is used for the remaining objects. This is to prevent memory from being swapped to disk in a scenario where the heap consists only of a long chain of objects, for example a java.util.LinkedList. DFS can't guarantee the shortest path to the GC root, but it works well in practice.
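As a rough illustration of what the BFS does (a standalone sketch, not the actual profiler code), the search records for every reached object the object it was reached from, so that once a sampled object is found the chain of parent links can be followed back to a GC root:

```cpp
#include <cstdio>
#include <queue>
#include <unordered_map>
#include <vector>

// Standalone sketch of a BFS from the GC roots that records, for each reached
// object, the object it was reached from. Following those parent links from a
// sampled object then yields a shortest path back to a GC root.
struct Obj {
  const char* name;
  std::vector<Obj*> refs;  // outgoing references
};

void print_path_to_root(const std::vector<Obj*>& roots, Obj* target) {
  std::unordered_map<Obj*, Obj*> parent;  // reached object -> predecessor (nullptr for roots)
  std::queue<Obj*> queue;                 // the BFS queue (native memory in the real profiler)
  for (Obj* root : roots) {
    parent[root] = nullptr;
    queue.push(root);
  }
  while (!queue.empty()) {
    Obj* o = queue.front();
    queue.pop();
    if (o == target) {
      // Walk the parent links back to a root; this is the reported path.
      for (Obj* p = o; p != nullptr; p = parent[p]) {
        printf("%s\n", p->name);
      }
      return;
    }
    for (Obj* ref : o->refs) {
      if (parent.find(ref) == parent.end()) {  // not yet visited
        parent[ref] = o;
        queue.push(ref);
      }
    }
  }
}
```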
There are, however, several problems with the current DFS implementation. It is implemented recursively, and a GC closure is stack-allocated for every new object found on the heap. This makes the algorithm slow and limits it to the size of the native thread stack. Walking the heap in a depth-first manner also leads to more cache misses, since this is not how objects are typically laid out on the heap by the GC.
Furthermore, if the root set is too large to fit in the BFS queue, a second DFS algorithm is used, so there are three ways the heap can potentially be traversed, depending on what the Java heap looks like.
It would be good if we could switch to a BFS-only implementation to reduce maintenance cost, especially if we want to search in parallel or concurrently in the future. Today the BFS queue uses (at most) native memory corresponding to 5% of the Java heap. Every edge added to the queue requires two 64-bit pointers (16 bytes).
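To make the numbers concrete, here is a small standalone sketch of the queue-entry layout and how many edges fit within the 5% budget (the 32 GB heap size is just an example, not a figure from the current implementation):

```cpp
#include <cstdint>
#include <cstdio>

// Simplified model of a BFS queue entry as described above: a pointer to the
// parent edge and a pointer to the Java object, i.e. 2 x 64 bits = 16 bytes.
struct Edge {
  const Edge* parent;
  const void* object;
};

int main() {
  const uint64_t heap_size    = 32ull << 30;                 // example: 32 GB Java heap
  const uint64_t queue_budget = heap_size / 20;              // 5% cap on native memory
  const uint64_t max_edges    = queue_budget / sizeof(Edge); // ~107 million edges

  printf("edge size:    %zu bytes\n", sizeof(Edge));         // 16 on a 64-bit platform
  printf("queue budget: %llu MB\n", (unsigned long long)(queue_budget >> 20));
  printf("max edges:    %llu\n", (unsigned long long)max_edges);
  return 0;
}
```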
Here are some ideas on how we could make DFS unnecessary:
- Don't add leaf objects to the queue. We could tag classes that don't have followable fields, and then not add objects of those types to the queue; for example, java.lang.Integer (see the first sketch after this list).
- The char[] of a java.lang.String object could be checked immediately, avoiding pushing String objects onto the queue. A generalization of this idea is to process all objects with a single followable field immediately, but that would in some sense be a revival of DFS.
- Instead of storing 64-bit pointers to where the parent edge and the Java object are located, we could store LEB128-compressed pointers and drop the low bits that are always zero due to alignment (sketched after this list). If the Java heap size is below 256 GB, memory usage would be cut at least in half.
- Prune the BFS queue. If a leaf object is found, go back to the parent and check whether all of its children are leaves. If so, the parent and all its children can be removed from the queue before going deeper. An alternative implementation is a two-stage BFS queue: a smaller queue holds, say, 16 edges, and if an object can be searched exhaustively within that limit, no objects from that subtree need to be added to the larger BFS queue (see the two-stage sketch after this list).
- When possible, store the parent edge pointer in the mark word. If the parent edge points to an object on the Java heap, reconstruct the edge from the object and parent edge in the mark word.
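A minimal sketch of the leaf-tagging idea above; the per-class flag and the types here are hypothetical, not the existing profiler API:

```cpp
#include <vector>

// Hypothetical per-class flag, computed once per class, recording whether
// instances can contain heap references at all (false for e.g.
// java.lang.Integer or primitive arrays).
struct KlassInfo {
  bool has_followable_fields;
};

struct HeapObject {
  const KlassInfo* klass;
  std::vector<HeapObject*> refs;  // simplified model of the object's reference fields
};

// Leaf objects are still marked as visited, but they never take up queue
// memory because there is nothing to follow from them.
void visit(HeapObject* obj,
           std::vector<HeapObject*>& bfs_queue,
           std::vector<HeapObject*>& visited) {
  visited.push_back(obj);
  if (obj->klass->has_followable_fields) {
    bfs_queue.push_back(obj);  // only enqueue objects that can lead further
  }
}
```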
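A sketch of the LEB128 idea; the alignment shift and heap-base handling are assumptions about how compressed queue entries could look, not existing code:

```cpp
#include <cstdint>
#include <vector>

// Objects are at least 8-byte aligned, so the low 3 bits of a heap offset are
// always zero and can be shifted away before encoding.
static const unsigned kAlignmentShift = 3;

// Standard unsigned LEB128: 7 payload bits per byte, high bit marks "more bytes follow".
void encode_uleb128(uint64_t value, std::vector<uint8_t>& out) {
  do {
    uint8_t byte = value & 0x7f;
    value >>= 7;
    if (value != 0) {
      byte |= 0x80;  // continuation bit
    }
    out.push_back(byte);
  } while (value != 0);
}

// Store a queue entry as a compressed heap offset instead of a raw 64-bit pointer.
// For a heap below 256 GB, an offset fits in 38 bits; after the alignment shift it
// fits in 35 bits, i.e. at most 5 LEB128 bytes instead of 8, and small offsets
// take only 1-2 bytes.
void encode_object_ref(uint64_t object_addr, uint64_t heap_base, std::vector<uint8_t>& out) {
  encode_uleb128((object_addr - heap_base) >> kAlignmentShift, out);
}
```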
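And a sketch of the two-stage queue variant from the pruning idea; the 16-entry limit is the figure mentioned above, everything else (types, helper) is illustrative:

```cpp
#include <cstddef>
#include <vector>

struct HeapObject {
  bool visited = false;
  std::vector<HeapObject*> refs;  // simplified model of the object's references
};

// Append all not-yet-visited children of obj to out, marking them visited.
static void push_unvisited_children(HeapObject* obj, std::vector<HeapObject*>& out) {
  for (HeapObject* child : obj->refs) {
    if (!child->visited) {
      child->visited = true;
      out.push_back(child);
    }
  }
}

static const size_t kScratchLimit = 16;  // small first-stage queue

// Try to exhaust the subtree below root using only the small scratch queue.
// Only if it overflows are the remaining objects spilled to the large BFS queue.
void expand(HeapObject* root, std::vector<HeapObject*>& big_queue) {
  std::vector<HeapObject*> scratch;
  root->visited = true;
  scratch.push_back(root);
  size_t head = 0;
  while (head < scratch.size()) {
    if (scratch.size() > kScratchLimit) {
      big_queue.insert(big_queue.end(), scratch.begin() + head, scratch.end());
      return;  // subtree too large: continue from the big queue as usual
    }
    push_unvisited_children(scratch[head++], scratch);
  }
  // Subtree exhausted within the scratch limit: nothing was added to the big queue.
}
```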
In cases where none of the above techniques reduces the memory overhead below 5%, we could give up and make it clear in the documentation that there are certain (pathological) Java heaps for which the Memory-Leak Profiler will not find all leaking objects (this is already the case today if the depth is more than 5,000 objects), or we could use more than 5% and hope for the best when it comes to swapping.