Currently there is a "HeapDumpOnOutOfMemoryError" option that users can enable to get a heap dump when an OutOfMemoryError occurs.
This is useful for analyzing memory leaks or inspecting the heap contents to help identify the root cause of the OOM.
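For reference, the existing option is typically enabled like this (the dump path, heap size, and application name here are just placeholders):

    java -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/java.hprof -Xmx100g MyApp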
However, in our experience, when the heap is large (e.g. 100GB) the heap dump can take a long time to finish, which means the online program cannot be restarted or killed until the heap dump completes.
Although parallel heap dumping helps with the heap-iteration phase, we observed that much of the time is spent on disk writes, which is hard to optimize further because it is bounded by disk bandwidth.
On the other hand, a heap histogram can also help identify heap-related issues, and because the histogram data is calculated and aggregated in memory rather than written to disk, this kind of heap inspection is much faster than a heap dump.
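For comparison, a heap histogram can already be obtained on demand from a running JVM with jmap (the pid is a placeholder); the output is a per-class aggregation along these lines:

    $ jmap -histo <pid>
     num     #instances         #bytes  class name
    ----------------------------------------------
       ... (one row per class, sorted by total bytes) ...

The proposed option would produce this same kind of summary automatically at the moment the OOM is thrown, rather than requiring an external tool to be attached.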
Therefore I propose adding an option, "-XX:+HeapHistoOnOutOfMemoryError", to help diagnose OOM errors while avoiding the long duration of a heap dump.
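As a minimal sketch of how the option might be exercised, the following program simply retains allocations until the heap fills up; the class name and heap size are arbitrary, and the flag on the command line is the proposed one, not an existing HotSpot option:

    // Intended invocation (proposed flag, analogous to -XX:+HeapDumpOnOutOfMemoryError):
    //   java -Xmx64m -XX:+HeapHistoOnOutOfMemoryError OomDemo
    import java.util.ArrayList;
    import java.util.List;

    public class OomDemo {
        public static void main(String[] args) {
            List<byte[]> retained = new ArrayList<>();
            while (true) {
                // Keep allocating and retaining 1 MB blocks so the heap
                // eventually fills up and an OutOfMemoryError is thrown.
                retained.add(new byte[1024 * 1024]);
            }
        }
    }

On the OOM, the JVM would print the class histogram to stdout (or a configured destination) and the process could then be restarted immediately, with no multi-minute dump-to-disk phase.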