SAP developed a supportability feature called "Statistics History", a low-cost history of statistical values of interest.
These values include parameters of the JVM (e.g. heap size, metaspace size, number of loaded classes) and of the underlying platform (e.g. RSS, swapping state, run queue length). They are sampled at regular intervals (60 seconds by default) and stored in a FIFO buffer.
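To give a feel for the kind of values involved, here is a minimal sketch in Java that samples a few comparable values through the standard java.lang.management API. This is not how the feature itself is implemented; the class StatSampler, the Sample type, and the choice of values are assumptions made purely for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical periodic sampler; names and the selection of values are illustrative only. */
final class StatSampler {

    /** One sample: a timestamp plus a few JVM and platform values. */
    static final class Sample {
        final long epochSeconds;
        final long heapUsedBytes;
        final long heapCommittedBytes;
        final int loadedClasses;
        final double loadAverage; // negative if the platform does not report it

        Sample(long epochSeconds, long heapUsedBytes, long heapCommittedBytes,
               int loadedClasses, double loadAverage) {
            this.epochSeconds = epochSeconds;
            this.heapUsedBytes = heapUsedBytes;
            this.heapCommittedBytes = heapCommittedBytes;
            this.loadedClasses = loadedClasses;
            this.loadAverage = loadAverage;
        }
    }

    /** Reads the current values once. */
    static Sample takeSample() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        int classes = ManagementFactory.getClassLoadingMXBean().getLoadedClassCount();
        double load = ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage();
        return new Sample(System.currentTimeMillis() / 1000,
                heap.getUsed(), heap.getCommitted(), classes, load);
    }

    /** Samples every intervalSeconds on a daemon thread and hands each sample to the sink. */
    static ScheduledExecutorService start(long intervalSeconds, Consumer<Sample> sink) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "stat-sampler");
            t.setDaemon(true); // must not keep the JVM alive
            return t;
        });
        executor.scheduleAtFixedRate(() -> sink.accept(takeSample()),
                0, intervalSeconds, TimeUnit.SECONDS);
        return executor;
    }
}
```

Calling StatSampler.start(60, sink) would mirror the default 60-second interval, with the sink being whatever stores the samples.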
The FIFO buffer has three parts: a short-term, a mid-term, and a long-term FIFO. A fraction of the samples falling out of the short-term FIFO is transferred to the mid-term FIFO; in turn, a fraction of the samples falling out of the mid-term FIFO is transferred to the long-term FIFO. As a result, the short-term FIFO covers a short recent time span (usually an hour) at comparatively short sample intervals (usually 60 seconds), whereas the long-term FIFO covers a very long time span (~10 days) at sample intervals of hours.
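The cascading scheme described above could be sketched roughly as follows. This is an illustration under assumptions, not the actual implementation: the class TieredHistory, its Tier helper, the capacities, and the "forward every n-th evicted sample" rule are all hypothetical choices for how "a fraction of the samples" might be selected.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical three-tier sample history; all names and sizes are illustrative. */
final class TieredHistory {

    /** A bounded FIFO that forwards every n-th evicted sample to the next tier. */
    private static final class Tier {
        final int capacity;
        final int forwardEvery;  // forward 1 out of this many evicted samples
        final Tier next;         // null for the last (long-term) tier
        final Deque<long[]> samples = new ArrayDeque<>();
        long evicted = 0;

        Tier(int capacity, int forwardEvery, Tier next) {
            this.capacity = capacity;
            this.forwardEvery = forwardEvery;
            this.next = next;
        }

        void add(long[] sample) {
            if (samples.size() == capacity) {
                long[] oldest = samples.removeFirst();
                // Only a fraction of the evicted samples survives into the next tier.
                if (next != null && evicted % forwardEvery == 0) {
                    next.add(oldest);
                }
                evicted++;
            }
            samples.addLast(sample);
        }
    }

    private final Tier longTerm;
    private final Tier midTerm;
    private final Tier shortTerm;

    TieredHistory(int shortCap, int midCap, int longCap, int shortToMid, int midToLong) {
        longTerm = new Tier(longCap, 1, null);
        midTerm = new Tier(midCap, midToLong, longTerm);
        shortTerm = new Tier(shortCap, shortToMid, midTerm);
    }

    /** Records one {timestamp, value} sample; new samples always enter the short-term tier. */
    void record(long timestampSeconds, long value) {
        shortTerm.add(new long[] {timestampSeconds, value});
    }

    /** Returns all retained samples, oldest first (long-term, then mid-term, then short-term). */
    List<long[]> snapshot() {
        List<long[]> all = new ArrayList<>(longTerm.samples);
        all.addAll(midTerm.samples);
        all.addAll(shortTerm.samples);
        return all;
    }
}
```

With a 60-second sample interval, a configuration such as new TieredHistory(60, 60, 60, 10, 10) would keep the last hour at full resolution and progressively thinner data for older periods. These numbers are made up for the example and are not the feature's defaults.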
This feature has been very popular with our support folks, so we would like to contribute it. It lets us easily analyze slowly developing situations such as memory leaks, memory or CPU spikes, and resource starvation.
The aim of this feature is not to replace "real" profilers like JMC; rather, it is meant to be a cheap, always-on first stop for getting a rough idea of what is going on.