We have MallocLimit, a way to trigger errors when reaching a given malloc load threshold. This PR proposes
a complementary switch, RSSLimit, that does the same based on the Resident Set Size of the process.
Motivation:
The main use of this option is analyzing situations that would lead to an OOM kill of the process. OOM kills can happen at various layers: the process may be killed by the kernel OOM killer, or the whole container may get scrapped if it uses too much memory. In either case, one has little or no information to go on; often, one does not even know that it was the OOM killer, or whether the JVM was really responsible. In these situations, getting a voluntary abort before the process is killed can give us valuable information we would not get otherwise.
Another use of this feature is testing: specifying an envelope of "reasonable" RSS to check the expected footprint of the JVM. It is also useful as a global, test-wide setting to catch obvious footprint degradations early.
Letting the JVM handle this limit itself has several advantages:
Since the limit is artificial, error reporting itself is not affected. External mechanisms (e.g. ulimit) are likely to prevent effective error reporting: I usually get truncated hs-err files when such a hard limit hits, since error reporting (regrettably) needs dynamic memory and stack space to do its work.
Re-using the normal error reporting mechanism is powerful since:
- hs-err files contain lots of information already: machine memory status, NMT summary, heap information etc.
- Using OnError, that mechanism is extensible: we can run many further diagnostics, such as Metaspace or Compiler memory reports, detailed NMT reports, system memory maps, and even heap dumps.
- Using ErrorLogToStd(out|err) will redirect the hs-err file and let us see what's happening in cloud situations where file systems are often ephemeral.
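To illustrate, here is a hedged sketch of how the proposed switch could be combined with the existing error-reporting flags mentioned above. The exact RSSLimit syntax is an assumption (modeled on MallocLimit); OnError, ErrorLogToStdout, and NativeMemoryTracking are existing HotSpot options:

```shell
# Hypothetical usage: abort with a hs-err report once process RSS exceeds 2 GB
# (assumes RSSLimit accepts a memory size, analogous to MallocLimit).
java -XX:RSSLimit=2g MyApp

# Combined with existing diagnostics: enable NMT, run a further report via
# OnError on abort, and redirect the hs-err file to stdout for cloud
# environments with ephemeral file systems.
java -XX:RSSLimit=2g \
     -XX:NativeMemoryTracking=summary \
     -XX:+ErrorLogToStdout \
     -XX:OnError="jcmd %p VM.native_memory summary" \
     MyApp
```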
Review(master) openjdk/jdk/16938