Type: Bug
Resolution: Fixed
Priority: P3
Fix Version/s: 14
Affects Version/s: 10, 11, 12, 13, 14
Component/s: hotspot
Labels:

Subcomponent: runtime
Resolved In Build: b06

Issue	Fix Version	Assignee	Priority	Status	Resolution	Resolved In Build
JDK-8261602	13.0.7	Ekaterina Vergizova	P3	Resolved	Fixed	b02
JDK-8227571	13	Thomas Stuefe	P3	Closed	Won't Fix
JDK-8257849	11.0.11-oracle	Dukebot	P3	Resolved	Fixed	b01
JDK-8257399	11.0.10	Thomas Stuefe	P3	Resolved	Fixed	b05

Summary: On OOM, we fail to disarm the assertion poison page; this may lead to endless loops during error handling if assertions fire in native OOM scenarios.

--

When an assert happens, we touch a poison page to receive the current ucontext for error analysis. That works like this:

assert ->
touch assertion poison page (immediately, in the same frame, with as little code as possible running after evaluating the assert condition) ->
bang! enter signal handler ->
in signal handling, copy ucontext ->
and disable poison page ->
return from signal handler, brings us to the same load which triggered the original crash ->
repeat touching the poison page. It is disarmed now, so a noop ->
continue handling the assertion.
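
The sequence above can be sketched as a small standalone program fragment. This is only an illustration under assumptions, not the HotSpot implementation: the names arm_poison_page, touch_poison_page and poison_handler are hypothetical, the sketch assumes Linux (where touching a PROT_NONE page raises SIGSEGV), and it disarms with PROT_READ for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <signal.h>
#include <sys/mman.h>
#include <ucontext.h>
#include <unistd.h>

// All names below are hypothetical; this is not HotSpot code.
static char* g_poison_page = nullptr;                 // armed page (PROT_NONE)
static size_t g_page_size = 0;
static ucontext_t g_saved_context;                    // copy made in the handler
static volatile sig_atomic_t g_context_captured = 0;

static void poison_handler(int sig, siginfo_t* info, void* uc) {
  char* addr = static_cast<char*>(info->si_addr);
  if (g_poison_page != nullptr &&
      addr >= g_poison_page && addr < g_poison_page + g_page_size) {
    // Copy the ucontext for later error analysis.
    memcpy(&g_saved_context, uc, sizeof(ucontext_t));
    g_context_captured = 1;
    // Disarm the page. If this fails, returning would re-fault forever,
    // so this sketch simply gives up instead of returning.
    if (mprotect(g_poison_page, g_page_size, PROT_READ) != 0) {
      _exit(1);
    }
    return;  // resumes at the faulting load, which now succeeds
  }
  // Not our page: restore the default action and let the fault re-raise.
  signal(sig, SIG_DFL);
}

// Maps one inaccessible page and installs the SIGSEGV handler.
bool arm_poison_page() {
  g_page_size = static_cast<size_t>(sysconf(_SC_PAGESIZE));
  void* p = mmap(nullptr, g_page_size, PROT_NONE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) return false;
  g_poison_page = static_cast<char*>(p);

  struct sigaction sa;
  memset(&sa, 0, sizeof(sa));
  sa.sa_sigaction = poison_handler;
  sa.sa_flags = SA_SIGINFO;
  sigemptyset(&sa.sa_mask);
  return sigaction(SIGSEGV, &sa, nullptr) == 0;
}

// Touches the page; returns true if the handler ran and captured a context.
bool touch_poison_page() {
  assert(g_poison_page != nullptr);
  g_context_captured = 0;
  (void)*static_cast<volatile char*>(g_poison_page);  // the faulting load
  return g_context_captured != 0;
}
```

After arm_poison_page(), the first touch_poison_page() call should enter the handler and capture a context; a second call touches the now-readable page and is a no-op.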

In case of a native OOM, this may fail: the mprotect call used to disarm the poison page may return with ENOMEM (this depends on the OS, but can happen e.g. on Linux when switching from PROT_NONE to PROT_RW), leaving the poison page armed.

The chance of this happening in a normal assertion scenario (an OOM hitting out of the blue just when we hit an assert and attempt to disarm the poison page) is astronomically small.

However, this may happen as a result of an OOM elsewhere, which could trigger a follow-up assertion. Then this happens:

... OOM! ...
...
assert ->
touch assert poison page ->
bang! enter signal handler ->
in signal handling, copy ucontext ->
and disable poison page - but that fails! ->
current code does not care, returns to asserting code, to the same opcode ->
again touch assert poison. ->
enter signal handler ->
repeat...
...

Endless loop; since we do not use stack space this can go on forever, and since we effectively disable signal handling the error handler timeout does not seem to work either. Process hangs.

Most native OOM situations in the hotspot are handled cleanly: they either are handled explicitly by the caller or they enter error handling via VMError::report_vm_out_of_memory(). This means that an assertion following a native OOM most likely happens during error handling. This slightly changes the picture above:

... OOM! ...
...
assert ->
touch assert poison page ->
bang! enter secondary signal handler (crash_handler() in vmError_posix.cpp) ->
in signal handling, copy ucontext ->
and disable poison page - but that fails! ->
current code does not care, returns to asserting code, to the same opcode ->
again touch assert poison. ->
enter secondary signal handler (crash_handler() in vmError_posix.cpp) ->
repeat...
...

One simple fix could be to just switch off the assertion poison page after entering VMError::report_and_die(). We do not need it from that point on, since we do not care (much) about secondary asserts or asserts happening in parallel threads.
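
A minimal sketch of that idea, assuming a global flag that the assert path checks before touching the page. The names g_poison_armed, touch_poison_page_if_armed and enter_error_reporting are hypothetical, and the actual touch is replaced by a counter so the fragment stays self-contained; this is not how HotSpot necessarily implements it.

```cpp
#include <atomic>

// Hypothetical flag: true while asserts should touch the poison page.
static std::atomic<bool> g_poison_armed{true};

// Stand-in for the real touch; counts how often the page would be touched.
static int g_touch_count = 0;

// Called from the assert path.
void touch_poison_page_if_armed() {
  if (g_poison_armed.load(std::memory_order_acquire)) {
    ++g_touch_count;  // real code would load from the protected page here
  }
}

// Called once on entry to error reporting (e.g. at the start of a
// report_and_die()-like function). Later asserts skip the poison page.
void enter_error_reporting() {
  g_poison_armed.store(false, std::memory_order_release);
}
```

With this in place, an assert that fires during error handling never reaches the signal handler through the poison page, so a failing disarm cannot cause a loop there.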

Also, when we fail to disarm the poison page, we should not just return from the signal handler. Since we cannot do much else, we should proceed as if this were a real crash. This will "hide" an assert behind a SIGSEGV and can be confusing if one does not closely examine the call stack, but it is still better than the process hanging.
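
The decision described here could look roughly like the following sketch. It assumes the protection call is passed in as a function pointer so that a failing mprotect can be simulated; FaultAction, on_poison_fault, real_protect and failing_protect are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <sys/mman.h>

// Hypothetical type: same shape as mprotect, so a failure can be injected.
typedef int (*protect_fn)(void* addr, size_t len, int prot);

enum class FaultAction {
  ResumeAssert,  // page disarmed: return from the handler, assert continues
  TreatAsCrash   // disarm failed: fall through to normal crash handling
};

// Decides what the signal handler should do after a poison page fault.
FaultAction on_poison_fault(void* page, size_t len, protect_fn protect) {
  assert(page != nullptr);
  if (protect(page, len, PROT_READ) == 0) {
    return FaultAction::ResumeAssert;
  }
  // Returning to the faulting load here would fault again, forever.
  return FaultAction::TreatAsCrash;
}

// Thin wrapper around the real system call.
int real_protect(void* addr, size_t len, int prot) {
  return mprotect(addr, len, prot);
}

// Stand-in for mprotect failing under native OOM.
int failing_protect(void*, size_t, int) {
  errno = ENOMEM;
  return -1;
}
```

A handler built on this would only return to the asserting code when the result is ResumeAssert; on TreatAsCrash it would continue into the regular crash reporting path.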

backported by

JDK-8257399 Within native OOM error handling, assertions may hang the process (Resolved)
JDK-8257849 Within native OOM error handling, assertions may hang the process (Resolved)
JDK-8261602 Within native OOM error handling, assertions may hang the process (Resolved)
JDK-8227571 Within native OOM error handling, assertions may hang the process (Closed)

relates to

JDK-8216982 Assertion poison page established too early (Resolved)
JDK-8225703 crash_handler code makes safepoint polling threads look like they crashed (Closed)
JDK-8191101 Show register content in hs-err file on assert (Resolved)

links to

Commit openjdk/jdk13u-dev/86ccee1a

Review openjdk/jdk13u-dev/121


Assignee: Thomas Stuefe
Reporter: Thomas Stuefe
Votes: 0
Watchers: 6

Created: 2019-07-04 07:04
Updated: 2021-02-11 08:07
Resolved: 2019-07-11 12:45
