-
Enhancement
-
Resolution: Fixed
-
P4
-
16
-
b24
(Note: "platform handlers" means JVM_handle_xxx_signal())
1) PosixSignals::are_signal_handlers_installed() is used inside the platform handlers to guard a part of the platform handlers against execution in case the signal handlers are not yet installed. First this confused me since by the time the handler is called it would be of course installed. So that boolean would always be true. After thinking this through and discussions with David [4], the only explanation I have is that since these handlers can be invoked from outside, this is some (ineffective) form of guard against calling this handler too early, when the VM is not set up correctly.
That guard can be left out, and that boolean removed. Our signal handlers are safe to call before VM initialization is completed.
2) The return code of JVM_handle_xxx_signal() was inconsistently set (some as bool, some as int) as well as unused in normal code paths (excluding outside calls).
3) JVM_handle_xxx_signal are supposed to be exported, but on AIX there is a day-zero bug which caused it to not be exported.
Note: Points (1)-(3) directly relate to AllowUserSignalHandlers. There had been discussions of renaming or removing "JVM_handle_xxx_signal", but these entry points have to be preserved in their current form, exactly, unless we deprecate AllowUserSignalHandlers. See [4] for an in-depth discussion.
4) Every platform handler has this section:
```
> JavaThread* thread = NULL;
> VMThread* vmthread = NULL;
> if (PosixSignals::are_signal_handlers_installed()) {
> if (t != NULL ){
> if(t->is_Java_thread()) {
> thread = t->as_Java_thread();
> }
> else if(t->is_VM_thread()){
> vmthread = (VMThread *)t;
> }
> }
> }
```
vmthread is unused on all platforms and can be removed.
5) Every platform handler has a variation of this section, to ignore SIGPIPE, SIGXFSZ (whose default actions are to terminate the VM):
```
> if (sig == SIGPIPE || sig == SIGXFSZ) {
> // allow chained handler to go first
> if (PosixSignals::chained_handler(sig, info, ucVoid)) {
> return true;
> } else {
> // Ignoring SIGPIPE/SIGXFSZ - see bugs 4229104 or 6499219
> return true;
> }
> }
```
- On AIX, Linux s390 and ppc, we miss SIGXFSZ handling (Update: has been fixed separately with
- both paths return true - section can be shortened
- some variants print out a warning, some don't
Side note: having handlers for those signals may be even completely unnecessary. We could just set the signal handler to SIG_IGN. We would have to take care to not disturb third party signal handlers for those signals, should they exist.
6) At the end of every platform header, before calling into fatal error handling, we unblock the signal:
```
> // unmask current signal
> sigset_t newset;
> sigemptyset(&newset);
> sigaddset(&newset, sig);
> sigprocmask(SIG_UNBLOCK, &newset, NULL);
>
```
- sigprocmask() is UB in a multithreaded program.
- but then, this section is unnecessary anyway since VMError::report_and_die sets up signal handling and the signal mask. Plus, since
7) the JFR crash protection is not consistently checked in all platform handlers.
8) On Zero, when entering fatal error handling, we do so via fatal() instead of VMError::report_and_die(); thereby discarding the real crash ucontext and obfuscating the register content in the hs-err file (we see registers, but those are from the assertion-poison-page mechanism at the time fatal() was called, not the real registers).
9) on Linux ppc64 and AIX, we have this section:
> if (sig == SIGILL && (pc < (address) 0x200)) {
> goto report_and_die;
> }
this is related to the fact that the zero page on AIX is readable, filled with 0, and reading instructions from it will yield us a SIGILL, not a SIGSEGV (thankfully, 0 is not a noop on PPC, so we don't nop-slide around).
This coding is irrelevant on Linux. On AIX, it can be left out too since this unrecognized SIGILL would cause a fatal error anyway.
10) When invoking the fatal error handler, we extract the pc from the context and hand it over as "faulting pc". For SIGILL and SIGFPE, this is not totally correct. According to POSIX [3], for those signals the address of the faulting instruction is handed over in si_info.si_addr.
On most platforms that address is identical with context.pc; however, on some architectures the pc in the context actually points beyond that address to the next instruction. For those cases, si_info.si_addr is the better choice.
----
[1] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-October/043145.html
[2] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-October/043191.html
[3] https://pubs.opengroup.org/onlinepubs/009695399/basedefs/signal.h.html
[4] https://mail.openjdk.java.net/pipermail/jdk-dev/2020-November/004887.html
- duplicates
-
JDK-8143395 evaluate if sigthreadmask can be removed from hotspot (aix) codebase
- Closed
- relates to
-
JDK-8259392 Zero error reporting is broken after JDK-8255711
- Resolved
-
JDK-6499219 SIGXFSZ kills VM
- Closed
-
JDK-4361067 hotspot VM crashes when bringing up thru JNI when signal handler is installed
- Closed
-
JDK-8252324 Signal related code should be shared among POSIX platforms
- Resolved
-
JDK-8252533 Signal handlers should run with synchronous error signals unblocked
- Resolved
-
JDK-8253742 POSIX signal code cleanup
- Resolved
-
JDK-8259539 JDK-8255711 broke trap messages
- Resolved
-
JDK-8255734 VM should ignore SIGXFSZ on ppc64, s390 too
- Resolved
-
JDK-8279077 JFR crashes on Linux ppc due to missing crash protector in signal handler
- Resolved