Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-8255711

Fix and unify hotspot signal handlers

XMLWordPrintable

    • Icon: Enhancement Enhancement
    • Resolution: Fixed
    • Icon: P4 P4
    • 16
    • 16
    • hotspot
    • b24

      Hotspot signal handlers have a number of issues as well as a lot of coding which could be simplified and/or unified for all POSIX platforms.

      (Note: "platform handlers" means JVM_handle_xxx_signal())

      1) PosixSignals::are_signal_handlers_installed() is used inside the platform handlers to guard a part of the platform handlers against execution in case the signal handlers are not yet installed. First this confused me since by the time the handler is called it would be of course installed. So that boolean would always be true. After thinking this through and discussions with David [4], the only explanation I have is that since these handlers can be invoked from outside, this is some (ineffective) form of guard against calling this handler too early, when the VM is not set up correctly.

      That guard can be left out, and that boolean removed. Our signal handlers are safe to call before VM initialization is completed.

      2) The return code of JVM_handle_xxx_signal() was inconsistently set (some as bool, some as int) as well as unused in normal code paths (excluding outside calls).

      3) JVM_handle_xxx_signal are supposed to be exported, but on AIX there is a day-zero bug which caused it to not be exported.

      Note: Points (1)-(3) directly relate to AllowUserSignalHandlers. There had been discussions of renaming or removing "JVM_handle_xxx_signal", but these entry points have to be preserved in their current form, exactly, unless we deprecate AllowUserSignalHandlers. See [4] for an in-depth discussion.

      4) Every platform handler has this section:

      ```
      > JavaThread* thread = NULL;
      > VMThread* vmthread = NULL;
      > if (PosixSignals::are_signal_handlers_installed()) {
      > if (t != NULL ){
      > if(t->is_Java_thread()) {
      > thread = t->as_Java_thread();
      > }
      > else if(t->is_VM_thread()){
      > vmthread = (VMThread *)t;
      > }
      > }
      > }
      ```

      vmthread is unused on all platforms and can be removed.

      5) Every platform handler has a variation of this section, to ignore SIGPIPE, SIGXFSZ (whose default actions are to terminate the VM):

      ```
      > if (sig == SIGPIPE || sig == SIGXFSZ) {
      > // allow chained handler to go first
      > if (PosixSignals::chained_handler(sig, info, ucVoid)) {
      > return true;
      > } else {
      > // Ignoring SIGPIPE/SIGXFSZ - see bugs 4229104 or 6499219
      > return true;
      > }
      > }
      ```

      - On AIX, Linux s390 and ppc, we miss SIGXFSZ handling (Update: has been fixed separately with JDK-8255734).
      - both paths return true - section can be shortened
      - some variants print out a warning, some don't

      Side note: having handlers for those signals may be even completely unnecessary. We could just set the signal handler to SIG_IGN. We would have to take care to not disturb third party signal handlers for those signals, should they exist.

      6) At the end of every platform header, before calling into fatal error handling, we unblock the signal:

      ```
      > // unmask current signal
      > sigset_t newset;
      > sigemptyset(&newset);
      > sigaddset(&newset, sig);
      > sigprocmask(SIG_UNBLOCK, &newset, NULL);
      >
      ```

      - sigprocmask() is UB in a multithreaded program.
      - but then, this section is unnecessary anyway since VMError::report_and_die sets up signal handling and the signal mask. Plus, since JDK-8252533 we unmask error signals at the start of the signal handler, and we do not call report_and_die for any other reason than an error signal.

      7) the JFR crash protection is not consistently checked in all platform handlers.

      8) On Zero, when entering fatal error handling, we do so via fatal() instead of VMError::report_and_die(); thereby discarding the real crash ucontext and obfuscating the register content in the hs-err file (we see registers, but those are from the assertion-poison-page mechanism at the time fatal() was called, not the real registers).

      9) on Linux ppc64 and AIX, we have this section:

      > if (sig == SIGILL && (pc < (address) 0x200)) {
      > goto report_and_die;
      > }

      this is related to the fact that the zero page on AIX is readable, filled with 0, and reading instructions from it will yield us a SIGILL, not a SIGSEGV (thankfully, 0 is not a noop on PPC, so we don't nop-slide around).

      This coding is irrelevant on Linux. On AIX, it can be left out too since this unrecognized SIGILL would cause a fatal error anyway.

      10) When invoking the fatal error handler, we extract the pc from the context and hand it over as "faulting pc". For SIGILL and SIGFPE, this is not totally correct. According to POSIX [3], for those signals the address of the faulting instruction is handed over in si_info.si_addr.

      On most platforms that address is identical with context.pc; however, on some architectures the pc in the context actually points beyond that address to the next instruction. For those cases, si_info.si_addr is the better choice.

      ----

      [1] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-October/043145.html
      [2] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-October/043191.html
      [3] https://pubs.opengroup.org/onlinepubs/009695399/basedefs/signal.h.html
      [4] https://mail.openjdk.java.net/pipermail/jdk-dev/2020-November/004887.html

            stuefe Thomas Stuefe
            stuefe Thomas Stuefe
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

              Created:
              Updated:
              Resolved: