Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-8261520

JDK-8261302 breaks runtime/NMT/CheckForProperDetailStackTrace.java

XMLWordPrintable

    • b12

      Since JDK-8261302, the test runtime/NMT/CheckForProperDetailStackTrace.java fails with

      java.lang.RuntimeException: 'NativeCallStack::NativeCallStack' found in stdout

      --

      `NativeCallStack` contains a hash code as a convenience to place it into the malloc site hashmap. Before JDK-8261302, that hash code was calculated lazily in a non-inline hashcode getter. With JDK-8261302, the hash code calculation was moved into the `NativeCallStack` constructor and the getter was made inline.

      The `NativeCallStack` constructor fills its stack frames by calling `os::get_native_stack()`. Before JDK-8261302, that call to `os::get_native_stack()` has been the last call in the constructor and has been optimized sometimes into a tail call. Whether or not its a tail call matters since it affects the number of stack frames the stack walker has to skip. Therefore, the constructor contains rather complex and fragile coding to predict whether the tail call optimization happens:

      ```
      #if (defined(_NMT_NOINLINE_) || defined(_WINDOWS) || !defined(_LP64) || defined(PPC64))
        // Not a tail call.
        toSkip++;
      #if (defined(_NMT_NOINLINE_) && defined(BSD) && defined(_LP64))
        // Mac OS X slowdebug builds have this odd behavior where NativeCallStack::NativeCallStack
        // appears as two frames, so we need to skip an extra frame.
        toSkip++;
      #endif // Special-case for BSD.
      #endif // Not a tail call.
      ```

      This prediction was now wrong since the hash code calculation, which needs to happen after the stack was captured, happened afterwards. This causes the test error, since on some platforms (eg Linux x64) we now think we have a tail call when we don't, which means we do not skip enough frames, and the NMT output contains call frames like "NativeCallStack::NativeCallStack()", which trips the test.

      There are several ways to fix it:
      1) leave it this way: live with the fact that we lose tail call optimization, and just always skip the frame. I am not sure how much that optimization really matters. This would have the big benefit of removing the need for the fragile guessing code which causes quite a bit of maintenance effort.
      2) modify os::get_native_stack() to, in addition to returning the stack, also calculate and write the hash value. That would make os::get_native_stack() less general purpose. Currently its only used by NMT but in theory is general-purpose enough for the os:: namespace.
      3) There is no good reason why class NativeCallStack even needs to hold the hash value. Arguably this is not its business. The fact that some caller (NMT) holds instances of it in a hash map should not concern it. However, moving the hash code out of NativeCallStack up into NMTs AllocationSite class would have some wider repercussions, which need checking.

            stuefe Thomas Stuefe
            kbarrett Kim Barrett
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

              Created:
              Updated:
              Resolved: