JDK-6375302

(thread) Thread.currentThread().getStackTrace(); is 10x slower vs getting stacktrace from throwable


    • b78
    • x86
    • linux_redhat_3.0

      Thread.currentThread().getStackTrace() appears to be about 10x slower
      than getting the stack trace from a Throwable. The overhead is probably
      due to context switching, since almost all of the time in the first
      case shows up as system time rather than user time.
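
      One way to check the user/system split from inside the JVM is the
      standard ThreadMXBean API. The sketch below is only an illustration:
      the class name CpuTimeSplit and the helper measure() are made up here,
      and it uses lambdas, so it needs Java 8 or later. It only accounts for
      CPU time charged to the calling thread, so work done on its behalf by
      other VM threads would not show up in these numbers.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class CpuTimeSplit {

    /** Returns {userNanos, systemNanos} charged to the calling thread while running work. */
    static long[] measure(Runnable work, int iterations) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!bean.isCurrentThreadCpuTimeSupported()) {
            throw new UnsupportedOperationException("per-thread CPU time is not supported");
        }
        long cpuStart = bean.getCurrentThreadCpuTime();   // user + system, in nanoseconds
        long userStart = bean.getCurrentThreadUserTime(); // user only, in nanoseconds
        for (int i = 0; i < iterations; i++) {
            work.run();
        }
        long user = bean.getCurrentThreadUserTime() - userStart;
        long cpu = bean.getCurrentThreadCpuTime() - cpuStart;
        // The two counters may have different granularity, so clamp the difference at zero.
        return new long[] { user, Math.max(0L, cpu - user) };
    }

    public static void main(String[] args) {
        int n = 100000;
        long[] viaThread = measure(() -> Thread.currentThread().getStackTrace(), n);
        long[] viaException = measure(() -> new Exception().getStackTrace(), n);
        System.out.println("Thread.getStackTrace():    user=" + viaThread[0] / 1_000_000
                + "ms system=" + viaThread[1] / 1_000_000 + "ms");
        System.out.println("Exception.getStackTrace(): user=" + viaException[0] / 1_000_000
                + "ms system=" + viaException[1] / 1_000_000 + "ms");
    }
}
```

      An external tool such as time(1) or top gives the process-wide view
      and is the better check when the cost lands on a different thread.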

      In my environment, where I run stress tests, enabling some diagnostic
      probes in our code lets 20 client threads max out a 4-way Linux box.
      In that case almost all of the CPU is system time, and it appears to
      be due to Thread.currentThread().getStackTrace().

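      Here is the test case. Run with no arguments, it uses
      Thread.currentThread().getStackTrace(); run with any argument, it uses
      new Exception().getStackTrace(). The first version is about 10x slower
      than the second.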


      public class Test {

          public static void main(String[] args) {
              long start = System.currentTimeMillis();
              for (long i = 0; i < 100000; i++) {
                  StackTraceElement[] stackTrace;
                  if (args.length > 0) {
                      // Any argument: capture the trace through a new exception.
                      stackTrace = new Exception().getStackTrace();
                  } else {
                      // No arguments: ask the current thread for its stack trace.
                      stackTrace = Thread.currentThread().getStackTrace();
                  }
              }
              System.out.println("Total time = "
                      + (System.currentTimeMillis() - start) + "ms.");
          }
      }

      I looked into the VM code, and here is what I can make of it.

      The stack is retrieved by calling vframeStream st((JavaThread*) THREAD);

      Both versions do this, which is good. The exception version calls it
      directly. The getThreadDumps() path does a few extra steps that can be
      more expensive:

      1. Put the request into a shared queue.

      2. Schedule a VM thread to run the command.

      3. The scheduled VM thread seems to do some work to make sure it is
         safe to get the stack trace.

      4. Allocate C++ objects to collect the stack trace. This is much more
         expensive than allocating a Java object: you have to new/delete,
         which means acquiring a mutex.

      5. Finally, call vframeStream().

      6. The VM thread returns the result, and the caller takes the result
         out of the shared queue.
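
      The queue-and-wait part of these steps (1, 2 and 6) can be modeled in
      plain Java to get a feel for the handoff cost. This is only an analogy,
      not the VM's code: HandoffModel, direct() and viaWorker() are names
      invented for this sketch, and a single-thread ExecutorService stands in
      for the VM thread.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class HandoffModel {

    /** Runs the task on the calling thread (stands in for the exception path). */
    static <T> T direct(Callable<T> task) throws Exception {
        return task.call();
    }

    /** Queues the task for a single worker thread and blocks until the result is back. */
    static <T> T viaWorker(ExecutorService worker, Callable<T> task) throws Exception {
        return worker.submit(task).get();
    }

    public static void main(String[] args) throws Exception {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            Callable<Integer> task = () -> 42; // trivial work, so the handoff dominates
            int n = 100000;
            long t0 = System.nanoTime();
            for (int i = 0; i < n; i++) {
                direct(task);
            }
            long t1 = System.nanoTime();
            for (int i = 0; i < n; i++) {
                viaWorker(worker, task);
            }
            long t2 = System.nanoTime();
            System.out.println("direct:     " + (t1 - t0) / 1_000_000 + "ms");
            System.out.println("via worker: " + (t2 - t1) / 1_000_000 + "ms");
        } finally {
            worker.shutdown();
        }
    }
}
```

      Each viaWorker() call makes the caller block and the worker wake up,
      which is the kind of context switching that would show up as system
      time.
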

      There is a comment in the file threadService.cpp that says:

      // TODO: Optimization if only the current thread or maxDepth = 1
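
      Until the VM has such an optimization, application code can apply the
      same idea itself. The helper below is a hypothetical workaround sketch
      (the class StackTraces and the method of() are invented names): it
      takes the Throwable path when the requested thread is the current one
      and falls back to Thread.getStackTrace() otherwise.

```java
public class StackTraces {

    /**
     * Returns the stack trace of the given thread, using the cheaper
     * Throwable path when that thread is the caller itself.
     */
    static StackTraceElement[] of(Thread thread) {
        if (thread == Thread.currentThread()) {
            // No handoff to another thread: the trace is captured in place.
            return new Throwable().getStackTrace();
        }
        return thread.getStackTrace();
    }

    public static void main(String[] args) {
        StackTraceElement[] trace = of(Thread.currentThread());
        for (StackTraceElement frame : trace) {
            System.out.println(frame);
        }
    }
}
```

      The top frames differ slightly between the two paths (here the first
      frame is the helper itself), so callers that skip a fixed number of
      frames need to account for that.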

            psoper Pete Soper (Inactive)
            sdellafi Sandra Dellafiora (Inactive)