  1. JDK
  2. JDK-6375302

(thread) Thread.currentThread().getStackTrace(); is 10x slower vs getting stacktrace from throwable



    • Type: Enhancement
    • Status: Resolved
    • Priority: P3
    • Resolution: Fixed
    • Affects Version/s: 5.0u4
    • Fix Version/s: 6
    • Component/s: core-libs
    • Resolved In Build: b78
    • CPU: x86
    • OS: linux_redhat_3.0


      It seems that Thread.currentThread().getStackTrace() is about 10x slower
      than getting the stack trace from a Throwable. The overhead is probably
      due to context switching, since almost all of the time in the first case
      appears to be system time rather than user time.

      In my environment, where I run some stress tests, I can max out a 4-way
      Linux box with 20 client threads when I enable some diagnostic probes
      within our code. In this case almost all of the CPU time is system time,
      and it appears to be due to Thread.currentThread().getStackTrace().
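
One way to check the user/system split from inside the JVM is sketched below. It uses the standard java.lang.management.ThreadMXBean; the class name CpuSplit and the helper measure() are hypothetical names for this sketch. It only accounts for CPU time charged to the calling thread, so work done on other VM threads is not included, and since this issue is marked Resolved/Fixed, a current JDK may show little or no difference between the two calls.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class CpuSplit {

    /**
     * Runs the task on the calling thread and returns {userNanos, systemNanos},
     * or null if the JVM cannot report per-thread CPU time.
     */
    public static long[] measure(Runnable task) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!bean.isCurrentThreadCpuTimeSupported() || !bean.isThreadCpuTimeEnabled()) {
            return null;
        }
        long cpuBefore = bean.getCurrentThreadCpuTime();
        long userBefore = bean.getCurrentThreadUserTime();
        task.run();
        long cpu = Math.max(0L, bean.getCurrentThreadCpuTime() - cpuBefore);
        long user = Math.max(0L, bean.getCurrentThreadUserTime() - userBefore);
        // Total CPU time is user + system, so the remainder approximates system time.
        return new long[] { user, Math.max(0L, cpu - user) };
    }

    public static void main(String[] args) {
        long[] split = measure(() -> {
            for (int i = 0; i < 100000; i++) {
                Thread.currentThread().getStackTrace();
            }
        });
        if (split == null) {
            System.out.println("Per-thread CPU time is not available on this JVM.");
        } else {
            System.out.println("user = " + split[0] / 1000000 + "ms, system = "
                    + split[1] / 1000000 + "ms");
        }
    }
}
```

Swapping the loop body for new Exception().getStackTrace() gives the comparison figure for the Throwable-based path.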

      Here is the test case. You will notice that one version is about
      10x slower than the other version.

      public class Test {

          public Test() {
          }

          public static void main(String[] args) {

              long start = System.currentTimeMillis();
              for (long i = 0; i < 100000; i++) {

                  StackTraceElement[] stackTrace;

                  // Any command-line argument selects the Throwable-based path.
                  if (args.length > 0) {
                      stackTrace = new Exception().getStackTrace();
                  } else {
                      stackTrace = Thread.currentThread().getStackTrace();
                  }
              }

              System.out.println("Total time = "
                      + (System.currentTimeMillis() - start) + "ms.");
          }
      }

      I looked into the VM code and here is what I can make of it -

      The stack is retrieved by calling vframeStream st((JavaThread*) THREAD);

      Both versions do this, which is good. The exception version calls it
      directly. The getThreadDumps() path does a few "extra" steps that can be
      more expensive -

      1. Puts the request into a shared queue.

      2. Schedule a VM thread to run your command.

      3. The scheduled VM thread seems to do some stuff to make sure it's safe
         to get stack trace.

      4. Allocates C++ objects to collect the stack trace. This is much more
         expensive than allocating a Java object: you have to new/delete, and
         that requires acquiring a mutex.

      5. Finally calls vframeStream()

      6. The VM thread returns the result. And the caller gets the result
         out of the shared queue.
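
The cost of steps 1, 2 and 6 can be illustrated with a rough Java analogy. This is not the VM's code: a single-threaded ExecutorService stands in for the VM thread and its request queue, the class HandoffCost and its methods are hypothetical, and the worker captures its own stack rather than the caller's. The sketch only shows how much a queue-and-wait round trip adds to an otherwise cheap operation.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class HandoffCost {

    /** Direct path: capture a trace on the calling thread. */
    public static StackTraceElement[] direct() {
        return new Throwable().getStackTrace();
    }

    /** Handoff path: queue a request, let the worker run it, block for the result. */
    public static StackTraceElement[] viaWorker(ExecutorService worker) {
        Callable<StackTraceElement[]> request = () -> new Throwable().getStackTrace();
        try {
            return worker.submit(request).get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        int iterations = 100000;
        // Stand-in for the VM thread (an assumption of this sketch).
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            long t0 = System.nanoTime();
            for (int i = 0; i < iterations; i++) {
                direct();
            }
            long t1 = System.nanoTime();
            for (int i = 0; i < iterations; i++) {
                viaWorker(worker);
            }
            long t2 = System.nanoTime();
            System.out.println("direct:     " + (t1 - t0) / 1000000 + "ms");
            System.out.println("via worker: " + (t2 - t1) / 1000000 + "ms");
        } finally {
            worker.shutdown();
        }
    }
}
```

Each call through the worker blocks the caller and wakes another thread, which is the kind of context switching that would be charged as system time.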

      There is a comment in file threadService.cpp that says

      // TODO: Optimization if only the current thread or maxDepth = 1
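
Until the VM takes a shortcut for the current thread, application code could avoid the slow path itself when it only needs its own trace. A minimal sketch, assuming a hypothetical helper class StackTraces (not part of the JDK): it uses a Throwable when the target is the calling thread and falls back to Thread.getStackTrace() otherwise.

```java
import java.util.Arrays;

public class StackTraces {

    /** Returns the calling thread's stack trace, starting at the caller of this method. */
    public static StackTraceElement[] current() {
        return dropFirst(new Throwable().getStackTrace());
    }

    /** Returns the trace of the given thread, using the Throwable path for the current thread. */
    public static StackTraceElement[] of(Thread thread) {
        if (thread == Thread.currentThread()) {
            return dropFirst(new Throwable().getStackTrace());
        }
        return thread.getStackTrace();
    }

    // Removes frame 0, which is the helper method that created the Throwable.
    private static StackTraceElement[] dropFirst(StackTraceElement[] frames) {
        if (frames.length == 0) {
            return frames;
        }
        return Arrays.copyOfRange(frames, 1, frames.length);
    }

    public static void main(String[] args) {
        StackTraceElement[] trace = StackTraces.current();
        System.out.println("frames: " + trace.length);
        System.out.println("top frame: " + trace[0]);
    }
}
```

The diagnostic probes mentioned in the report could call StackTraces.current() in place of Thread.currentThread().getStackTrace().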


              psoper Pete Soper (Inactive)
              sdellafi Sandra Dellafiora (Inactive)