- Enhancement
- Resolution: Fixed
- P3
- 5.0u4
- b78
- x86
- linux_redhat_3.0
Thread.currentThread().getStackTrace() appears to be about 10x slower
than getting the stack trace from a Throwable. The overhead is probably
due to context switching, since nearly all of the time in the first case
shows up as system time rather than user time.
In my environment, where I run stress tests, I can max out a 4-way Linux
box with 20 client threads once I enable some diagnostic probes within
our code. In that case almost all of the CPU is system time, and it
appears to be due to Thread.currentThread().getStackTrace().
Here is the test case. Run with no arguments it times
Thread.currentThread().getStackTrace(); run with any argument it times
new Exception().getStackTrace(). One version is about 10x slower than
the other.
public class Test {
    public Test() {
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        for (long i = 0; i < 100000; i++) {
            StackTraceElement[] stackTrace;
            if (args.length > 0)
                // Any argument given: capture the stack through a new Exception.
                stackTrace = new Exception().getStackTrace();
            else
                // No arguments: capture the stack through Thread.getStackTrace().
                stackTrace = Thread.currentThread().getStackTrace();
        }
        System.out.println("Total time = " + (System.currentTimeMillis() - start) + "ms.");
    }
}
I looked into the VM code, and here is what I can make of it:
The stack is retrieved by calling vframeStream st((JavaThread*) THREAD);
Both versions do this, which is good. The exception version calls it
directly.
getThreadDumps() performs a few extra steps that can be more expensive:
1. Puts the request into a shared queue.
2. Schedules a VM thread to run the command.
3. The scheduled VM thread appears to do some work to make sure it is
safe to get the stack trace.
4. Allocates C++ objects to collect the stack trace. This is much more
expensive than allocating a Java object: you have to new/delete, and
acquire a mutex to do that.
5. Finally calls vframeStream().
6. The VM thread returns the result, and the caller gets the result
out of the shared queue.
A comment in threadService.cpp says:
// TODO: Optimization if only the current thread or maxDepth = 1
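Until the VM special-cases the current thread as that TODO suggests, the same idea can be approximated on the caller side. The following is only a sketch under assumptions: the class and method names (StackTraceProbe, stackTraceOf) are hypothetical and not part of the JDK, and it relies on nothing beyond Throwable.getStackTrace() and Thread.getStackTrace(). When the requested thread is the calling thread it takes the Throwable path, and only falls back to Thread.getStackTrace() for other threads.

```java
public class StackTraceProbe {

    // Hypothetical helper, not a JDK API: returns the stack trace of the
    // given thread, avoiding Thread.getStackTrace() when the thread is the
    // caller itself.
    public static StackTraceElement[] stackTraceOf(Thread t) {
        if (t == Thread.currentThread()) {
            // Current thread: the stack can be walked directly, so use the
            // Throwable path described in this report as the fast one.
            return new Throwable().getStackTrace();
        }
        // Any other thread: go through the regular Thread API.
        return t.getStackTrace();
    }

    public static void main(String[] args) {
        StackTraceElement[] frames = stackTraceOf(Thread.currentThread());
        for (StackTraceElement frame : frames) {
            System.out.println(frame);
        }
    }
}
```

Note that on the Throwable path the helper's own frame (stackTraceOf) is the top element of the returned array, so diagnostic probes that care about the caller's frame would need to skip it.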
- relates to
- JDK-6519092 (thread) Thread.getStrackTrace() returns different results on jdk5.0 and jdk6
- Closed