-
Enhancement
-
Resolution: Won't Fix
-
P5
-
None
-
5.0
This is mostly a reminder RFE.
The hprof in v1.5.0 was converted to use JVMTI, and experience over time has
caused me to rethink some of the design of this code. This is just a
list of items that I had, documented here so they are not lost.
* Memory allocation. It uses malloc() and, worse, realloc(), and I'm
concerned that this causes threads to block unnecessarily.
One idea was to change the tables so that all memory allocated is never
moved, i.e. don't use realloc(). Allocate pages of table entries and, based
on the table index, find the page. A lock might be needed on the page lookup
but not access to the table element. That would be a quick monitor enter/exit
and the actual pointer could even be used as the Tag on objects or the
ThreadLocalStorage pointer, avoiding a table lookup for the TlsIndex but
more importantly, avoiding the monitor contention.
Ideal is no monitors held during these HOT BCI Object.<init> events.
Might be nice to have a per-thread memory allocator for some things, or use
something like alloca() for allocations that have short durations.
Dumping class instances and getAllFields() do a lot of allocations that
could be just alloca() type allocations.
This applies to signature_to_name() in hprof_io.c in particular.
If we had a native library where we could create a separate heap area
for each thread: heap = new_heap("Thread 1", initial_size);
and allocate with ptr = heap_malloc(heap, nbytes);
That sure could help matters... See separate RFE on this.
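A minimal sketch of the paged-table idea above, not the hprof implementation:
entries live in fixed-size pages that are allocated once and never moved, so a
pointer to an entry stays valid while the table grows. The names PagedTable,
PAGE_ENTRIES, MAX_PAGES and the paged_table_* functions are hypothetical, and
any locking around paged_table_add() is assumed to be done by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sizes, chosen only for illustration. */
#define PAGE_ENTRIES 256    /* entries per page */
#define MAX_PAGES    1024   /* fixed page directory, so it never moves either */

typedef struct PagedTable {
    size_t elem_size;           /* size of one table entry in bytes */
    size_t count;               /* number of entries handed out so far */
    void  *pages[MAX_PAGES];    /* each page is allocated once, never realloc'd */
} PagedTable;

/* Start with an empty table; pages are allocated lazily on first use. */
static void paged_table_init(PagedTable *t, size_t elem_size)
{
    size_t i;
    t->elem_size = elem_size;
    t->count = 0;
    for (i = 0; i < MAX_PAGES; i++) {
        t->pages[i] = NULL;
    }
}

/* Map a table index to its entry: pick the page, then the slot inside it. */
static void *paged_table_get(PagedTable *t, size_t index)
{
    assert(index < t->count);
    return (char *)t->pages[index / PAGE_ENTRIES]
           + (index % PAGE_ENTRIES) * t->elem_size;
}

/* Append one zeroed entry. Returns its index, or (size_t)-1 on failure.
 * Existing entries are never copied or moved by this call. */
static size_t paged_table_add(PagedTable *t)
{
    size_t index = t->count;
    size_t page  = index / PAGE_ENTRIES;

    if (page >= MAX_PAGES) {
        return (size_t)-1;
    }
    if (t->pages[page] == NULL) {
        t->pages[page] = calloc(PAGE_ENTRIES, t->elem_size);
        if (t->pages[page] == NULL) {
            return (size_t)-1;
        }
    }
    t->count = index + 1;
    return index;
}

/* Release every page that was allocated. */
static void paged_table_free(PagedTable *t)
{
    size_t i;
    for (i = 0; i < MAX_PAGES; i++) {
        free(t->pages[i]);
        t->pages[i] = NULL;
    }
    t->count = 0;
}
```

Because an entry's address never changes, the pointer returned by
paged_table_get() could itself serve as the object Tag or the
ThreadLocalStorage value, which is the point made above.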
* Need faster way to get from jclass to ClassIndex, need to avoid
GetClassSignature and required Deallocate if we can. Perhaps a lookup
of a ClassIndex via the jclass Object hashCode? Separate lookup tables
in hprof_table.c would be a nice addition to this table functionality.
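One way the hash code lookup above might look, as a sketch only. The hash code
is passed in as a plain int (in a real agent it could come from JVMTI
GetObjectHashCode), and a const void * stands in for the jclass reference.
Hash codes can collide, so the sketch also compares the reference; a real
agent would presumably use JNI IsSameObject for that check. ClassLookup,
NBUCKETS, NO_CLASS_INDEX and the class_lookup_* functions are hypothetical
names, and treating 0 as "no index" is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef int ClassIndex;         /* stand-in for hprof's ClassIndex type */
#define NO_CLASS_INDEX 0        /* assumption: 0 is never a valid index */
#define NBUCKETS 512            /* hypothetical bucket count, power of two */

typedef struct ClassEntry {
    int                hash;    /* object hash code of the class */
    const void        *klass;   /* stand-in for a jclass reference */
    ClassIndex         index;
    struct ClassEntry *next;    /* chain of entries sharing a bucket */
} ClassEntry;

typedef struct ClassLookup {
    ClassEntry *buckets[NBUCKETS];
} ClassLookup;

static unsigned bucket_of(int hash)
{
    return (unsigned)hash & (NBUCKETS - 1);
}

/* Record klass -> index. Returns 1 on success, 0 if out of memory. */
static int class_lookup_put(ClassLookup *t, int hash, const void *klass,
                            ClassIndex index)
{
    ClassEntry *e = malloc(sizeof *e);
    unsigned b = bucket_of(hash);

    assert(index != NO_CLASS_INDEX);
    if (e == NULL) {
        return 0;
    }
    e->hash  = hash;
    e->klass = klass;
    e->index = index;
    e->next  = t->buckets[b];
    t->buckets[b] = e;
    return 1;
}

/* Find the index for klass, or NO_CLASS_INDEX if it was never recorded.
 * The pointer comparison stands in for an identity check on the class. */
static ClassIndex class_lookup_get(const ClassLookup *t, int hash,
                                   const void *klass)
{
    const ClassEntry *e;
    for (e = t->buckets[bucket_of(hash)]; e != NULL; e = e->next) {
        if (e->hash == hash && e->klass == klass) {
            return e->index;
        }
    }
    return NO_CLASS_INDEX;
}

/* Free all chained entries. */
static void class_lookup_free(ClassLookup *t)
{
    size_t b;
    for (b = 0; b < NBUCKETS; b++) {
        ClassEntry *e = t->buckets[b];
        while (e != NULL) {
            ClassEntry *next = e->next;
            free(e);
            e = next;
        }
        t->buckets[b] = NULL;
    }
}
```

The common case then costs one hash code fetch and a short chain walk, with no
signature string to allocate and deallocate.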
* GetStackTrace appears to be expensive. Could we avoid calling it every time?
Could we use the cpu=times mechanism to get the Stack for hprof=* instead
of asking JVMTI? How big an issue is GetStackTrace performance?
* Table walk needs a short circuit mechanism. Callbacks should have function
returns that tell table walker to abort or continue walk.
The places where table walks are used for searches (TLS) would benefit,
along with the loader table.
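The short circuit could be as simple as a callback return value. The sketch
below is illustrative and does not reflect the existing hprof_table.c
interface; WalkAction, WalkCallback, table_walk and the find_int example are
all hypothetical names, and the table is modeled as a flat array for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* What a callback tells the walker to do next. */
typedef enum { WALK_CONTINUE = 0, WALK_ABORT = 1 } WalkAction;

typedef WalkAction (*WalkCallback)(size_t index, void *entry, void *arg);

/* Visit entries in order until the callback asks to stop.
 * Returns the index at which the walk was aborted, or count if every
 * entry was visited. */
static size_t table_walk(void *entries, size_t count, size_t elem_size,
                         WalkCallback cb, void *arg)
{
    size_t i;

    assert(cb != NULL);
    for (i = 0; i < count; i++) {
        void *entry = (char *)entries + i * elem_size;
        if (cb(i, entry, arg) == WALK_ABORT) {
            return i;
        }
    }
    return count;
}

/* Example search: stop at the first entry equal to 'wanted'. */
typedef struct FindInt {
    int wanted;     /* value being searched for */
    int visited;    /* how many entries the callback saw */
} FindInt;

static WalkAction find_int(size_t index, void *entry, void *arg)
{
    FindInt *f = arg;

    (void)index;    /* unused in this example */
    f->visited++;
    return (*(int *)entry == f->wanted) ? WALK_ABORT : WALK_CONTINUE;
}
```

A search such as the TLS lookup would then stop at the first match instead of
visiting every remaining entry.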
* Taking a huge step backward, perhaps the Tracker class shouldn't call
native code at all, but just buffer events, then agent could call another
Java method in the class when it wanted to unload the event buffer.
The java code could block when the buffer filled or something like that,
but perhaps the buffer is provided by the agent library so that this
memory doesn't skew the user app stats?
This could be a huge experiment; someone would need to allow lots of time
to play with this. The goal of course would be to get the BCI code more
into a 100% Java world so that the JVM can optimize it better.
* A better logging mechanism would be nice. Perhaps a separate logging
library that both JDWP backend and hprof and other native code in the JVM
could use, maybe even the JVM too. Creating a single native logging file?
Maybe even using the java logging settings?
###@###.### 2004-05-28