From achut@nairobi Thu Dec 11 18:40:50 1997
Return-Path: <achut@nairobi>
Received: from kindra.eng.sun.com by taller.eng.sun.com (SMI-8.6/SMI-SVR4)
id SAA04266; Thu, 11 Dec 1997 18:40:50 -0800
Received: from nairobi.eng.sun.com by kindra.eng.sun.com (SMI-8.6/SMI-SVR4)
id SAA15161; Thu, 11 Dec 1997 18:40:46 -0800
Received: from nairobi.eng.sun.com by nairobi.eng.sun.com (SMI-8.6/SMI-SVR4)
id SAA09934; Thu, 11 Dec 1997 18:40:41 -0800
Date: Thu, 11 Dec 1997 18:40:41 -0800 (PST)
From: Achut Reddy <achut@nairobi>
Reply-To: Achut Reddy <achut@nairobi>
Subject: Proposal for Memory Profiling format changes
To: sheng.liang@nairobi, tom.rodriguez@nairobi, anand.palaniswamy@nairobi
Cc: gwhite@kindra
Message-ID: <libSDtMail.9712111840.8129.achut@nairobi/nairobi>
MIME-Version: 1.0
Content-Type: TEXT/plain; charset=us-ascii
Content-MD5: 618IcVG494uuiDzivMQ11g==
X-Mailer: dtmail 1.1.0 CDE Version 1.1_50 SunOS 5.5.1 sun4m sparc
Content-Length: 6407
Status: RO
X-Status: $$$$
X-UID: 0000002352
Sheng,
We have been working on a version of the memory profiler that uses the
binary format hprof data. One thing we noticed right away is that the
volume of data is much too high for us to handle well. For
anything beyond simple toy programs, the amount of data was *huge* (for
one app we tried, over 1 GB of data was generated).
This amount of data means we cannot even keep up with reading the data
(a pure reader in Java which simply read the data and threw it away
would barely keep up; even then, probably only on a multi-cpu machine,
or two separate machines). Even if we could read it fast enough, there
is no way we could store 1 GB or more of data in memory. Even if the
VM and OS allowed it (which they probably won't), the performance of our
tool would be so slow as to be useless.
Also, it seems a waste to deal with so much data which our current tool
does not even make use of in its displays. We considered "reducing"
the data on the fly, but after some analysis we decided this would only
help a little bit. In order to make a usable tool, we need a drastic
reduction in the amount of data being sent by the VM. Below is our
specific proposal. It reduces the data volume to roughly what we
got with the ASCII version, and it also adds some additional info
needed by our tool.
Please review this and let us know what you think.
Proposal for extending -hprof option:
------------------------------------
1. We keep everything we have now, but add a new "short format"
sub-option to -hprof:
-hprof:format=short
-hprof:format=long
If format=long, everything behaves mostly as it does now (with some
minor additions, see below).
If format=short, then emit the new record types (described below),
and only on GC events and on HPROF_CMD_ALLOCS.
The default could be either long or short, but probably should
be short.
2. Add sub-options to limit amount of data generated:
-hprof:depth=n
Limits the max stack depth to n. Default is no limit.
-hprof:top=n
Limits the number of entries in the HPROF_ALLOCS and HPROF_ALLOCS_LIVE
records to the top n, ranked by total bytes allocated. Default is no limit.
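For example, assuming the sub-options combine with commas (the proposal does not specify a combining syntax, so this is only illustrative), a run capped at stack depth 10 and the top 50 entries might be invoked as:

```
java -hprof:format=short,depth=10,top=50 MyApp
```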
3. Add a new command:
HPROF_CMD_ALLOCS emit a HPROF_ALLOCS record (see below)
*without doing a GC*.
4. Add the following record types:
HPROF_ALLOCS_LIVE a set of heap allocations (as opposed to
HPROF_ALLOC, which is a single allocation)
aggregated by class/thread/stack combination,
plus live data.
[id class name id
id thread id
id stack trace id
u4 number of instances allocated since last
HPROF_ALLOCS or HPROF_ALLOCS_LIVE record
for this class/thread/stack id.
u4 total bytes allocated since last HPROF_ALLOCS
or HPROF_ALLOCS_LIVE record for this
class/thread/stack id.
u4 live instances: total number of instances
that remain (after GC).
u4]* live bytes: total number of bytes
that remain (after GC).
HPROF_ALLOCS a set of heap allocations (as opposed to
HPROF_ALLOC, which is a single allocation)
aggregated by class/thread/stack combination
[id class name id
id thread id
id stack trace id
u4 number of instances allocated since last
HPROF_ALLOCS or HPROF_ALLOCS_LIVE record
for this class/thread/stack id.
u4]* total bytes allocated since last HPROF_ALLOCS
or HPROF_ALLOCS_LIVE record for this
class/thread/stack id.
HPROF_HEAP_SIZE size of heap changed
u4 new heap size
HPROF_HEADER one-time header info, always the first record
UTF8 command line string (progname + args)
UTF8 os.name
UTF8 os.arch
UTF8 os.version
UTF8 CLASSPATH value
u4 initial heap size (-ms)
u4 max heap size (-mx)
HPROF_GC garbage collection info
u4 time to perform GC in microseconds.
(As much as possible, includes only actual
GC time, not overhead such as writing these
records)
u4 total allocated instances before GC
u4 total allocated instances after GC
u4 total allocated bytes before GC
u4 total allocated bytes after GC
HPROF_TRACE_FRAMES a Java stack trace of frame ids
id stack trace ID
[id]* frame ID
HPROF_FRAME a Java stack frame
id frame ID
id class name ID
id method name ID
id method signature ID
i4 line number. >0: normal
-1: unknown
-2: compiled method
-3: native method
(aligned on id boundary)
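To make the HPROF_ALLOCS_LIVE layout above concrete, here is a minimal sketch of reading one entry of its repeating group. It assumes 4-byte ids and big-endian u4 fields (neither is fixed by this proposal; real hprof declares the id size in its file header), and the class name is purely illustrative.

```java
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical reader for one HPROF_ALLOCS_LIVE entry, assuming
// 4-byte ids and big-endian u4 fields (both are assumptions).
class AllocsLiveEntry {
    long classNameId, threadId, stackTraceId;
    long instances, bytes, liveInstances, liveBytes;

    static AllocsLiveEntry read(DataInputStream in) throws IOException {
        AllocsLiveEntry e = new AllocsLiveEntry();
        // Mask to treat each 32-bit field as an unsigned value.
        e.classNameId   = in.readInt() & 0xFFFFFFFFL; // id: class name
        e.threadId      = in.readInt() & 0xFFFFFFFFL; // id: thread
        e.stackTraceId  = in.readInt() & 0xFFFFFFFFL; // id: stack trace
        e.instances     = in.readInt() & 0xFFFFFFFFL; // u4: instances since last record
        e.bytes         = in.readInt() & 0xFFFFFFFFL; // u4: bytes since last record
        e.liveInstances = in.readInt() & 0xFFFFFFFFL; // u4: live instances after GC
        e.liveBytes     = in.readInt() & 0xFFFFFFFFL; // u4: live bytes after GC
        return e;
    }
}
```

An HPROF_ALLOCS entry would be read the same way, minus the two trailing live-count fields.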
These records would behave as follows:
- One HPROF_HEADER record is always emitted at the beginning,
regardless of format setting.
- When the heap size changes emit HPROF_HEAP_SIZE record,
regardless of format setting.
- One HPROF_GC record is emitted on each GC, regardless of format
setting.
- HPROF_TRACE_FRAMES can be used in place of HPROF_TRACE, to reduce
the size of the data.
- If format=short:
A HPROF_ALLOCS_LIVE record is emitted instead of HPROF_ALLOC and
HPROF_FREE records, and only on GC.
This would be similar to the type and amount of data previously
emitted by the ASCII version, but it would be using the new style
records and binary format.
More precisely, on every GC event it would emit, in this order
*after* the GC has occurred:
- one HPROF_ALLOCS_LIVE record containing a list of
each allocation (aggregated by class/thread/stack
combination), since the last HPROF_ALLOCS or HPROF_ALLOCS_LIVE
record. If no allocations have been made since the last
HPROF_ALLOCS or HPROF_ALLOCS_LIVE record, a "zero" record
is still emitted.
- one HPROF_GC record giving time to perform GC and totals.
Upon receiving a HPROF_CMD_ALLOCS command, it would emit:
- one HPROF_ALLOCS record containing a list of
each allocation (aggregated by class/thread/stack
combination) since the last HPROF_ALLOCS or HPROF_ALLOCS_LIVE
record. If no allocations have been made since the last
HPROF_ALLOCS or HPROF_ALLOCS_LIVE record, a "zero" record is
still emitted.
- if format=long, HPROF_ALLOC and HPROF_FREE records are emitted on
every allocation (current behavior). A HPROF_GC record is emitted
on every GC.
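The aggregation the short format implies can be sketched as a map keyed by the class/thread/stack combination, with one HPROF_ALLOCS[_LIVE] entry flushed per bucket on GC or on HPROF_CMD_ALLOCS. All class and method names below are illustrative, not part of any real hprof implementation:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of per-(class, thread, stack) allocation bucketing.
class AllocAggregator {
    static final class Key {
        final long classId, threadId, stackId;
        Key(long c, long t, long s) { classId = c; threadId = t; stackId = s; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return classId == k.classId && threadId == k.threadId
                && stackId == k.stackId;
        }
        @Override public int hashCode() {
            return (int) (classId * 31L * 31L + threadId * 31L + stackId);
        }
    }
    static final class Counts { long instances, bytes; }

    final Map<Key, Counts> buckets = new HashMap<>();

    // Called on each allocation; cheap compared to emitting a record.
    void recordAlloc(long classId, long threadId, long stackId, long size) {
        Counts c = buckets.computeIfAbsent(
            new Key(classId, threadId, stackId), k -> new Counts());
        c.instances++;
        c.bytes += size;
    }
}
```

On flush, each bucket would become one entry in the emitted record (a "zero" record when the map is empty), and the counters would reset so the next record again covers only the interval since the last one.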
Achut
Suggestions on improving hprof support in JDK 1.2. See attachments for details.
Relates to:
- JDK-6239647 HPROF: Provide filters to create less output (Closed)
- JDK-6239651 New binary heap dump format (to replace hprof format=b) (Closed)