- Type: Enhancement
- Resolution: Fixed
- Priority: P4
- Fix Version: 14
- Resolved In Build: b05
Issue | Fix Version | Assignee | Priority | Status | Resolution | Resolved In Build |
---|---|---|---|---|---|---|
JDK-8272075 | 11.0.13 | Christoph Langer | P4 | Resolved | Fixed | b02 |
When writing a heap dump, the current implementation needs to write to a file (or another seekable destination), because it has to fix up the size of each heap dump segment long after that segment's header was written (see the sketch after the list below).
This makes the following use cases impossible to implement:
- Adding the option to stream the heap dump via a socket. Especially in cloud environments, local disk space is often limited and might not be enough to store the file.
- Adding the option to write a compressed (e.g. gzipped) version of the heap dump directly. Since tools like MAT are adding support for handling gzipped hprof dumps directly, and heap dumps usually compress well, this is an attractive use case.
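For reference, a minimal sketch of why seeking is needed with the current approach. The function name and the plain FILE*-based interface are invented for illustration and are not the actual heapDumper.cpp code; the 9-byte, big-endian segment record header follows the HPROF format.

```cpp
#include <cstdint>
#include <cstdio>

// HPROF heap dump segment record header (9 bytes, big-endian fields):
//   u1 tag (0x1C), u4 time delta, u4 length of the record body.
// The length is only known once all entries of the segment have been written,
// so the current writer remembers the header position and later seeks back to
// patch the real value - which is why a seekable destination is required.
static const uint8_t HPROF_HEAP_DUMP_SEGMENT = 0x1C;

// Illustrative seek-back fix-up (not the real heapDumper code).
void fixup_segment_length(FILE* out, long header_pos, uint32_t body_length) {
  long end_pos = ftell(out);              // where the next record will start
  fseek(out, header_pos + 5, SEEK_SET);   // skip tag (1 byte) + time delta (4 bytes)
  uint8_t be[4] = { (uint8_t)(body_length >> 24), (uint8_t)(body_length >> 16),
                    (uint8_t)(body_length >> 8),  (uint8_t)(body_length) };
  fwrite(be, 1, 4, out);                  // patch the length field
  fseek(out, end_pos, SEEK_SET);          // resume writing at the end
}
```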
I propose to change this by fixing up the dump segment lengths inside the buffer. Every entry in the heap dump segment knows the size it needs. If starting the entry would overflow the buffer, the currently used buffer size is used to fix up the heap dump segment header that is still in the buffer, and the buffer is then flushed. If the entry is too large to fit into the buffer at all, we create a heap dump segment containing only this entry, so we don't have to fix up the segment size later. Otherwise we just start a new heap dump segment at the start of the buffer.
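A minimal sketch of that scheme, assuming a hypothetical writer class with a flat byte buffer; the class name, the `Sink` callback, and the entry interface are invented for illustration and do not reflect the actual patch:

```cpp
#include <cstdint>
#include <cstring>
#include <functional>
#include <vector>

static const uint8_t HPROF_HEAP_DUMP_SEGMENT = 0x1C;
static const size_t  SEGMENT_HEADER_SIZE     = 9;   // u1 tag + u4 time + u4 length

// Hypothetical sketch of the proposed scheme (not the actual HotSpot code):
// the current segment header always lives inside the buffer, so its length
// can be patched in memory before the buffer is flushed to any sink.
class SegmentedDumpWriter {
 public:
  using Sink = std::function<void(const uint8_t*, size_t)>;  // file, socket, gzip, ...

  SegmentedDumpWriter(size_t buffer_size, Sink sink)
      : buf_(buffer_size), sink_(std::move(sink)) {}

  // Every entry knows its size up front, so we can decide where it goes.
  void write_entry(const uint8_t* entry, uint32_t size) {
    if (size + SEGMENT_HEADER_SIZE > buf_.size()) {
      // Too large for the buffer: emit a segment containing only this entry,
      // with an exact length, so no later fix up is needed.
      flush();
      uint8_t header[SEGMENT_HEADER_SIZE] = { HPROF_HEAP_DUMP_SEGMENT };
      put_u4_be(header + 5, size);
      sink_(header, SEGMENT_HEADER_SIZE);
      sink_(entry, size);
      return;
    }
    if (used_ + size > buf_.size()) {
      flush();                      // fixes up the buffered header, then writes out
    }
    if (!in_segment_) start_segment();
    std::memcpy(buf_.data() + used_, entry, size);
    used_ += size;
  }

  void flush() {
    finish_segment();
    if (used_ > 0) sink_(buf_.data(), used_);
    used_ = 0;
  }

 private:
  static void put_u4_be(uint8_t* p, uint32_t v) {   // HPROF fields are big-endian
    p[0] = (uint8_t)(v >> 24); p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);  p[3] = (uint8_t)(v);
  }

  void start_segment() {            // a new segment always starts at the buffer start
    buf_[0] = HPROF_HEAP_DUMP_SEGMENT;
    put_u4_be(&buf_[1], 0);         // time delta
    put_u4_be(&buf_[5], 0);         // length placeholder, patched in finish_segment()
    used_ = SEGMENT_HEADER_SIZE;
    in_segment_ = true;
  }

  void finish_segment() {           // fix up the length while the header is buffered
    if (!in_segment_) return;
    put_u4_be(&buf_[5], (uint32_t)(used_ - SEGMENT_HEADER_SIZE));
    in_segment_ = false;
  }

  std::vector<uint8_t> buf_;
  Sink sink_;
  size_t used_ = 0;
  bool in_segment_ = false;
};
```

A caller would invoke `flush()` once at the end to push out the last, partially filled segment; since that segment's header is still in the buffer, its length can be patched without any seek, and the sink can just as well be a socket or a gzip stream.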
This will lead to more heap dump segments being created than before. But since a heap dump segment header is only 9 bytes, the additional overhead is at most 18 bytes per buffer-sized chunk of output. For suitable buffer sizes this is negligible. On the other hand, more heap dump segments make it easier to implement parallel parsing in heap dump parsers (it is easy to parse different heap dump segments in parallel, but hard to parse a single heap dump segment in parallel).
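As an illustration (the buffer size here is hypothetical): with a 1 MB buffer, a 4 GB heap dump is flushed roughly 4096 times; at most 18 extra bytes per flush adds about 72 KB, i.e. well below 0.01% of the dump size.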
Additionally, the code must now be able to allocate a buffer of at least a few KB. The old code could work without a buffer, or with a buffer of 1 byte (at least in theory; the overhead of writing single bytes via I/O calls is high). But the lifetime of a VM in which the allocation of a few KB fails is short, so this is usually only a theoretical problem.
Backported by:
- JDK-8272075: Remove file seeking requirement for writing a heap dump (Resolved)