JDK-8269685

Optimize HeapHprofBinWriter implementation



    • Type: Enhancement
    • Status: Resolved
    • Priority: P4
    • Resolution: Fixed
    • Affects Version: None
    • Fix Version: 18
    • Component: hotspot
    • Resolved In Build: b17


      As described in https://bugs.openjdk.java.net/browse/JDK-8262386, the current implementation of HeapHprofBinWriter uses SegmentedOutputStream to dump the heap in segmented mode.

      This is required for correctness, specifically for compressed heap dumps. The current heap dump implementation rewrites the `size` slot of the segment header, which may already have been written to the underlying file. For a gzipped heap dump, rewriting data that has already been written to the gzip file could corrupt the output, because the position of that data in the compressed file is unknown. Therefore the writer has to cache all of the segment data and write it to the file only after the `size` slot has been updated.
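
      The caching behaviour described above can be sketched as follows. This is only an illustration, not the actual SegmentedOutputStream code: the class and method names are hypothetical, and the record layout (1-byte tag, 4-byte time offset, 4-byte body length) is a simplified reading of the HPROF binary format.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;

/** Hypothetical sketch of the buffer-then-patch approach; not the JDK code. */
public class BufferedSegmentSketch {
    static final int HPROF_HEAP_DUMP_SEGMENT = 0x1C;
    static final int HEADER_SIZE = 1 + 4 + 4; // tag, time offset, body length
    static final int SIZE_SLOT_OFFSET = 1 + 4; // position of the body length

    /** Caches the whole segment, patches the size slot, then writes it out. */
    static void writeSegment(OutputStream sink, byte[][] subRecords) throws IOException {
        ByteArrayOutputStream cache = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(cache);
        out.writeByte(HPROF_HEAP_DUMP_SEGMENT);
        out.writeInt(0); // time offset
        out.writeInt(0); // placeholder: the body length is not known yet
        for (byte[] record : subRecords) {
            out.write(record); // every byte of the segment stays in memory
        }
        out.flush();
        byte[] segment = cache.toByteArray();
        // Patch the size slot in the cached copy; this would not be possible
        // once the bytes had gone through a compressing stream.
        ByteBuffer.wrap(segment).putInt(SIZE_SLOT_OFFSET, segment.length - HEADER_SIZE);
        sink.write(segment);
    }
}
```

      In this sketch the memory cost grows with the segment size, because nothing reaches `sink` until the whole segment has been collected.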

      However, this implementation causes memory overhead because it has to cache the data of the whole segment before writing it to the file. It also introduces complicated logic for array dumps, which has been discussed in JDK-8262386 and its PR at https://github.com/openjdk/jdk/pull/2803.

      After further investigation, we found that HeapHprofBinWriter can be refined and the whole logic of SegmentedOutputStream can be removed. Therefore the following changes are proposed:

      - For segmented heap dumps, calculate the data size before starting to dump the data. The size of the data to be written is then known when the header is created, so there is no need to rewrite the `size` slot of the header.

      - Remove SegmentedOutputStream. Because the header no longer needs to be rewritten, the data no longer needs to be cached, so the whole SegmentedOutputStream is unnecessary.

      With this refinement, the logic is simpler:
      - For an object/array in the heap, the dumper first calculates the size of the data to be written, composes the header section with the correct size, writes the header to the underlying output stream, and then iterates over the object/array data and writes it to the underlying output stream.
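
      A minimal sketch of this size-first flow for a primitive `int` array is shown below. It is an illustration under stated assumptions, not the HeapHprofBinWriter code: the class and method names are hypothetical, an 8-byte object ID is assumed, and one segment is written per array purely to keep the example short.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical sketch of the size-first approach; not the JDK code. */
public class SizeFirstDumpSketch {
    static final int HPROF_HEAP_DUMP_SEGMENT = 0x1C;
    static final int HPROF_GC_PRIM_ARRAY_DUMP = 0x23;
    static final int HPROF_INT = 10; // element type code for int
    static final int ID_SIZE = 8;    // assumed identifier size

    /** Size of the array sub-record, computed without writing any data. */
    static int intArrayRecordSize(int[] array) {
        // sub-tag + object ID + stack trace serial + element count + element type + elements
        return 1 + ID_SIZE + 4 + 4 + 1 + 4 * array.length;
    }

    static void dumpIntArray(OutputStream sink, long objectId, int[] array) throws IOException {
        DataOutputStream out = new DataOutputStream(sink);
        // Step 1: the body size is known before anything is written.
        int bodySize = intArrayRecordSize(array);
        // Step 2: the header already carries the final size, so it is never revisited.
        out.writeByte(HPROF_HEAP_DUMP_SEGMENT);
        out.writeInt(0); // time offset
        out.writeInt(bodySize);
        // Step 3: stream the data straight to the sink, with no intermediate cache.
        out.writeByte(HPROF_GC_PRIM_ARRAY_DUMP);
        out.writeLong(objectId);
        out.writeInt(0); // stack trace serial number
        out.writeInt(array.length);
        out.writeByte(HPROF_INT);
        for (int value : array) {
            out.writeInt(value);
        }
        out.flush();
    }
}
```

      Because no byte is rewritten after it has been emitted, `sink` in this sketch could just as well be a `java.util.zip.GZIPOutputStream` wrapping the dump file.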

      The risk of this implementation is that it adds size-calculation logic that must run before iterating over the object/array. We believe the risk is low because most of that code looks similar to the object iteration code, and the jcmd/jmap dump code can serve as a reference for the implementation.


        Assignee: Lin Zang (lzang)
        Reporter: Lin Zang (lzang)