JDK-8341334

CDS: Parallel relocation

    • Type: Enhancement
    • Resolution: Fixed
    • Priority: P4
    • 24
    • 17, 21, 24
    • hotspot
    • master

      In Leyden performance investigations, we found that ArchiveRelocationMode=0 saves 5..7 ms on HelloWorld startup. Mainline defaults to ARM=1 and so _loses_ as much. The default was switched from ARM=0 to ARM=1 by JDK-8294323, which was delivered to JDK 17+ in Apr 2023.

      Profiling shows the time is spent taking page faults on the memory backing the RO/RW regions, about 15 MB total. 15 MB in 5 ms amounts to about 3 GB/sec, close to the single-threaded limits. I suspect the impact is larger when we relocate a larger Metaspace, e.g. after dumping a CDS archive from a large application.

      There is little we can do to make the relocation code itself faster: the overwhelming majority of samples are on the kernel side. That leaves two opportunities: a) parallel pretouch for mmap-ed regions, which wires up memory upfront and handles the page faults in parallel; b) parallel relocation, which spreads the storm of relocation memory writes across threads.
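
      To make the two ideas concrete, here is a minimal sketch of the partitioning, not the HotSpot code. Everything in it is an assumption for illustration: the function names, the 8-byte word size, and the range check used to decide what counts as a pointer (a real archive would need an exact record of pointer locations, since a plain integer that happens to fall in the old range would be patched by mistake). Python threads will not actually overlap here because of the GIL; the point is only how the region is split into disjoint page-aligned chunks, each pretouched and then relocated by one worker.

```python
import mmap
import struct
from concurrent.futures import ThreadPoolExecutor

PAGE = mmap.PAGESIZE
WORD = 8  # assumed pointer width in bytes (hypothetical, for illustration)


def chunks(length, n_workers, align):
    """Split [0, length) into at most n_workers ranges that start on `align`."""
    per = -(-length // n_workers)            # ceiling division
    per = max(align, -(-per // align) * align)  # round up to the alignment
    return [(s, min(s + per, length)) for s in range(0, length, per)]


def pretouch(region, start, end):
    """Touch one byte per page so the page is faulted in before relocation."""
    for off in range(start, end, PAGE):
        region[off] = region[off]  # rewrite the same value; contents unchanged


def relocate(region, start, end, old_base, old_end, delta):
    """Add `delta` to every word in [start, end) that points into the old range.

    The range check is a simplification standing in for an exact record of
    which words are pointers.
    """
    for off in range(start, end, WORD):
        (value,) = struct.unpack_from("<Q", region, off)
        if old_base <= value < old_end:
            struct.pack_into("<Q", region, off, value + delta)


def parallel_relocate(region, length, old_base, delta, workers=4):
    """Pretouch all chunks, then relocate all chunks, one worker per chunk."""
    if length == 0:
        return
    old_end = old_base + length
    parts = chunks(length, workers, PAGE)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Phase a: wire up memory upfront. list() waits for every chunk.
        list(pool.map(lambda p: pretouch(region, *p), parts))
        # Phase b: patch pointers. Chunks are disjoint, so no locking is needed.
        list(pool.map(lambda p: relocate(region, *p, old_base, old_end, delta), parts))


if __name__ == "__main__":
    length = 4 * PAGE
    region = mmap.mmap(-1, length)  # anonymous mapping standing in for RO/RW
    old_base, new_base = 0x800000000, 0x7F0000000000  # made-up addresses
    struct.pack_into("<Q", region, 0, old_base + 64)  # a pointer into the region
    struct.pack_into("<Q", region, 8, 42)             # a non-pointer value
    parallel_relocate(region, length, old_base, new_base - old_base)
    print(hex(struct.unpack_from("<Q", region, 0)[0]))
    print(struct.unpack_from("<Q", region, 8)[0])
```

      Splitting on page boundaries keeps each page fault and each relocated word owned by exactly one worker, which is why neither phase needs synchronization beyond waiting for the phase to finish.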

      My early experiments show we can recover almost all of the cost with just a handful of threads: pretouch alone cuts 5..7 ms to just 0.5 ms, and pretouch plus parallel relocation goes under the noise floor.

      Draft: https://github.com/openjdk/jdk/compare/master...shipilev:jdk:JDK-8341334-cds-parallel-relocation

            shade Aleksey Shipilev