JDK-8315923

pretouch_memory by atomic-add-0 fragments huge pages unexpectedly

Details

    • 18
    • b08
    • generic
    • linux

    Description

      Since JDK-8272807 (version 18 b10), os::pretouch_memory no longer pretouches with a volatile write of 0 to each page in the range. It performs an atomic add of 0 instead, a change that was made to "permit use of memory concurrently with pretouch".

      However, when -XX:+AlwaysPreTouch and -XX:+UseTransparentHugePages are used together for an application with a very large amount of memory (for example >200GiB), the logs of numastat -mnv show that the transparent huge pages are unexpectedly fragmented into regular pages. The kernel then gradually reassembles the regular pages (for example 4KB) into huge pages (in this case 2MB). This can take anywhere from minutes to more than half an hour, and it hurts not only the startup phase but also the final scores of some key benchmarks, especially those that depend heavily on 99th percentile response time.

      From the kernel's point of view, if the first access to a page is a “load” instruction (as with volatile-add-0 or atomic-add-0), the kernel does not actually allocate a page but uses the “zero” page (a special page filled with zeros) instead. The subsequent write (store instruction) triggers COW (copy-on-write), but the kernel allocates small pages rather than huge pages for COW. The huge pages are installed later, asynchronously, by a dedicated kernel thread. Before v5.8 the kernel did allocate a huge page for COW, but that behavior was changed by commit https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3917c80280c9 (“thp: change CoW semantics for anon-THP”).

      Therefore, the issue behaves differently across Linux versions: 4.18 is fine, while almost all recent kernels (5.8+, 6.x) can show the same problem. We can neither go back to volatile-write-0 nor revert kernels to the last known good version. Instead, calling madvise with MADV_POPULATE_WRITE (available since Linux 5.14) can be the right way to pretouch memory, and it works well with both regular and huge pages.

            People

              qpzhang Patrick Zhang