JDK / JDK-8351630

Fix NUMA association for the duration of a single G1 Heap allocation

    • Type: Bug
    • Resolution: Duplicate
    • Priority: P4
    • None
    • 17, 21, 25
    • hotspot
    • gc

      There is a bug (JDK-8351500) which manifests as JVM crashes and assertion failures in JDK 21 and JDK 17. In JDK 22 and later it does not cause any crashes; there, the error is benign.

      My first attempt was to fix the bug for JDK 21 (and later backport that fix to 17). That attempt is https://github.com/openjdk/jdk21u-dev/pull/1460.

      However, it may be better to fix the lingering root cause of the issue in the mainline JDK, because otherwise we may run into it again later. Also, later backports to JDK 21 in that area may accidentally break the fix again.

      The gist of the problem is that the scheduler may move an OS task to a different NUMA node at any point in time, including during execution of `G1CollectedHeap::attempt_allocation`. While executing `G1CollectedHeap::attempt_allocation`, we call into the `G1Allocator` several times, and each time the allocator uses the `G1AllocRegion` belonging to the NUMA node of the current thread. Because the NUMA node association can change between those calls, they may end up using different `G1AllocRegion` instances. This causes problems because the control flow implicitly assumes that the same alloc region is used throughout.
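
      The following is a minimal, self-contained sketch of the pattern described above, not HotSpot code. All names (`ToyRegion`, `ToyScheduler`, `ToyAllocator`, `allocate_requery`, `allocate_pinned`) are hypothetical stand-ins invented for illustration, and the thread migration is simulated by a scripted node sequence. It contrasts re-querying the node on every allocator call with determining the node once and reusing it for the whole allocation, which is the idea the issue title suggests.

```cpp
#include <array>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical per-node allocation region (NOT the real G1AllocRegion).
struct ToyRegion {
  std::size_t capacity;
  std::size_t used;
  explicit ToyRegion(std::size_t cap) : capacity(cap), used(0) {}
  bool try_allocate(std::size_t n) {
    if (used + n > capacity) return false;
    used += n;
    return true;
  }
  void refill() { used = 0; }  // pretend to retire and install a fresh region
};

// Simulated "which NUMA node am I on?" query. Successive calls may return
// different nodes, modelling the scheduler migrating the thread.
struct ToyScheduler {
  std::vector<unsigned> script;
  std::size_t next;
  explicit ToyScheduler(std::vector<unsigned> s)
      : script(std::move(s)), next(0) {}
  unsigned current_node() {
    unsigned node = script[next];
    if (next + 1 < script.size()) ++next;
    return node;
  }
};

// Hypothetical allocator with one region per node (two nodes assumed).
struct ToyAllocator {
  std::array<ToyRegion, 2> regions = {{ToyRegion(16), ToyRegion(16)}};
  std::array<int, 2> touched = {{0, 0}};  // operations seen per node

  bool fast_path(unsigned node, std::size_t n) {
    ++touched[node];
    return regions[node].try_allocate(n);
  }
  bool slow_path(unsigned node, std::size_t n) {
    ++touched[node];
    regions[node].refill();
    return regions[node].try_allocate(n);
  }
};

// Fragile variant: the node is looked up again for each step, so the fast
// path and the slow path may operate on different regions.
bool allocate_requery(ToyAllocator& a, ToyScheduler& s, std::size_t n) {
  if (a.fast_path(s.current_node(), n)) return true;
  return a.slow_path(s.current_node(), n);
}

// Pinned variant: the node is looked up once and reused for the whole
// allocation, so every step sees the same region.
bool allocate_pinned(ToyAllocator& a, ToyScheduler& s, std::size_t n) {
  const unsigned node = s.current_node();
  if (a.fast_path(node, n)) return true;
  return a.slow_path(node, n);
}
```

      In this toy model, if node 0's region is full and the scripted scheduler reports node 0 and then node 1, `allocate_requery` fails on node 0's region but refills node 1's region, while `allocate_pinned` performs both steps on node 0's region. The toy version merely allocates from an unexpected region; the crashes and assertion failures in JDK 17 and 21 are described in JDK-8351500.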

      For further details, please see JDK-8351500.

            stuefe Thomas Stuefe