JDK-8351500: G1: NUMA migrations cause crashes in region allocation

    • gc
    • b15
    • linux

        (Note: This bug manifests on JDK 21 and 17; we don't see crashes or asserts on mainline JDK, but I argue that the underlying root issue also exists in mainline JDK and would best be fixed there.)

        One of our customers found that NUMA migrations (more precisely, the OS task getting scheduled to a different NUMA node) can cause G1 to crash if they happen at exactly the wrong moment.

        The JVM runs with +UseNUMA and +UseNUMAInterleaving, G1 GC with a 4 TB heap, on machines with two or four NUMA nodes, with about 5000 application threads and 159 GC worker threads. The JVM crashes rarely, about once every four hours or so.

        The call stacks are wildly different, e.g.:

        ```
            Stack: [0x00007e506733f000,0x00007e5067540000], sp=0x00007e506753cf10, free space=2039k
            Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
            V [libjvm.so+0xf32422] Symbol::as_klass_external_name() const+0x12 (symbol.hpp:140)
            V [libjvm.so+0xda71ff] SharedRuntime::generate_class_cast_message(Klass*, Klass*, Symbol*)+0x1f (sharedRuntime.cpp:2179)
            V [libjvm.so+0xda99c4] SharedRuntime::generate_class_cast_message(JavaThread*, Klass*)+0xd4 (sharedRuntime.cpp:2171)
            V [libjvm.so+0x578e2c] Runtime1::throw_class_cast_exception(JavaThread*, oopDesc*)+0x13c (c1_Runtime1.cpp:735)
        ```

        In some crashes, it looks like we load a zero from the heap where no zero should be (e.g. as the narrow Klass ID from an oop header).

        However, if you run a debug JVM, you usually see an assert either in `G1Allocator` or in `CollectedHeap`, for example:

        ```
          Current thread (0x00007fb770087b70): JavaThread "Thread-33" [_thread_in_vm, id=123345, stack(0x00007fb7a86d7000,0x00007fb7a87d8000) (1028K)]

          Stack: [0x00007fb7a86d7000,0x00007fb7a87d8000], sp=0x00007fb7a87d62f0, free space=1020k
          Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
          V [libjvm.so+0x9fdd6b] CollectedHeap::fill_with_object_impl(HeapWordImpl**, unsigned long, bool) [clone .part.0]+0x2b (collectedHeap.cpp:470)
          V [libjvm.so+0x9fff1d] CollectedHeap::fill_with_object(HeapWordImpl**, unsigned long, bool)+0x39d (arrayOop.hpp:58)
          V [libjvm.so+0xc5009f] G1AllocRegion::fill_up_remaining_space(HeapRegion*)+0x1ef (g1AllocRegion.cpp:79)
          V [libjvm.so+0xc5027c] G1AllocRegion::retire_internal(HeapRegion*, bool)+0x6c (g1AllocRegion.cpp:106)
          V [libjvm.so+0xc51347] MutatorAllocRegion::retire(bool)+0xb7 (g1AllocRegion.cpp:300)
          V [libjvm.so+0xc50ed9] G1AllocRegion::new_alloc_region_and_allocate(unsigned long, bool)+0x59 (g1AllocRegion.cpp:139)
          V [libjvm.so+0xc9b140] G1CollectedHeap::attempt_allocation_slow(unsigned long)+0x6d0 (g1AllocRegion.inline.hpp:120)
          V [libjvm.so+0xc9e4ff] G1CollectedHeap::attempt_allocation(unsigned long, unsigned long, unsigned long*)+0x39f (g1CollectedHeap.cpp:643)
          V [libjvm.so+0xc9bd4f] G1CollectedHeap::mem_allocate(unsigned long, bool*)+0x5f (g1CollectedHeap.cpp:401)
          V [libjvm.so+0x13b9b6d] MemAllocator::mem_allocate_slow(MemAllocator::Allocation&) const+0x5d (memAllocator.cpp:240)
          V [libjvm.so+0x13b9ca1] MemAllocator::allocate() const+0xa1 (memAllocator.cpp:357)
        ```

        The problem is in `G1Allocator`: its `G1AllocRegion` objects are tied to NUMA nodes. For most actions involving the `G1Allocator`, we determine the NUMA node of the current thread and then redirect the action to that node's `G1AllocRegion`. However, due to OS scheduling, the thread-to-node association can change arbitrarily. That means consecutive calls into `G1Allocator` are not guaranteed to hit the same `G1AllocRegion` object.
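
        To illustrate the dispatch pattern, here is a minimal, hypothetical sketch (shortened names, stubbed types, invented constants; not the HotSpot sources): the allocator keeps one mutator allocation region per NUMA node and re-resolves the calling thread's node index on every entry point.

        ```
        #include <cstddef>

        // Hypothetical, simplified model of the per-node dispatch in G1Allocator.
        // query_current_node() stands in for asking the OS which NUMA node the calling
        // thread currently runs on; the scheduler may change the answer at any time.
        unsigned query_current_node();

        struct AllocRegionModel {
          // Per-node allocation state (current HeapRegion, retained region, ...).
          void* attempt_allocation_locked(size_t word_size);
          void* attempt_allocation_force(size_t word_size);
        };

        struct AllocatorModel {
          static constexpr unsigned kNumNodes = 4;   // invented; the real count comes from the OS
          AllocRegionModel _mutator_alloc_regions[kNumNodes];

          // Every entry point re-resolves the node index independently, so two
          // consecutive calls are not guaranteed to address the same object.
          AllocRegionModel* current_alloc_region() {
            return &_mutator_alloc_regions[query_current_node() % kNumNodes];
          }

          void* attempt_allocation_locked(size_t word_size) {
            return current_alloc_region()->attempt_allocation_locked(word_size);
          }

          void* attempt_allocation_force(size_t word_size) {
            return current_alloc_region()->attempt_allocation_force(word_size);
          }
        };
        ```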

        Now, we have control flows that assume that we work with the same `G1AllocRegion` object for their whole duration, since we build up state in the `G1AllocRegion`. The affected JDK 21 control flow is:

        ```
        - `G1CollectedHeap::attempt_allocation_slow`
          - `G1Allocator::attempt_allocation_locked` (A)
            - `G1AllocRegion::attempt_allocation_locked`
              - `G1AllocRegion::attempt_allocation` (tries again to allocate from the current HeapRegion, now under lock protection); failing that:
              - `G1AllocRegion::attempt_allocation_using_new_region`
                - `G1AllocRegion::retire` (retires the current allocation region; may keep it as the retained region)
                - `G1AllocRegion::new_alloc_region_and_allocate` (allocates a new HeapRegion and sets it as the current allocation region; failing that, sets the dummy region); if all of this fails:
          - `G1Allocator::attempt_allocation_force` (B)
            - `G1AllocRegion::attempt_allocation_force`
              - `G1AllocRegion::new_alloc_region_and_allocate`
        ```

        Here, if the thread changes NUMA nodes between (A) and (B), the two calls address different `G1AllocRegion` objects. But `G1AllocRegion::attempt_allocation_force` assumes that the current allocation region of its object has already been retired by the preceding `G1AllocRegion::attempt_allocation_locked`; that retirement, however, happened on a different `G1AllocRegion` object.
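
        A condensed, hypothetical view of that sequence (again not the literal sources), reusing the model sketched above:

        ```
        // Hypothetical condensation of G1CollectedHeap::attempt_allocation_slow, reusing
        // the AllocatorModel sketch from above; the comments mark where the mismatch arises.
        void* attempt_allocation_slow_sketch(AllocatorModel* allocator, size_t word_size) {
          // (A) Thread runs on node N1: retires N1's current allocation region and tries
          //     to install a new one (on failure, N1's object points at the dummy region).
          void* result = allocator->attempt_allocation_locked(word_size);
          if (result != nullptr) {
            return result;
          }

          // <-- If the OS migrates the thread to node N2 here ...

          // (B) ... this call resolves node N2 and assumes that N2's current allocation
          //     region has just been retired by (A). It was not: (A) retired N1's region,
          //     so N2's still-active region is handled under the wrong assumptions.
          return allocator->attempt_allocation_force(word_size);
        }
        ```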

        This causes us to abandon the current allocation region; it won't be added to the collection set. On debug JVMs, we hit one of two asserts: either we complain that the current allocation region is not the dummy region at the entrance of `new_alloc_region_and_allocate`, or (in JDK 17) we assert when retiring the wrong region because it is emptier than expected. The effect can also be delayed and only surface on the next retire, since the mixup can affect the retained region.
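
        As a hedged sketch of the first of those checks (the exact assert text and placement differ between releases), the precondition amounts to:

        ```
        #include <cassert>

        // Sketch of the failing precondition: new_alloc_region_and_allocate() expects the
        // caller to have retired the current allocation region first, i.e. to have reset
        // it to the shared dummy region. After a node switch between (A) and (B), the other
        // node's object never saw that retire, so its current region is still live and the
        // check trips on debug builds.
        struct AllocRegionStateModel {
          const void* _alloc_region;   // currently installed region of this node's object
          const void* _dummy_region;   // shared "no region installed" marker

          void check_new_alloc_region_precondition() const {
            assert(_alloc_region == _dummy_region && "pre-condition");
          }
        };
        ```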

        ----

        Reproduction and Regression testing

        Reproducing the bug is difficult. I did not have a NUMA machine at hand, and even if I had, NUMA task-to-node migrations are very rare. Therefore, I built something like a "FakeNUMA" mode, which essentially interposes the OS NUMA calls and fakes a NUMA system of 8 nodes. I also added a "FakeNUMAStressMigrations" mode mimicking frequent node migrations. With these simple tools, I could reproduce the customer problem (with gc/TestJNICriticalStressTest, slightly modified to increase the number of JNICritical threads). I plan to bring the FakeNUMA mode upstream, but have no time at the moment to polish it up.
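
        A minimal sketch of the idea behind the stress mode (names, constants and the integration point are made up for illustration; the actual patch interposes HotSpot's OS-level NUMA queries):

        ```
        #include <cstdlib>

        // Illustrative sketch of the FakeNUMA/FakeNUMAStressMigrations idea: wherever the
        // VM would ask the OS for the NUMA node of the current thread, answer from a fake
        // 8-node topology and occasionally change the answer to mimic a task-to-node
        // migration.
        static const unsigned kFakeNumaNodes = 8;
        static const bool kStressMigrations = true;

        unsigned fake_numa_node_of_current_thread() {
          // Each thread remembers its fake home node ...
          thread_local unsigned node = static_cast<unsigned>(std::rand()) % kFakeNumaNodes;
          // ... and with a small probability per query the thread "migrates", so the node
          // index can change between any two lookups, just like a real scheduler migration.
          if (kStressMigrations && (std::rand() % 64) == 0) {
            node = static_cast<unsigned>(std::rand()) % kFakeNumaNodes;
          }
          return node;
        }
        ```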
