While investigating the best solution for JDK-8351500, I found reproduction difficult, since I have no NUMA machine available. Moreover, the bug was caused by a NUMA node migration happening at exactly the wrong time, and to see these errors one needs a lot of luck and patience.
So I wrote a primitive "FakeNUMA" mode that sits between libnuma and the JVM and mimics a NUMA system. I then added a "FakeNUMAStressMigrations" mode that mimics tons of NUMA migrations (essentially, randomizing the thread-to-node associations).
With such a mode I could reproduce the customer problem behind JDK-8351500 on my machine and test the various patch variants. I also saw problems with ParallelGC in this mode.
The patch was quick and dirty, but I think it may be useful to have a mode like this in the JVM for regression testing and hardening our NUMA code.
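To illustrate the idea, here is a minimal sketch of what such a fake layer could look like. It is not the actual patch: the class `FakeNuma`, its methods, and the fixed seed are all hypothetical names chosen for this example. It assumes the JVM asks the layer which node a thread currently runs on. In normal mode the answer is stable per thread; in stress mode every query returns a random node, so each lookup may look like a migration.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical stand-in for a layer between libnuma and the JVM.
// All names here are illustrative assumptions, not HotSpot code.
class FakeNuma {
 public:
  FakeNuma(unsigned node_count, bool stress_migrations, uint64_t seed = 42)
      : node_count_(node_count), stress_(stress_migrations), rng_(seed) {
    assert(node_count > 0);
  }

  // Number of NUMA nodes the fake system pretends to have.
  unsigned node_count() const { return node_count_; }

  // Node the given thread "currently runs on".
  // Normal mode: stable mapping derived from the thread id.
  // Stress mode: a fresh random node on every query, so two
  // consecutive lookups for the same thread may disagree.
  unsigned node_of_thread(uint64_t thread_id) {
    if (stress_) {
      std::uniform_int_distribution<unsigned> dist(0, node_count_ - 1);
      return dist(rng_);
    }
    return static_cast<unsigned>(thread_id % node_count_);
  }

 private:
  unsigned node_count_;
  bool stress_;
  std::mt19937_64 rng_;
};
```

Code that looks up the node twice within one operation and assumes both answers match is the kind of code a stress mode like this would be expected to shake out.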
Relates to: JDK-8351500 G1: NUMA migrations cause crashes in region allocation (Resolved)