Observed on Windows x64 with more than one NUMA node configured.
A VM started with -XX:+UseNUMAInterleaving hangs. It endlessly prints "NUMA page allocation failed". Virtual memory size balloons into the TB range, and working set size slowly grows. The VM has to be stopped forcefully.
On that particular machine this was reproducible with a simple:
java -XX:+UseNUMA -XX:+UseNUMAInterleaving -version
This bug started happening with https://bugs.openjdk.java.net/browse/JDK-8251158 ("Implementation of JEP 387: Elastic Metaspace").
Analysis shows that we hang during initialization of Metaspace/CDS, in map_or_reserve_memory_aligned() in os_windows.cpp, inside the loop starting at os_windows.cpp:3152.
This function attempts to reserve an aligned region. This involves:
1) reserving a larger region anywhere (no wish pointer) to take alignment into account
2) releasing that region
3) re-reserving at the aligned starting address in the hope that this region is still free.
Note the difference to POSIX platforms, where we use mmap and can just unmap the unaligned beginning and end of the region. Since mappings on Windows are indivisible, this is not possible, hence the release-and-hope loop.
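The address arithmetic behind the three steps can be sketched as follows. This is a minimal illustration, not the HotSpot code: align_up_addr and over_reserve_size are hypothetical helper names, and over-reserving by exactly size + alignment is an assumption about how step 1 sizes the larger region.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not a HotSpot function): round addr up to the next
// multiple of alignment. alignment must be a non-zero power of two.
inline std::uintptr_t align_up_addr(std::uintptr_t addr, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
    return (addr + mask) & ~mask;
}

// Hypothetical helper: size of the larger region reserved in step 1.
// Assumption: over-reserving by one full alignment unit guarantees that an
// aligned block of `size` bytes fits inside, wherever the region lands.
inline std::size_t over_reserve_size(std::size_t size, std::size_t alignment) {
    return size + alignment;
}
```

For example, if the larger region happens to land at 0x12345000 and the requested alignment is 0x10000, step 3 would ask for the range starting at align_up_addr(0x12345000, 0x10000), which is 0x12350000. That request only succeeds if nothing else occupies the range at that moment.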
The current (still unproven) hypothesis is:
1) We reserve memory in an interleaved fashion. This involves multiple VirtualAlloc calls. This causes the resulting mapping to be a patchwork of multiple mappings.
2) We attempt to release that mapping using os::release_memory(). But that only releases the first mapping in this patchwork area and leaves the other mappings intact.
3) We attempt to map into the aligned address and that fails.
4) We repeat the loop. The unreleased virtual memory segments accumulate and cause virtual size to balloon.
I currently believe JEP 387 is not the root cause. It does change allocation patterns, though (for instance, we now allocate with larger alignments), and that may be what brings the problem to the surface.
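The hypothesized failure mode can be reproduced in a toy model of the address space. The sketch below is not HotSpot or Windows API code: ToyAddressSpace, release_first_only and aligned_reserve_loop are hypothetical names, each map entry stands in for one VirtualAlloc reservation, and max_attempts exists only so the model terminates. It assumes, as in the hypothesis, that a release frees only the mapping that starts at the given base.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Toy address space: one map entry (base -> size) per separate reservation.
struct ToyAddressSpace {
    std::map<std::uintptr_t, std::size_t> mappings;

    std::size_t reserved_bytes() const {
        std::size_t total = 0;
        for (const auto& m : mappings) total += m.second;
        return total;
    }
    // "Anywhere" placement: first address above every existing mapping.
    std::uintptr_t top() const {
        if (mappings.empty()) return 0x10000;
        auto last = mappings.rbegin();
        return last->first + last->second;
    }
    bool overlaps(std::uintptr_t addr, std::size_t size) const {
        for (const auto& m : mappings)
            if (addr < m.first + m.second && m.first < addr + size) return true;
        return false;
    }
    // Interleaved reservation: the range becomes a patchwork of chunk-sized mappings.
    void map_chunks(std::uintptr_t base, std::size_t size, std::size_t chunk) {
        for (std::size_t off = 0; off < size; off += chunk)
            mappings[base + off] = (size - off < chunk) ? size - off : chunk;
    }
    std::uintptr_t reserve_interleaved(std::size_t size, std::size_t chunk) {
        std::uintptr_t base = top();
        map_chunks(base, size, chunk);
        return base;
    }
    bool reserve_at(std::uintptr_t addr, std::size_t size, std::size_t chunk) {
        if (overlaps(addr, size)) return false;
        map_chunks(addr, size, chunk);
        return true;
    }
    // Models the suspected bug: only the mapping starting at base is freed.
    void release_first_only(std::uintptr_t base) { mappings.erase(base); }
};

struct LoopResult { bool success; int attempts; std::size_t leaked_bytes; };

// Reserve-release-re-reserve loop; alignment must be a power of two.
LoopResult aligned_reserve_loop(ToyAddressSpace& as, std::size_t size,
                                std::size_t alignment, std::size_t chunk,
                                int max_attempts) {
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        std::uintptr_t base = as.reserve_interleaved(size + alignment, chunk);
        as.release_first_only(base);
        std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
        std::uintptr_t aligned = (base + mask) & ~mask;
        if (as.reserve_at(aligned, size, chunk))
            return {true, attempt, as.reserved_bytes() - size};
    }
    return {false, max_attempts, as.reserved_bytes()};
}
```

In this model, whenever the requested size is larger than one chunk, every attempt fails and leaves the unreleased chunks behind, so the reserved size grows linearly with the number of attempts. When the whole over-sized region is a single mapping, the first attempt succeeds and nothing leaks.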
Analysis is ongoing.
duplicates
- JDK-8255978 [windows] os::release_memory may not release the full range (Closed)
relates to
- JDK-8256287 [windows] add loop fuse to map_or_reserve_memory_aligned (Resolved)
- JDK-8240654 Windows GDI functions can fail and cause severe UI application repaint issues (Closed)
- JDK-8255917 runtime/cds/SharedBaseAddress.java failed "assert(reserved_rgn != 0LL) failed: No reserved region" (Resolved)
- JDK-8251158 Implementation of JEP 387: Elastic Metaspace (Resolved)