Type: Bug
Resolution: Duplicate
Priority: P2
Fix Version/s: 16
Affects Version/s: 16
Component/s: hotspot
Labels: metaspace
Subcomponent: gc
OS: windows

Observed on Windows x64 with more than one NUMA node configured.

VM started with -XX:+UseNUMAInterleaving hangs. Endlessly prints "NUMA page allocation failed". Virtual memory size balloons up into the TB range. Working set size slowly grows. VM needs to be stopped forcefully.

On that particular machine this was reproducible with a simple
java -XX:+UseNUMA -XX:+UseNUMAInterleaving -version

This bug started happening with https://bugs.openjdk.java.net/browse/JDK-8251158 ("Implementation of JEP 387: Elastic Metaspace").

Analysis shows that we hang during initialization of Metaspace/CDS, in map_or_reserve_memory_aligned() in os_windows.cpp, inside the loop starting at os_windows.cpp:3152.

This function attempts to reserve an aligned region. This involves:
1) reserving a larger region anywhere (no wish pointer) to leave room for the alignment
2) releasing that region
3) re-reserving at the aligned start address inside the released range, in the hope that this range is still free.
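The address arithmetic behind these three steps can be sketched as follows. This is only an illustration: `align_up`, `over_reserve_size` and `fits_inside` are hypothetical helper names, and padding the request by exactly one `alignment` is an assumption about how much extra is reserved, not a quote from os_windows.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Round addr up to the next multiple of alignment (must be a power of two).
inline std::uintptr_t align_up(std::uintptr_t addr, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
    return (addr + mask) & ~mask;
}

// Step 1: size of the oversized "anywhere" reservation.
// Assumption: one extra alignment unit is enough slack.
inline std::size_t over_reserve_size(std::size_t size, std::size_t alignment) {
    return size + alignment;
}

// Step 3 only makes sense if the aligned block lies completely inside
// the range that was reserved in step 1 and released in step 2.
inline bool fits_inside(std::uintptr_t raw_base, std::size_t size,
                        std::size_t alignment) {
    const std::uintptr_t aligned = align_up(raw_base, alignment);
    const std::uintptr_t raw_end = raw_base + over_reserve_size(size, alignment);
    return aligned >= raw_base && aligned + size <= raw_end;
}
```

Wherever the OS places the oversized reservation, the aligned block always fits inside it. Whether that range is still free at step 3 is a separate question.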

Note the difference from POSIX platforms, where we use mmap and can just unmap the unaligned head and tail of the region. Since mappings on Windows are indivisible, this is not possible there, hence the release-and-hope loop.
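For comparison, a minimal POSIX-only sketch of the trimming approach. `reserve_aligned_posix` is a hypothetical name, not the HotSpot function, and the sketch assumes `size` is a multiple of the page size and `alignment` is a power of two that is also a multiple of the page size.

```cpp
#include <cstddef>
#include <cstdint>
#include <sys/mman.h>

// Reserve `size` bytes of address space whose start is a multiple of
// `alignment`. Returns nullptr on failure. Hypothetical helper.
void* reserve_aligned_posix(std::size_t size, std::size_t alignment) {
    const std::size_t extra = size + alignment;
    // Oversized reservation anywhere: no access, no backing store.
    void* raw = mmap(nullptr, extra, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (raw == MAP_FAILED) {
        return nullptr;
    }
    const std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw);
    const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
    const std::uintptr_t aligned = (start + mask) & ~mask;

    // Unmap the unaligned head and the unused tail. The middle part stays
    // mapped the whole time, so there is no window in which another
    // thread could take the address range.
    const std::size_t head = aligned - start;
    const std::size_t tail = extra - head - size;
    if (head > 0) {
        munmap(raw, head);
    }
    if (tail > 0) {
        munmap(reinterpret_cast<void*>(aligned + size), tail);
    }
    return reinterpret_cast<void*>(aligned);
}
```

Because the aligned part is never given up, this variant needs no retry loop.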

Current (still unproven) hypothesis is:
1) We reserve memory in an interleaved fashion. This involves multiple VirtualAlloc calls. This causes the resulting mapping to be a patchwork of multiple mappings.
2) We attempt to release that mapping using os::release_memory(). But that only releases the first mapping in this patchwork area and leaves the other mappings intact.
3) We attempt to map into the aligned address and that fails.
4) We repeat the loop. The unreleased virtual memory segments accumulate and cause virtual size to balloon.
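To make the hypothesis concrete, here is a toy model of it in portable C++. Nothing in it calls the Windows API: `ToyAddressSpace`, `reserve_aligned_toy`, the chunk size and the attempt limit are all invented for illustration. The one behaviour it assumes is the unproven point from step 2, namely that a release only removes the single mapping that starts at the given base address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Toy address space: base address -> size of one individual mapping.
struct ToyAddressSpace {
    std::map<std::uintptr_t, std::size_t> mappings;
    std::uintptr_t next_free = 0x10000;  // arbitrary start address

    // Reserve `size` bytes "anywhere" as a patchwork of separate mappings
    // of at most `chunk` bytes each (models interleaved allocation).
    std::uintptr_t reserve_anywhere(std::size_t size, std::size_t chunk) {
        assert(chunk > 0);
        const std::uintptr_t base = next_free;
        for (std::size_t off = 0; off < size; off += chunk) {
            mappings[base + off] = (size - off < chunk) ? size - off : chunk;
        }
        next_free += size;
        return base;
    }

    // Assumed behaviour: only the mapping starting at `base` is released.
    void release(std::uintptr_t base) { mappings.erase(base); }

    // Reserve at a fixed address; fails if any existing mapping overlaps.
    bool reserve_at(std::uintptr_t addr, std::size_t size) {
        for (const auto& m : mappings) {
            if (m.first < addr + size && addr < m.first + m.second) {
                return false;
            }
        }
        mappings[addr] = size;
        return true;
    }

    std::size_t reserved_bytes() const {
        std::size_t total = 0;
        for (const auto& m : mappings) total += m.second;
        return total;
    }
};

struct LoopResult {
    bool success;
    int attempts;
    std::size_t reserved_bytes;  // total still reserved when the loop ends
};

// The release-and-hope loop, bounded by max_attempts so that it terminates.
LoopResult reserve_aligned_toy(ToyAddressSpace& as, std::size_t size,
                               std::size_t alignment, std::size_t chunk,
                               int max_attempts) {
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        const std::uintptr_t raw = as.reserve_anywhere(size + alignment, chunk);
        const std::uintptr_t aligned =
            (raw + alignment - 1) / alignment * alignment;
        as.release(raw);                      // frees the first chunk only
        if (as.reserve_at(aligned, size)) {   // blocked by leftover chunks
            return {true, attempt, as.reserved_bytes()};
        }
    }
    return {false, max_attempts, as.reserved_bytes()};
}
```

In this model a reservation made as one single mapping succeeds on the first attempt. A patchwork reservation with chunks smaller than the requested size never succeeds, and every failed attempt leaves all but the first chunk behind. That matches the growing virtual size seen in the report.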

I currently believe JEP 387 is not the root cause; it only changed the allocation patterns. For instance, we now allocate with larger alignments.

Analysis is ongoing.

duplicates
JDK-8255978 [windows] os::release_memory may not release the full range (Closed)

relates to
JDK-8256287 [windows] add loop fuse to map_or_reserve_memory_aligned (Resolved)
JDK-8240654 Windows GDI functions can fail and cause severe UI application repaint issues (Closed)
JDK-8255917 runtime/cds/SharedBaseAddress.java failed "assert(reserved_rgn != 0LL) failed: No reserved region" (Resolved)
JDK-8251158 Implementation of JEP 387: Elastic Metaspace (Resolved)

Assignee: Thomas Stuefe
Reporter: Thomas Stuefe
Votes: 0
Watchers: 4
Created: 2020-11-05 06:11
Updated: 2024-10-08 14:58
Resolved: 2020-11-12 04:56
