By default, glibc creates a new malloc arena for every thread that allocates, each reserving 128 MB of virtual address space, up to a limit of 8 * processor count arenas (the 64-bit default). On a 16-core machine that is up to 128 arenas, i.e. 16 GB of reserved address space in the worst case.
This is good for a few threads that perform many concurrent mallocs, but it is a poor fit for the JVM, which has its own memory management and rather allocates fewer, larger chunks.
(See the glibc source: __libc_malloc in malloc.c calls arena_get2, which in turn calls _int_new_arena in arena.c.)
Using only one arena significantly reduces the virtual memory footprint, and saving memory seems to be more valuable for the JVM itself than optimizing concurrent mallocs.
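A single-arena setup does not require rebuilding glibc: the limit can be capped with mallopt(M_ARENA_MAX, 1) before additional threads allocate, or externally via the MALLOC_ARENA_MAX=1 environment variable (newer glibc also accepts the glibc.malloc.arena_max tunable). A minimal, glibc-specific sketch:

#include <malloc.h>   /* glibc-specific: mallopt(), M_ARENA_MAX */
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    /* Cap the process at a single malloc arena. Must run before other
       threads start allocating; mallopt() returns nonzero on success. */
    if (mallopt(M_ARENA_MAX, 1) == 0)
        fprintf(stderr, "mallopt(M_ARENA_MAX, 1) failed\n");
    free(malloc(24));   /* now served from the main arena */
    return 0;
}

For a JVM the call cannot easily be issued before VM startup, so setting MALLOC_ARENA_MAX=1 in the environment is the practical knob, at the cost of more lock contention between concurrently mallocing threads.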
Note: the first malloc in each thread triggers a 128 MB mmap, which typically happens during the initialization of thread-local storage (backtrace below; a standalone reproducer follows it):
#0 __mmap (addr=addr@entry=0x0, len=len@entry=134217728, prot=prot@entry=0, flags=flags@entry=16418, fd=fd@entry=-1, offset=offset@entry=0) at ../sysdeps/unix/sysv/linux/wordsize-64/mmap.c:33
#1 0x00007ffff72403d1 in new_heap (size=135168, size@entry=2264, top_pad=<optimized out>) at arena.c:438
#2 0x00007ffff7240c21 in _int_new_arena (size=24) at arena.c:646
#3 arena_get2 (size=size@entry=24, avoid_arena=avoid_arena@entry=0x0) at arena.c:879
#4 0x00007ffff724724a in arena_get2 (avoid_arena=0x0, size=24) at malloc.c:2911
#5 __GI___libc_malloc (bytes=24) at malloc.c:2911
#6 0x00007ffff7de9ff8 in allocate_and_init (map=<optimized out>) at dl-tls.c:603
#7 tls_get_addr_tail (ti=0x7ffff713e100, dtv=0x7ffff0038890, the_map=0x6031a0) at dl-tls.c:791
#8 0x00007ffff6b596ac in Thread::initialize_thread_current() () from openjdk10/lib/server/libjvm.so
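The per-thread reservation is easy to reproduce outside the JVM. The following reproducer (an illustration, not part of this report) lets several threads perform one small malloc each and prints VmSize from /proc/self/status before and after; with default settings on 64-bit glibc, VmSize should grow by roughly 128 MB per thread, while with MALLOC_ARENA_MAX=1 it should barely move (build with gcc -pthread):

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define THREADS 8

static pthread_barrier_t barrier;

static void *touch_malloc(void *arg) {
    /* Wait until all threads are alive, so each first malloc finds the
       existing arenas busy and ends up in _int_new_arena. */
    pthread_barrier_wait(&barrier);
    free(malloc(24));
    return arg;
}

static void print_vmsize(const char *label) {
    char line[256];
    FILE *f = fopen("/proc/self/status", "r");
    if (f == NULL)
        return;
    while (fgets(line, sizeof line, f) != NULL)
        if (strncmp(line, "VmSize:", 7) == 0)
            printf("%s %s", label, line);
    fclose(f);
}

int main(void) {
    pthread_t t[THREADS];
    pthread_barrier_init(&barrier, NULL, THREADS);
    print_vmsize("before:");
    for (int i = 0; i < THREADS; i++)
        pthread_create(&t[i], NULL, touch_malloc, NULL);
    for (int i = 0; i < THREADS; i++)
        pthread_join(t[i], NULL);
    /* Arenas stay on glibc's free list after their threads exit, so the
       reservation is still visible here. */
    print_vmsize("after: ");
    return 0;
}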
There are basically two issues:
- Virtual memory: it becomes much larger than needed. This hurts users with a reduced ulimit, cloud applications in containers, and embedded systems.
- Physical memory: the waste here is smaller, but since the JVM already handles all performance-critical allocations through its own memory management, the remainder should still be worth saving (the arena report sketched after this list makes the virtual-vs-physical split measurable).
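To make the split measurable, glibc can dump a per-arena report with malloc_info(); every <heap> element in its XML output corresponds to one arena. A sketch along the lines of the reproducer above (depending on scheduling, fewer arenas than threads may appear):

#include <malloc.h>   /* glibc-specific: malloc_info() */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static void *worker(void *arg) {
    free(malloc(24));   /* force this thread to attach to an arena */
    return arg;
}

int main(void) {
    pthread_t t[8];
    for (int i = 0; i < 8; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 8; i++)
        pthread_join(t[i], NULL);
    /* XML report on stdout: one <heap> element per arena, including how
       much memory glibc currently holds in it. */
    malloc_info(0, stdout);
    return 0;
}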
Relates to:
- JDK-8303767: Identify and address the likely causes of glibc allocator fragmentation (Open)
- JDK-8302264: Improve dynamic compiler threads creation (Open)