-
Bug
-
Resolution: Unresolved
-
P3
-
None
-
17.0.10
-
Encountered with JDK 17.0.8 and 17.0.10
OS Info
Red Hat Enterprise Linux 8.9 (Ootpa)
Host: AMD EPYC 7542 32-Core Processor, 128 cores, 503G, Red Hat Enterprise Linux release 8.9 (Ootpa)
Kernel: Linux 4.18.0-513.9.1.el8_9.x86_64 #1 SMP Thu Nov 16 10:29:04 EST 2023 x86_64 x86_64 x86_64 GNU/Linux
Architecture: x86_64
Processors: 128 CPU
-
x86_64
-
linux_redhat_8.0
This seems to be related to, but slightly different from, the case fixed in JDK-8308766.
We see a crash with SIGFPE in ThreadLocalAllocBuffer::initial_desired_size().
It is reproducible only on a specific set of machines and does not occur anywhere else with the same application. We do not know what is necessary to reproduce it elsewhere.
One possible candidate for the SIGFPE in the code is
init_sz = (Universe::heap()->tlab_capacity(thread()) / HeapWordSize) /
(nof_threads * target_refills());
at https://github.com/openjdk/jdk17u/blob/jdk-17.0.10-ga/src/hotspot/share/gc/shared/threadLocalAllocBuffer.cpp#L280
HeapWordSize seems to be a constant, so the divisor (nof_threads * target_refills()) would have to be zero for this line to trap. Maybe either nof_threads or target_refills() can be zero in some cases?
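To illustrate the suspicion above: on x86_64, an integer division by zero traps and is delivered as SIGFPE on Linux. The following is a minimal standalone sketch of the same arithmetic with an explicit check on the divisor. It is not the HotSpot code; the function name, the parameters, and the 8-byte heap word size are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>

// Assumed value: heap words are 8 bytes on a 64-bit JVM.
constexpr std::size_t kHeapWordSize = 8;

// Hypothetical stand-in for the suspected computation.
// Writes the initial size (in heap words) to *out_words and returns true,
// or returns false when the divisor is zero, which is the case that would
// trap with SIGFPE if the division were executed unguarded.
bool guarded_initial_size_words(std::size_t tlab_capacity_bytes,
                                unsigned nof_threads,
                                unsigned target_refills,
                                std::size_t* out_words) {
  assert(out_words != nullptr);
  // Widen before multiplying so the product cannot wrap in 32 bits.
  const std::size_t divisor =
      static_cast<std::size_t>(nof_threads) * target_refills;
  if (divisor == 0) {
    return false;  // either nof_threads or target_refills is zero
  }
  *out_words = (tlab_capacity_bytes / kHeapWordSize) / divisor;
  return true;
}
```

With a capacity of 1 MiB, 4 threads and 2 refills this yields 16384 words; with either factor at zero it reports failure instead of dividing.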
Excerpt from the hs_err_pid file:
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGFPE (0x8) at pc=0x00007efeed9b5b9c, pid=3048299, tid=3050463
#
# JRE version: (17.0.10+7) (build )
# Java VM: OpenJDK 64-Bit Server VM (17.0.10+7, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V [libjvm.so+0xe68b9c] ThreadLocalAllocBuffer::initial_desired_size()+0x10c
Stack: [0x00007efd9c21e000,0x00007efd9ca1e000], sp=0x00007efd9ca1cd20, free space=8187k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0xe68b9c] ThreadLocalAllocBuffer::initial_desired_size()+0x10c
V [libjvm.so+0xe68be4] ThreadLocalAllocBuffer::initialize()+0x24
V [libjvm.so+0x8bfec4] attach_current_thread.part.0+0x94
V [libjvm.so+0x8c023d] jni_AttachCurrentThread+0x6d
C 0x00007efd9cb5b701
C 0x00007efd9cb5ba4e
Potential workarounds:
* Disable TLAB with -XX:-UseTLAB - may have large performance impact
* Configure an initial "TLABSize" via the JVM parameter -XX:TLABSize=... to try to avoid the code branch that crashes (https://github.com/openjdk/jdk17u/blob/jdk-17.0.10-ga/src/hotspot/share/gc/shared/threadLocalAllocBuffer.cpp#L273), e.g. -XX:TLABSize=2k (must be between 1k and 512k). It seems the JDK will only use this as the "initial" size and resize properly afterwards, see https://answers.ycrash.io/question/what-is-jvm-startup-parameter--xxtlabsize?q=833
"Chatty" logging for TLAB sizing can be enabled via -Xlog:tlab*=debug,tlab*=trace:file=gc.log:time:filecount=7,filesize=8M
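The reasoning behind the second workaround can be sketched as a two-branch model: when an explicit TLAB size is set, the initial size is taken from it and the division by the thread/refill product is never reached. This is a simplified illustration, not the JDK source; the function name, parameters, and 8-byte heap word size are assumptions, and the min/max clamping done by the real code is omitted.

```cpp
#include <cstddef>

// Assumed value: heap words are 8 bytes on a 64-bit JVM.
constexpr std::size_t kHeapWordSize = 8;

// Hypothetical model of the two branches.
// tlab_size_bytes models -XX:TLABSize (0 means "not set").
std::size_t model_initial_size_words(std::size_t tlab_size_bytes,
                                     std::size_t tlab_capacity_bytes,
                                     unsigned nof_threads,
                                     unsigned target_refills) {
  if (tlab_size_bytes > 0) {
    // Explicit size: the thread/refill product is not used at all.
    return tlab_size_bytes / kHeapWordSize;
  }
  // Default path: integer division that traps if the product is zero.
  return (tlab_capacity_bytes / kHeapWordSize) /
         (nof_threads * target_refills);
}
```

In this model, -XX:TLABSize=2k corresponds to tlab_size_bytes = 2048 and gives 256 words even when both nof_threads and target_refills are zero.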
relates to
- JDK-8308341 JNI_GetCreatedJavaVMs returns a partially initialized JVM (Resolved)
- JDK-8308766 TLAB initialization may cause div by zero (Resolved)