- Bug
- Resolution: Unresolved
- P4
- None
- 20
- x86_64
- linux_ubuntu
ADDITIONAL SYSTEM INFORMATION :
32 cores, 64GB server Ubuntu 22.04.2 LTS
OpenJDK 64-Bit Server VM (20+36) for linux-amd64 JRE (20+36), built on 2023-03-21T00:00:00Z by "temurin" with gcc 11.2.0
-XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+UseShenandoahGC -XX:ShenandoahGCHeuristics=static -XX:ShenandoahMinFreeThreshold=40 -XX:ShenandoahPacingMaxDelay=20 -XX:ConcGCThreads=12 -XX:CICompilerCount=4 -XX:+UseLargePages -XX:+UseTransparentHugePages -XX:+AlwaysPreTouch
A DESCRIPTION OF THE PROBLEM :
We distribute load between two instances with the same configuration.
Recently, we observed that CPU usage on one instance started to increase gradually (up to 60%, when the expected load is about 10%).
We checked business and hardware metrics, and they were the same on both instances (except for CPU).
We captured a JFR recording and CPU/memory flame graphs, which showed that CPU was being consumed by ThreadLocal access from Logback. We put about 3-5 parameters into the MDC for every request. This configuration has been running smoothly for over a year, and this is the first occurrence of the issue.
We also observed that the load decreased when we attempted to profile the application.
java.lang.ThreadLocal$ThreadLocalMap.expungeStaleEntry(int)
java.lang.ThreadLocal$ThreadLocalMap.remove(ThreadLocal)
java.lang.ThreadLocal.remove(Thread)
java.lang.ThreadLocal.remove()
ch.qos.logback.classic.util.LogbackMDCAdapter.clear()
org.slf4j.MDC.clear()
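
For readers who want to reproduce the call path outside the application, the per-request pattern described above can be approximated with plain JDK classes. This is only a sketch under assumptions: the class `RequestContextSketch`, its methods, and the three keys are hypothetical stand-ins for the MDC, not Logback's actual implementation. It shows a per-thread map that is filled for each request and then released through `ThreadLocal.remove()`, which is the call at the bottom of the trace above.

```java
import java.util.HashMap;
import java.util.Map;

public class RequestContextSketch {
    // Hypothetical stand-in for a per-thread diagnostic context (not Logback's code).
    private static final ThreadLocal<Map<String, String>> CONTEXT = new ThreadLocal<>();

    // Store one key/value pair in the current thread's context map.
    static void put(String key, String value) {
        Map<String, String> map = CONTEXT.get();
        if (map == null) {
            map = new HashMap<>();
            CONTEXT.set(map);
        }
        map.put(key, value);
    }

    // Read a value from the current thread's context map, or null if absent.
    static String get(String key) {
        Map<String, String> map = CONTEXT.get();
        return map == null ? null : map.get(key);
    }

    // Drop the whole context for this thread. ThreadLocal.remove() goes through
    // ThreadLocalMap.remove(...), which in turn calls expungeStaleEntry(...).
    static void clear() {
        CONTEXT.remove();
    }

    // Simulates one request: set a few parameters, do the work, always clear.
    // Returns how many parameters were present while the request was handled.
    static int handleRequest(int requestId) {
        try {
            put("requestId", Integer.toString(requestId));
            put("user", "user-" + requestId);        // illustrative value
            put("endpoint", "/example");             // illustrative value
            return CONTEXT.get().size();
        } finally {
            clear();
        }
    }

    public static void main(String[] args) {
        int last = 0;
        for (int i = 0; i < 1_000; i++) {
            last = handleRequest(i);
        }
        System.out.println("parameters per request: " + last);
        System.out.println("context after clear: " + get("requestId"));
    }
}
```

Running a loop like this on its own is not expected to reproduce the CPU growth. It only isolates the set/remove cycle so that it can be profiled separately from the rest of the application.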
Do you have any insights into why this might be happening? Is there any useful information from the JFR that I can share? Additionally, do you have any recommendations for what steps to take if this issue occurs again?
FREQUENCY : occasionally