Deadlock in ThrowableTracer

XMLWordPrintable

    • Type: Bug
    • Resolution: Unresolved
    • Priority: P3
    • tbd
    • Affects Version/s: 17, 21
    • Component/s: hotspot
    • None
    • jfr

      We've seen a deadlock that involves ThrowableTracer, and it looks like this:

      There is one thread that is holding the class-lock for jdk.jfr.internal.Utils because it is trying to register an Event class, and is then triggering a ClassNotFoundException (which looks like it might be an expected situation - that CNFE is caught and handled in the call path), which triggers ThrowableTracer, which in turn is calling into Handlers which then wants to init that class and waits to acquire the class-init lock for jdk.jfr.events.Handlers.

      There is another thread, which is also throwing an exception and triggering ThrowableTracer, which is also calling into handlers and successfully acquired the class-init lock for that class, but then waits to acquire the class-lock for Utils, which the other thread holds.

      I am trying to build a reproducer, but so far I failed because I could not get the thread that registers the Event to throw a ClassNotFoundException. I think for the other thread is is irrelevant what exactly it does, as long as it is throwing Exceptions. My attempt so far:

      class TestJFRDeadLock {

          @Name("Test")
          @Label("Test")
          public static class TestEvent extends Event {
              static {
                  System.out.println("static block");
              }
              private static final EventType TYPE = EventType.getEventType(TestEvent.class);
          }
          public static void main(String[] args) {
              new Thread(() -> {
                  while (true) {
                      try {
                          throw new Exception("test");
                      } catch (Exception e) {
                          System.out.println("caught exception");
                      }
                  }
              }).start();
              System.out.println(new TestEvent().getClass().getName());
              System.exit(0);
          }
      }

      Run with: java -XX:StartFlightRecording:exceptions=all,filename=test.jfr TestJFRDeadLock

      This bug happened with JDK17. I believe it is likely fixed (unintentionally) by https://bugs.openjdk.org/browse/JDK-8319374 because it removed much of the locking that is involved in this deadlock.

       Those are the stacktraces of the relevant threads:

      "dd-task-scheduler" #11 daemon prio=5 os_prio=0 cpu=423.02ms elapsed=76350.72s tid=0x00007ec274be3340 nid=0x103 in Object.wait() [0x00007ec2353fc000]
         java.lang.Thread.State: RUNNABLE
      at jdk.jfr.internal.instrument.ThrowableTracer.traceThrowable(jdk.jfr@17.0.16/ThrowableTracer.java:55)
      - waiting on the Class initialization monitor for jdk.jfr.events.Handlers
      at java.lang.Throwable.<init>(java.base@17.0.16/Throwable.java:296)
      at java.lang.Exception.<init>(java.base@17.0.16/Exception.java:85)
      at java.lang.ReflectiveOperationException.<init>(java.base@17.0.16/ReflectiveOperationException.java:76)
      at java.lang.ClassNotFoundException.<init>(java.base@17.0.16/ClassNotFoundException.java:71)
      at java.lang.Class.forName0(java.base@17.0.16/Native Method)
      at java.lang.Class.forName(java.base@17.0.16/Class.java:467)
      at jdk.jfr.internal.MetadataRepository.makeHandler(jdk.jfr@17.0.16/MetadataRepository.java:187)
      at jdk.jfr.internal.MetadataRepository.register(jdk.jfr@17.0.16/MetadataRepository.java:142)
      - locked <0x0000000403792e18> (a jdk.jfr.internal.MetadataRepository)
      at jdk.jfr.internal.MetadataRepository.register(jdk.jfr@17.0.16/MetadataRepository.java:130)
      - locked <0x0000000403792e18> (a jdk.jfr.internal.MetadataRepository)
      at jdk.jfr.FlightRecorder.register(jdk.jfr@17.0.16/FlightRecorder.java:136)
      at com.datadog.profiling.controller.openjdk.events.SmapEntryEvent.<clinit>(SmapEntryEvent.java)
      at jdk.jfr.internal.JVM.getHandler(jdk.jfr@17.0.16/Native Method)
      at jdk.jfr.internal.Utils.getHandler(jdk.jfr@17.0.16/Utils.java:451)
      - locked <0x0000000403739760> (a java.lang.Class for jdk.jfr.internal.Utils)
      at jdk.jfr.internal.MetadataRepository.getHandler(jdk.jfr@17.0.16/MetadataRepository.java:174)
      at jdk.jfr.internal.MetadataRepository.getEventType(jdk.jfr@17.0.16/MetadataRepository.java:114)
      - locked <0x0000000403792e18> (a jdk.jfr.internal.MetadataRepositorywai)
      at jdk.jfr.EventType.getEventType(jdk.jfr@17.0.16/EventType.java:204)
      at com.datadog.profiling.controller.openjdk.events.SmapEntryFactory.<clinit>(SmapEntryFactory.java:31)
      at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@17.0.16/Native Method)
      at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@17.0.16/NativeMethodAccessorImpl.java:77)
      at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@17.0.16/DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(java.base@17.0.16/Method.java:569)
      at datadog.trace.bootstrap.Agent.registerSmapEntryEvent(Agent.java:893)
      - locked <0x0000000402617e58> (a java.lang.Class for datadog.trace.bootstrap.Agent)
      at datadog.trace.bootstrap.Agent.startJmx(Agent.java:843)
      - locked <0x0000000402617e58> (a java.lang.Class for datadog.trace.bootstrap.Agent)
      at datadog.trace.bootstrap.Agent.access$1700(Agent.java:82)
      at datadog.trace.bootstrap.Agent$JmxStartTask.run(Agent.java:819)
      at datadog.trace.util.AgentTaskScheduler$RunnableTask.run(AgentTaskScheduler.java:59)
      at datadog.trace.util.AgentTaskScheduler$RunnableTask.run(AgentTaskScheduler.java:54)
      at datadog.trace.util.AgentTaskScheduler$PeriodicTask.run(AgentTaskScheduler.java:329)
      at datadog.trace.util.AgentTaskScheduler$Worker.run(AgentTaskScheduler.java:284)
      at java.lang.Thread.run(java.base@17.0.16/Thread.java:840)


      "dd-profiler-recording-scheduler" #20 daemon prio=5 os_prio=0 cpu=1370.37ms elapsed=76349.15s tid=0x00007ec2750f3b50 nid=0x106 waiting for monitor entry [0x00007ec1d9ffb000]
         java.lang.Thread.State: BLOCKED (on object monitor)
      at jdk.jfr.internal.Utils.getHandler(jdk.jfr@17.0.16/Utils.java:450)
      - waiting to lock <0x0000000403739760> (a java.lang.Class for jdk.jfr.internal.Utils)
      at jdk.jfr.events.Handlers.<clinit>(jdk.jfr@17.0.16/Handlers.java:31)
      at jdk.jfr.internal.instrument.ThrowableTracer.traceThrowable(jdk.jfr@17.0.16/ThrowableTracer.java:55)
      at java.lang.Throwable.<init>(java.base@17.0.16/Throwable.java:273)
      at java.lang.Error.<init>(java.base@17.0.16/Error.java:71)
      at java.lang.LinkageError.<init>(java.base@17.0.16/LinkageError.java:55)
      at java.lang.IncompatibleClassChangeError.<init>(java.base@17.0.16/IncompatibleClassChangeError.java:54)
      at java.lang.NoSuchMethodError.<init>(java.base@17.0.16/NoSuchMethodError.java:57)
      at java.lang.invoke.MethodHandleNatives.resolve(java.base@17.0.16/Native Method)
      at java.lang.invoke.MemberName$Factory.resolve(java.base@17.0.16/MemberName.java:1085)
      at java.lang.invoke.MemberName$Factory.resolveOrNull(java.base@17.0.16/MemberName.java:1129)
      at java.lang.invoke.InvokerBytecodeGenerator.resolveFrom(java.base@17.0.16/InvokerBytecodeGenerator.java:687)
      at java.lang.invoke.InvokerBytecodeGenerator.lookupPregenerated(java.base@17.0.16/InvokerBytecodeGenerator.java:739)
      at java.lang.invoke.InvokerBytecodeGenerator.generateCustomizedCode(java.base@17.0.16/InvokerBytecodeGenerator.java:748)
      at java.lang.invoke.LambdaForm.compileToBytecode(java.base@17.0.16/LambdaForm.java:854)
      at java.lang.invoke.DirectMethodHandle.makePreparedLambdaForm(java.base@17.0.16/DirectMethodHandle.java:305)
      at java.lang.invoke.DirectMethodHandle.preparedLambdaForm(java.base@17.0.16/DirectMethodHandle.java:233)
      at java.lang.invoke.DirectMethodHandle.preparedLambdaForm(java.base@17.0.16/DirectMethodHandle.java:218)
      at java.lang.invoke.DirectMethodHandle.preparedLambdaForm(java.base@17.0.16/DirectMethodHandle.java:227)
      at java.lang.invoke.DirectMethodHandle.make(java.base@17.0.16/DirectMethodHandle.java:108)
      at java.lang.invoke.MethodHandles$Lookup.getDirectMethodCommon(java.base@17.0.16/MethodHandles.java:4004)
      at java.lang.invoke.MethodHandles$Lookup.getDirectMethodNoSecurityManager(java.base@17.0.16/MethodHandles.java:3960)
      at java.lang.invoke.MethodHandles$Lookup.getDirectMethodForConstant(java.base@17.0.16/MethodHandles.java:4204)
      at java.lang.invoke.MethodHandles$Lookup.linkMethodHandleConstant(java.base@17.0.16/MethodHandles.java:4152)
      at java.lang.invoke.MethodHandleNatives.linkMethodHandleConstant(java.base@17.0.16/MethodHandleNatives.java:615)
      at jdk.jfr.internal.SecuritySupport.setDaemonThread(jdk.jfr@17.0.16/SecuritySupport.java:483)
      at jdk.jfr.internal.PlatformRecorder.startDiskMonitor(jdk.jfr@17.0.16/PlatformRecorder.java:450)
      at jdk.jfr.internal.PlatformRecorder.<init>(jdk.jfr@17.0.16/PlatformRecorder.java:89)
      at jdk.jfr.FlightRecorder.getFlightRecorder(jdk.jfr@17.0.16/FlightRecorder.java:179)
      - locked <0x0000000408003a38> (a java.lang.Class for jdk.jfr.internal.PlatformRecorder)
      at jdk.jfr.Recording.<init>(jdk.jfr@17.0.16/Recording.java:114)
      at jdk.jfr.Recording.<init>(jdk.jfr@17.0.16/Recording.java:138)
      at com.datadog.profiling.controller.openjdk.OpenJdkOngoingRecording.<init>(OpenJdkOngoingRecording.java:52)
      at com.datadog.profiling.controller.openjdk.OpenJdkController.createRecording(OpenJdkController.java:312)
      at com.datadog.profiling.controller.openjdk.OpenJdkController.createRecording(OpenJdkController.java:59)
      at com.datadog.profiling.controller.ProfilingSystem.startProfilingRecording(ProfilingSystem.java:158)
      at com.datadog.profiling.controller.ProfilingSystem$$Lambda$513/0x00007ec1e0336ad0.run(Unknown Source)
      at datadog.trace.util.AgentTaskScheduler$PeriodicTask.run(AgentTaskScheduler.java:329)
      at datadog.trace.util.AgentTaskScheduler$Worker.run(AgentTaskScheduler.java:284)
      at java.lang.Thread.run(java.base@17.0.16/Thread.java:840)

            Assignee:
            Roman Kennke
            Reporter:
            Roman Kennke
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

              Created:
              Updated: