A DESCRIPTION OF THE PROBLEM :
We use Netty, which has JFR events for e.g. buffer management events.
With recording turned on, we occasionally experience a deadlock. One thread is loading a class, holding the classloader lock, then calling into JFR to eagerly create instrumentation, taking the metadata lock. The other thread is registering an event, holding the metadata lock, and then loading a class of one of its event field types, taking the classloader lock. Both threads now hang forever due to classic lock order reversal.
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
The deadlock is difficult to reproduce, since it depends on class loading order
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
The event should register, and the class should load
ACTUAL -
Found one Java-level deadlock:
=============================
"JFR Periodic Tasks":
waiting to lock monitor 0x000059d8ccc72b70 (object 0x00007ad6fffe4680, a jdk.jfr.internal.MetadataRepository),
which is held by "lettuce-nioEventLoop-4-5"
"lettuce-nioEventLoop-4-5":
waiting to lock monitor 0x000059d8c7416010 (object 0x00007adcd63f2038, a java.lang.Object),
which is held by "lettuce-nioEventLoop-4-8"
"lettuce-nioEventLoop-4-8":
waiting to lock monitor 0x000059d8ccc72b70 (object 0x00007ad6fffe4680, a jdk.jfr.internal.MetadataRepository),
which is held by "lettuce-nioEventLoop-4-5"
Java stack information for the threads listed above:
===================================================
"JFR Periodic Tasks":
at jdk.jfr.internal.MetadataRepository.flush(jdk.jfr@24.0.2/MetadataRepository.java:379)
- waiting to lock <0x00007ad6fffe4680> (a jdk.jfr.internal.MetadataRepository)
at jdk.jfr.internal.PlatformRecorder.flush(jdk.jfr@24.0.2/PlatformRecorder.java:684)
- locked <0x00007ad5bc0041f8> (a jdk.jfr.internal.PlatformRecorder)
at jdk.jfr.internal.periodic.FlushTask.execute(jdk.jfr@24.0.2/FlushTask.java:50)
at jdk.jfr.internal.periodic.PeriodicTask.run(jdk.jfr@24.0.2/PeriodicTask.java:128)
at jdk.jfr.internal.periodic.PeriodicEvents.doPeriodic(jdk.jfr@24.0.2/PeriodicEvents.java:175)
at jdk.jfr.internal.PlatformRecorder.periodicTask(jdk.jfr@24.0.2/PlatformRecorder.java:518)
at jdk.jfr.internal.PlatformRecorder.lambda$startDiskMonitor$0(jdk.jfr@24.0.2/PlatformRecorder.java:444)
at jdk.jfr.internal.PlatformRecorder$$Lambda/0x00000000b91750a8.run(jdk.jfr@24.0.2/Unknown Source)
at java.lang.Thread.runWith(java.base@24.0.2/Thread.java:1460)
at java.lang.Thread.run(java.base@24.0.2/Thread.java:1447)
"lettuce-nioEventLoop-4-5":
at jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(java.base@24.0.2/BuiltinClassLoader.java:590)
- waiting to lock <0x00007adcd63f2038> (a java.lang.Object)
at jdk.internal.loader.BuiltinClassLoader.loadClass(java.base@24.0.2/BuiltinClassLoader.java:578)
at java.lang.ClassLoader.loadClass(java.base@24.0.2/ClassLoader.java:490)
at java.lang.Class.getDeclaredFields0(java.base@24.0.2/Native Method)
at java.lang.Class.privateGetDeclaredFields(java.base@24.0.2/Class.java:2930)
at java.lang.Class.getDeclaredFields(java.base@24.0.2/Class.java:2249)
at jdk.jfr.internal.util.Utils.getVisibleEventFields(jdk.jfr@24.0.2/Utils.java:214)
at jdk.jfr.internal.TypeLibrary.addUserFields(jdk.jfr@24.0.2/TypeLibrary.java:333)
at jdk.jfr.internal.TypeLibrary.createType(jdk.jfr@24.0.2/TypeLibrary.java:283)
- locked <0x00007ad6fffd3588> (a java.lang.Class for jdk.jfr.internal.TypeLibrary)
at jdk.jfr.internal.MetadataRepository.makeConfiguration(jdk.jfr@24.0.2/MetadataRepository.java:201)
at jdk.jfr.internal.MetadataRepository.register(jdk.jfr@24.0.2/MetadataRepository.java:150)
- locked <0x00007ad6fffe4680> (a jdk.jfr.internal.MetadataRepository)
at jdk.jfr.internal.MetadataRepository.register(jdk.jfr@24.0.2/MetadataRepository.java:134)
- locked <0x00007ad6fffe4680> (a jdk.jfr.internal.MetadataRepository)
at jdk.jfr.FlightRecorder.register(jdk.jfr@24.0.2/FlightRecorder.java:127)
at io.netty.buffer.ReturnChunkEvent.<clinit>(ReturnChunkEvent.java)
at io.netty.buffer.AdaptivePoolingAllocator$Chunk.onReturn(AdaptivePoolingAllocator.java:1255)
at io.netty.buffer.AdaptivePoolingAllocator$Chunk.deallocate(AdaptivePoolingAllocator.java:1249)
at io.netty.buffer.AdaptivePoolingAllocator$Chunk.release(AdaptivePoolingAllocator.java:1191)
at io.netty.buffer.AdaptivePoolingAllocator$Chunk.releaseSegment(AdaptivePoolingAllocator.java:1217)
at io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf.deallocate(AdaptivePoolingAllocator.java:1888)
at io.netty.buffer.AbstractReferenceCountedByteBuf.handleRelease(AbstractReferenceCountedByteBuf.java:111)
at io.netty.buffer.AbstractReferenceCountedByteBuf.release(AbstractReferenceCountedByteBuf.java:101)
at io.lettuce.core.protocol.CommandHandler.channelUnregistered(CommandHandler.java:248)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:190)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelUnregistered(DefaultChannelPipeline.java:1407)
at io.netty.channel.DefaultChannelPipeline.fireChannelUnregistered(DefaultChannelPipeline.java:780)
at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:692)
at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:148)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:141)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:507)
at io.netty.channel.SingleThreadIoEventLoop.run(SingleThreadIoEventLoop.java:183)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:1073)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.runWith(java.base@24.0.2/Thread.java:1460)
at java.lang.Thread.run(java.base@24.0.2/Thread.java:1447)
"lettuce-nioEventLoop-4-8":
at jdk.jfr.internal.MetadataRepository.isEnabled(jdk.jfr@24.0.2/MetadataRepository.java:304)
- waiting to lock <0x00007ad6fffe4680> (a jdk.jfr.internal.MetadataRepository)
at jdk.jfr.internal.JVMUpcalls.bytesForEagerInstrumentation(jdk.jfr@24.0.2/JVMUpcalls.java:122)
at java.lang.ClassLoader.defineClass1(java.base@24.0.2/Native Method)
at java.lang.ClassLoader.defineClass(java.base@24.0.2/ClassLoader.java:962)
at java.security.SecureClassLoader.defineClass(java.base@24.0.2/SecureClassLoader.java:132)
at jdk.internal.loader.BuiltinClassLoader.defineClass(java.base@24.0.2/BuiltinClassLoader.java:776)
at jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(java.base@24.0.2/BuiltinClassLoader.java:691)
at jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(java.base@24.0.2/BuiltinClassLoader.java:620)
- locked <0x00007adcd63f2038> (a java.lang.Object)
at jdk.internal.loader.BuiltinClassLoader.loadClass(java.base@24.0.2/BuiltinClassLoader.java:578)
at java.lang.ClassLoader.loadClass(java.base@24.0.2/ClassLoader.java:490)
at io.netty.buffer.AdaptivePoolingAllocator$Chunk.onRelease(AdaptivePoolingAllocator.java:1266)
at io.netty.buffer.AdaptivePoolingAllocator$Chunk.deallocate(AdaptivePoolingAllocator.java:1227)
at io.netty.buffer.AdaptivePoolingAllocator$Chunk.release(AdaptivePoolingAllocator.java:1191)
at io.netty.buffer.AdaptivePoolingAllocator$Chunk.releaseFromMagazine(AdaptivePoolingAllocator.java:1210)
at io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk.releaseFromMagazine(AdaptivePoolingAllocator.java:1384)
at io.netty.buffer.AdaptivePoolingAllocator$Magazine.free(AdaptivePoolingAllocator.java:1046)
at io.netty.buffer.AdaptivePoolingAllocator$MagazineGroup.free(AdaptivePoolingAllocator.java:472)
at io.netty.buffer.AdaptivePoolingAllocator$MagazineGroup.access$200(AdaptivePoolingAllocator.java:352)
at io.netty.buffer.AdaptivePoolingAllocator$1.onRemoval(AdaptivePoolingAllocator.java:199)
at io.netty.buffer.AdaptivePoolingAllocator$1.onRemoval(AdaptivePoolingAllocator.java:186)
at io.netty.util.concurrent.FastThreadLocal.removeAndGet(FastThreadLocal.java:286)
at io.netty.util.concurrent.FastThreadLocal.remove(FastThreadLocal.java:268)
at io.netty.util.concurrent.FastThreadLocal.removeAll(FastThreadLocal.java:68)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:1153)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.runWith(java.base@24.0.2/Thread.java:1460)
at java.lang.Thread.run(java.base@24.0.2/Thread.java:1447)
Found 1 deadlock.
We use Netty, which has JFR events for e.g. buffer management events.
With recording turned on, we occasionally experience a deadlock. One thread is loading a class, holding the classloader lock, then calling into JFR to eagerly create instrumentation, taking the metadata lock. The other thread is registering an event, holding the metadata lock, and then loading a class of one of its event field types, taking the classloader lock. Both threads now hang forever due to classic lock order reversal.
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
The deadlock is difficult to reproduce, since it depends on class loading order
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
The event should register, and the class should load
ACTUAL -
Found one Java-level deadlock:
=============================
"JFR Periodic Tasks":
waiting to lock monitor 0x000059d8ccc72b70 (object 0x00007ad6fffe4680, a jdk.jfr.internal.MetadataRepository),
which is held by "lettuce-nioEventLoop-4-5"
"lettuce-nioEventLoop-4-5":
waiting to lock monitor 0x000059d8c7416010 (object 0x00007adcd63f2038, a java.lang.Object),
which is held by "lettuce-nioEventLoop-4-8"
"lettuce-nioEventLoop-4-8":
waiting to lock monitor 0x000059d8ccc72b70 (object 0x00007ad6fffe4680, a jdk.jfr.internal.MetadataRepository),
which is held by "lettuce-nioEventLoop-4-5"
Java stack information for the threads listed above:
===================================================
"JFR Periodic Tasks":
at jdk.jfr.internal.MetadataRepository.flush(jdk.jfr@24.0.2/MetadataRepository.java:379)
- waiting to lock <0x00007ad6fffe4680> (a jdk.jfr.internal.MetadataRepository)
at jdk.jfr.internal.PlatformRecorder.flush(jdk.jfr@24.0.2/PlatformRecorder.java:684)
- locked <0x00007ad5bc0041f8> (a jdk.jfr.internal.PlatformRecorder)
at jdk.jfr.internal.periodic.FlushTask.execute(jdk.jfr@24.0.2/FlushTask.java:50)
at jdk.jfr.internal.periodic.PeriodicTask.run(jdk.jfr@24.0.2/PeriodicTask.java:128)
at jdk.jfr.internal.periodic.PeriodicEvents.doPeriodic(jdk.jfr@24.0.2/PeriodicEvents.java:175)
at jdk.jfr.internal.PlatformRecorder.periodicTask(jdk.jfr@24.0.2/PlatformRecorder.java:518)
at jdk.jfr.internal.PlatformRecorder.lambda$startDiskMonitor$0(jdk.jfr@24.0.2/PlatformRecorder.java:444)
at jdk.jfr.internal.PlatformRecorder$$Lambda/0x00000000b91750a8.run(jdk.jfr@24.0.2/Unknown Source)
at java.lang.Thread.runWith(java.base@24.0.2/Thread.java:1460)
at java.lang.Thread.run(java.base@24.0.2/Thread.java:1447)
"lettuce-nioEventLoop-4-5":
at jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(java.base@24.0.2/BuiltinClassLoader.java:590)
- waiting to lock <0x00007adcd63f2038> (a java.lang.Object)
at jdk.internal.loader.BuiltinClassLoader.loadClass(java.base@24.0.2/BuiltinClassLoader.java:578)
at java.lang.ClassLoader.loadClass(java.base@24.0.2/ClassLoader.java:490)
at java.lang.Class.getDeclaredFields0(java.base@24.0.2/Native Method)
at java.lang.Class.privateGetDeclaredFields(java.base@24.0.2/Class.java:2930)
at java.lang.Class.getDeclaredFields(java.base@24.0.2/Class.java:2249)
at jdk.jfr.internal.util.Utils.getVisibleEventFields(jdk.jfr@24.0.2/Utils.java:214)
at jdk.jfr.internal.TypeLibrary.addUserFields(jdk.jfr@24.0.2/TypeLibrary.java:333)
at jdk.jfr.internal.TypeLibrary.createType(jdk.jfr@24.0.2/TypeLibrary.java:283)
- locked <0x00007ad6fffd3588> (a java.lang.Class for jdk.jfr.internal.TypeLibrary)
at jdk.jfr.internal.MetadataRepository.makeConfiguration(jdk.jfr@24.0.2/MetadataRepository.java:201)
at jdk.jfr.internal.MetadataRepository.register(jdk.jfr@24.0.2/MetadataRepository.java:150)
- locked <0x00007ad6fffe4680> (a jdk.jfr.internal.MetadataRepository)
at jdk.jfr.internal.MetadataRepository.register(jdk.jfr@24.0.2/MetadataRepository.java:134)
- locked <0x00007ad6fffe4680> (a jdk.jfr.internal.MetadataRepository)
at jdk.jfr.FlightRecorder.register(jdk.jfr@24.0.2/FlightRecorder.java:127)
at io.netty.buffer.ReturnChunkEvent.<clinit>(ReturnChunkEvent.java)
at io.netty.buffer.AdaptivePoolingAllocator$Chunk.onReturn(AdaptivePoolingAllocator.java:1255)
at io.netty.buffer.AdaptivePoolingAllocator$Chunk.deallocate(AdaptivePoolingAllocator.java:1249)
at io.netty.buffer.AdaptivePoolingAllocator$Chunk.release(AdaptivePoolingAllocator.java:1191)
at io.netty.buffer.AdaptivePoolingAllocator$Chunk.releaseSegment(AdaptivePoolingAllocator.java:1217)
at io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf.deallocate(AdaptivePoolingAllocator.java:1888)
at io.netty.buffer.AbstractReferenceCountedByteBuf.handleRelease(AbstractReferenceCountedByteBuf.java:111)
at io.netty.buffer.AbstractReferenceCountedByteBuf.release(AbstractReferenceCountedByteBuf.java:101)
at io.lettuce.core.protocol.CommandHandler.channelUnregistered(CommandHandler.java:248)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:190)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelUnregistered(DefaultChannelPipeline.java:1407)
at io.netty.channel.DefaultChannelPipeline.fireChannelUnregistered(DefaultChannelPipeline.java:780)
at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:692)
at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:148)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:141)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:507)
at io.netty.channel.SingleThreadIoEventLoop.run(SingleThreadIoEventLoop.java:183)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:1073)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.runWith(java.base@24.0.2/Thread.java:1460)
at java.lang.Thread.run(java.base@24.0.2/Thread.java:1447)
"lettuce-nioEventLoop-4-8":
at jdk.jfr.internal.MetadataRepository.isEnabled(jdk.jfr@24.0.2/MetadataRepository.java:304)
- waiting to lock <0x00007ad6fffe4680> (a jdk.jfr.internal.MetadataRepository)
at jdk.jfr.internal.JVMUpcalls.bytesForEagerInstrumentation(jdk.jfr@24.0.2/JVMUpcalls.java:122)
at java.lang.ClassLoader.defineClass1(java.base@24.0.2/Native Method)
at java.lang.ClassLoader.defineClass(java.base@24.0.2/ClassLoader.java:962)
at java.security.SecureClassLoader.defineClass(java.base@24.0.2/SecureClassLoader.java:132)
at jdk.internal.loader.BuiltinClassLoader.defineClass(java.base@24.0.2/BuiltinClassLoader.java:776)
at jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(java.base@24.0.2/BuiltinClassLoader.java:691)
at jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(java.base@24.0.2/BuiltinClassLoader.java:620)
- locked <0x00007adcd63f2038> (a java.lang.Object)
at jdk.internal.loader.BuiltinClassLoader.loadClass(java.base@24.0.2/BuiltinClassLoader.java:578)
at java.lang.ClassLoader.loadClass(java.base@24.0.2/ClassLoader.java:490)
at io.netty.buffer.AdaptivePoolingAllocator$Chunk.onRelease(AdaptivePoolingAllocator.java:1266)
at io.netty.buffer.AdaptivePoolingAllocator$Chunk.deallocate(AdaptivePoolingAllocator.java:1227)
at io.netty.buffer.AdaptivePoolingAllocator$Chunk.release(AdaptivePoolingAllocator.java:1191)
at io.netty.buffer.AdaptivePoolingAllocator$Chunk.releaseFromMagazine(AdaptivePoolingAllocator.java:1210)
at io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk.releaseFromMagazine(AdaptivePoolingAllocator.java:1384)
at io.netty.buffer.AdaptivePoolingAllocator$Magazine.free(AdaptivePoolingAllocator.java:1046)
at io.netty.buffer.AdaptivePoolingAllocator$MagazineGroup.free(AdaptivePoolingAllocator.java:472)
at io.netty.buffer.AdaptivePoolingAllocator$MagazineGroup.access$200(AdaptivePoolingAllocator.java:352)
at io.netty.buffer.AdaptivePoolingAllocator$1.onRemoval(AdaptivePoolingAllocator.java:199)
at io.netty.buffer.AdaptivePoolingAllocator$1.onRemoval(AdaptivePoolingAllocator.java:186)
at io.netty.util.concurrent.FastThreadLocal.removeAndGet(FastThreadLocal.java:286)
at io.netty.util.concurrent.FastThreadLocal.remove(FastThreadLocal.java:268)
at io.netty.util.concurrent.FastThreadLocal.removeAll(FastThreadLocal.java:68)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:1153)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.runWith(java.base@24.0.2/Thread.java:1460)
at java.lang.Thread.run(java.base@24.0.2/Thread.java:1447)
Found 1 deadlock.
- links to
-
Commit(master) openjdk/jdk/5fccabff
-
Review(master) openjdk/jdk/27592