Type: Bug
Resolution: Fixed
Priority: P2
Affects Version/s: 21, 22, 23, 24, 25
Resolved In Build: b09
Issue | Fix Version | Assignee | Priority | Status | Resolution | Resolved In Build |
---|---|---|---|---|---|---|
JDK-8350271 | 24.0.2 | Axel Boldt-Christmas | P2 | Resolved | Fixed | b01 |
JDK-8350574 | 24.0.1 | Nibedita Jena | P2 | Closed | Fixed | b08 |
We have received a report of a crash in DependencyContext::clean_unloading_dependents when running with Generational ZGC:
V [libjvm.so+0x6fa690] DependencyContext::clean_unloading_dependents()+0xa0
V [libjvm.so+0xcc61f5] MethodHandles::clean_dependency_context(oopDesc*)+0x35
V [libjvm.so+0xcef4d2] nmethod::unlink() [clone .part.0]+0x162
V [libjvm.so+0x1102c76] ZNMethodUnlinkClosure::do_nmethod(nmethod*)+0x156
V [libjvm.so+0x110443a] ZNMethodTableIteration::nmethods_do(NMethodClosure*)+0x6a
V [libjvm.so+0x1073210] WorkerThread::run()+0x90
V [libjvm.so+0xfb3cb7] Thread::call_run()+0xb7
V [libjvm.so+0xd339ba] thread_native_entry(Thread*)+0xda
We have determined that the problem is caused by how we walk a chain of potentially dead objects to reach the list of "unloading dependents" (also called the dependencies, or vmdependencies).
Some background on the data structures involved:
First, there is the CallSite Java object, which gets installed into the oop section of nmethods. One CallSite can be installed into many nmethods. For each CallSite we maintain a list of the nmethods that the CallSite has been installed into. This list is called the dependencies, or vmdependencies.
The GC needs to clean the dependencies list when nmethods are unloaded; otherwise the list would contain stale entries. So, during class unloading, the GC peeks through the CallSite to get to the dependencies. This happens right after liveness analysis has completed. At this point the CallSite may or may not have been found to be live, but either way its memory is still available (not yet overwritten), so the GC reaches into the potentially dead CallSite to fetch the dependencies list. So far, this conceptually works with all GCs.
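To make the bookkeeping concrete, here is a minimal Java model of one CallSite's dependency list and the unloading pass that prunes it. `NMethodStub`, `DependencyListModel` and `cleanUnloadingDependents` are hypothetical names (the last one only echoes the HotSpot function in the stack trace). The sketch assumes the liveness analysis has already flagged each nmethod as unloading or not; it is not HotSpot's DependencyContext, whose list is a native structure.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a compiled method (nmethod); not a HotSpot type.
final class NMethodStub {
    final String name;
    boolean unloading; // set by the simulated liveness analysis

    NMethodStub(String name) { this.name = name; }
}

// Toy model of one CallSite's dependency list ("vmdependencies"): the
// nmethods this CallSite has been installed into. The real list lives in
// native memory; none of its details are modeled here.
final class DependencyListModel {
    private final List<NMethodStub> dependents = new ArrayList<>();

    // Record an installation; each nmethod is listed at most once.
    void add(NMethodStub nm) {
        if (!dependents.contains(nm)) {
            dependents.add(nm);
        }
    }

    // Unloading pass: drop entries for nmethods being unloaded so that no
    // stale entries remain. Returns the number of entries removed.
    int cleanUnloadingDependents() {
        int before = dependents.size();
        dependents.removeIf(nm -> nm.unloading);
        return before - dependents.size();
    }

    int size() { return dependents.size(); }
}
```

For example, after adding two nmethods and flagging one of them as unloading, `cleanUnloadingDependents()` removes that single entry and leaves the other in place.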
However, the JDK code uses a Cleaner to trigger the deletion of the dependencies list when the CallSite dies. The Cleaner mechanism requires another Java object to run the cleanup action on behalf of the dying CallSite. This object is called the CallSiteContext, and for this reason it is the CallSiteContext object that holds the dependencies list.
So, we now have:
CallSite (Java object)
-> CallSiteContext (Java object)
--> vmdependencies (native object)
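The chain above can be sketched with the public `java.lang.ref.Cleaner` API. This is a hedged illustration loosely modeled on the description, not the JDK's actual `CallSite` code: `CallSiteModel`, `CallSiteContextModel` and `NativeDependencies` are hypothetical, and the `long` handle merely stands in for a pointer to the native vmdependencies list. What it shows is why a second object is involved: a Cleaner action must not refer back to the object it watches, so the list hangs off a separate context object that acts as the action.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the native vmdependencies list.
final class NativeDependencies {
    static final AtomicInteger freed = new AtomicInteger(); // counts simulated frees
    private long handle; // pretend native pointer; 0 means already freed

    NativeDependencies(long handle) { this.handle = handle; }

    void free() {
        if (handle != 0) {
            handle = 0;
            freed.incrementAndGet();
        }
    }
}

// Hypothetical context object: it is the Cleaner action and owns the list.
// It must not reference the CallSite, or the CallSite could never become
// phantom reachable and the action would never run automatically.
final class CallSiteContextModel implements Runnable {
    final NativeDependencies vmdependencies;

    CallSiteContextModel(NativeDependencies deps) { this.vmdependencies = deps; }

    @Override
    public void run() { vmdependencies.free(); } // cleanup action
}

// Hypothetical CallSite: CallSite -> CallSiteContext -> vmdependencies.
final class CallSiteModel {
    private static final Cleaner CLEANER = Cleaner.create();

    final CallSiteContextModel context;
    final Cleaner.Cleanable cleanable;

    CallSiteModel(long nativeHandle) {
        this.context = new CallSiteContextModel(new NativeDependencies(nativeHandle));
        // Once this object becomes phantom reachable, the Cleaner thread
        // runs context.run(), freeing the list on the CallSite's behalf.
        this.cleanable = CLEANER.register(this, context);
    }
}
```

Calling `cleanable.clean()` runs the action explicitly and unregisters it, so a second call does nothing; otherwise the Cleaner's thread runs it at some point after the `CallSiteModel` becomes phantom reachable.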
ZGC looks through a dying CallSite during the unloading phase that follows an old generation collection. Note that registered CallSites can only die in the old generation, because we keep them alive through the nmethods during young collections. But if the CallSiteContext ends up in the young generation, the link from the CallSite to the CallSiteContext can go stale, and when the old generation later loads the CallSiteContext to unload the dependencies, the GC can crash. We believe this is what happened in the reported crash.
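A toy model may help show what a stale link means here. In this sketch, fields hold integer addresses rather than Java references. A hypothetical young collection relocates or frees young objects and fixes up only the fields of objects it treats as live. A dead old CallSite is not fixed up, so its field keeps pointing at the CallSiteContext's former address. This is a deliberately simplified assumption for illustration only. It is not how ZGC represents, relocates, or remaps objects, and `ToyHeap` and `youngCollect` are hypothetical names.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy heap: objects live at integer "addresses" and fields store addresses,
// so a field can go stale. Purely illustrative, not ZGC's object model.
final class ToyHeap {
    static final class Obj {
        final String kind;
        final boolean old; // true = old generation, false = young
        int field = -1;    // address of the referenced object; -1 means null

        Obj(String kind, boolean old) { this.kind = kind; this.old = old; }
    }

    private final Map<Integer, Obj> memory = new HashMap<>();
    private int nextAddress = 0;

    int allocate(Obj o) {
        memory.put(nextAddress, o);
        return nextAddress++;
    }

    // Returns null if nothing lives at this address any more.
    Obj load(int address) { return memory.get(address); }

    // Hypothetical young collection: young objects whose address is in
    // 'live' move to fresh addresses, other young objects are freed, and
    // only fields of live objects are fixed up. Returns the live addresses
    // after relocation.
    Set<Integer> youngCollect(Set<Integer> live) {
        Map<Integer, Integer> forwarding = new HashMap<>();
        for (Integer addr : new HashSet<>(memory.keySet())) {
            Obj o = memory.get(addr);
            if (o.old) continue;         // old objects are not moved here
            memory.remove(addr);         // old address is now free
            if (live.contains(addr)) {
                forwarding.put(addr, allocate(o));
            }
        }
        Set<Integer> liveAfter = new HashSet<>();
        for (Integer addr : live) {
            liveAfter.add(forwarding.getOrDefault(addr, addr));
        }
        for (Integer addr : liveAfter) { // dead objects are never visited
            Obj o = memory.get(addr);
            if (o != null && forwarding.containsKey(o.field)) {
                o.field = forwarding.get(o.field);
            }
        }
        return liveAfter;
    }
}
```

If the young collection treats the CallSiteContext as unreachable, `load(callSite.field)` returns `null` afterwards (in a real heap that memory could hold anything). If both objects are live, the CallSite's field is updated to the new address instead.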
The fix under consideration is to get rid of the Cleaner mechanism, given that the GCs already perform the appropriate cleaning. This would allow us to remove the CallSiteContext object, and then we would never risk visiting a young generation object during class unloading.
In the longer term, we might also want to stop visiting dead CallSite objects altogether, but I think that should be left for a future cleanup.
Backported by:
- JDK-8350271 ZGC: Crash in DependencyContext::clean_unloading_dependents (Resolved)
- JDK-8350574 ZGC: Crash in DependencyContext::clean_unloading_dependents (Closed)

Links to:
- Commit (master) openjdk/jdk24u/352a7a97
- Commit (master) openjdk/jdk/14136f8b
- Review (master) openjdk/jdk24u/67
- Review (master) openjdk/jdk/23194