JDK-8264640

CMS ParScanClosure misses a barrier



    • gc
    • b01
    • generic
    • generic



        During an investigation of a GC crash with the same stack trace as in JDK-8222798, I found that a barrier is missing in the code of ParScanClosure::do_oop_work.


        The comment there states that the two reads on the following lines need to be ordered, but nothing prevents the compiler from reordering them. I spotted actual instances of this code compiled by GCC 4.4.7 with the two reads reordered.

        After a compiler barrier was added between the two reads, the rather frequent GC crash no longer occurred.
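
        For illustration only, the kind of fix described above can be sketched as follows. This is not the actual HotSpot patch: ObjHeader, HeaderSnapshot, compiler_barrier, is_forwarded and read_klass_then_mark are hypothetical names, the header layout is simplified, and treating low mark bits 0b11 as "forwarded" is an assumption of the sketch. The empty asm statement with a "memory" clobber is a GCC/Clang idiom that stops the compiler from moving memory accesses across it; it emits no instruction and does not constrain the CPU.

```cpp
#include <cstdint>

// Hypothetical, simplified object header; not HotSpot's real oopDesc layout.
struct ObjHeader {
    std::uintptr_t mark;   // assumption: low bits 0b11 mean "forwarded"
    const void*    klass;  // may be overwritten once the object is forwarded
};

// The two values a reader needs, captured in the required order.
struct HeaderSnapshot {
    const void*    klass;
    std::uintptr_t mark;
};

// GCC/Clang-specific compiler barrier: the "memory" clobber tells the
// compiler that memory may have changed, so it must not move loads or
// stores across this point. No machine instruction is generated.
static inline void compiler_barrier() {
    __asm__ __volatile__("" ::: "memory");
}

static inline bool is_forwarded(std::uintptr_t mark) {
    return (mark & 3u) == 3u;
}

// Read klass first, then mark, and keep the compiler from swapping them.
// If the returned mark is not forwarded, the klass was read before any
// forwarding could have overwritten it.
HeaderSnapshot read_klass_then_mark(const ObjHeader* obj) {
    HeaderSnapshot s;
    s.klass = obj->klass;   // first read
    compiler_barrier();     // the compiler may not hoist the mark load above this
    s.mark = obj->mark;     // second read
    return s;
}
```

        A caller would use the klass from the snapshot only when is_forwarded(snapshot.mark) is false. On architectures that can reorder loads in hardware, a compiler-only barrier like this one would presumably not be enough and a load-load fence would be needed instead.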

        Adding a root cause analysis from @JonhC

        > We were seeing this as a crash when obtaining the size of an object to be copied. The klass was observed to be transiently NULL. We found that the object, reached through another reference path, had already been copied and the from-space oop placed on the task queue for subsequent reference field scanning. The task queue, however, had overflowed, and the from-space oop was placed on the shared overflow queue, where objects are chained together through their klass field. If the reads are ordered as they are in the code, then everything is OK, as per the comment at line 105 (in ParScanClosure::do_oop_work), but we found that gcc had reordered the reads in the non-compressed oops case. So the mark word is read and the object is observed not to be forwarded (yet). Then, via another reference path, the object is copied, forwarded, and placed on the overflow task queue, overwriting the from-space object’s klass. Then, in the original path, the klass is read and observed to be NULL or the next overflow entry, leading to the crash. When the from-space oop is dequeued, its klass is restored, which is what was observed in the core file.
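
        The interleaving described in the analysis above can be replayed deterministically in a single thread. The sketch below is a simplified model, not HotSpot code: Klass, Obj, Observation, ReadOrder, forward_and_push_overflow and observe_with_race are hypothetical names, and the forwarded encoding (low mark bits 0b11) is an assumption. It places the racing worker's "forward and push to overflow list" step between the reader's two loads and shows what the reader ends up believing under each read order.

```cpp
#include <cstddef>
#include <cstdint>

struct Klass { std::size_t size_in_words; };

// Simplified from-space object: the klass slot doubles as the overflow-list
// link once the object has been forwarded (as the analysis describes).
struct Obj {
    std::uintptr_t mark;
    void*          klass_or_next;
};

static bool mark_is_forwarded(std::uintptr_t m) { return (m & 3u) == 3u; }

// The racing GC worker: forwards the object, then chains it onto the shared
// overflow list, overwriting the klass slot with the previous list head
// (null when the list was empty).
void forward_and_push_overflow(Obj& o, Obj*& overflow_head) {
    o.mark = 3u;
    o.klass_or_next = overflow_head;
    overflow_head = &o;
}

struct Observation {
    bool  treated_as_unforwarded;  // reader thinks it must size and copy the object
    void* klass;                   // the klass value the reader would use
};

enum class ReadOrder { KlassThenMark, MarkThenKlass };

// Replays the reader's two loads with the racing worker landing in between.
Observation observe_with_race(Obj& o, Obj*& overflow_head, ReadOrder order) {
    std::uintptr_t m = 0;
    void* k = nullptr;
    if (order == ReadOrder::KlassThenMark) {
        k = o.klass_or_next;                          // intended order
        forward_and_push_overflow(o, overflow_head);  // race happens here
        m = o.mark;                                   // sees "forwarded"
    } else {
        m = o.mark;                                   // sees "not forwarded"
        forward_and_push_overflow(o, overflow_head);  // race happens here
        k = o.klass_or_next;                          // reads the overflow link
    }
    return { !mark_is_forwarded(m), k };
}
```

        In this model, the intended order yields a forwarded mark, so the reader discards the klass it read. The reordered sequence yields an unforwarded mark paired with a null (or overflow-link) klass, which corresponds to the crash when the object size is computed.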






                akozlov Anton Kozlov