Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-7072527

CMS: JMM GC counters overcount in some cases

XMLWordPrintable

    • gc
    • b02
    • generic
    • generic
    • Verified

        From Krystal Mok (###@###.###) on the hotspot-gc-dev@o.j.n list:-

        Hi all,

        I've been looking at a strange inconsistency of full GC count recorded by jvmstat and JMM counters. I'd like to know which ones of the following behaviors are by design, which ones are bugs, and which ones are just my misunderstanding. I apologize for making a short story long...

        =====================================================

        The counters involved:

        * A jvmstat counter named "sun.gc.collector.1.invocations" keeps track of the number of pauses occured as a result of a major collection. It is used by utilities such as jstat as the source of "FGC" (full collection count), and the old gen collection count in Visual GC. It's updated by an TraceCollectorStats object.
        * A JMM counter, GCMemoryManager::_num_collections, keeps track of the number of collections that have ended. This counter is used as HotSpot's implementation of the JMX GarbageCollectorMXBean.getCollectionCount(). It's updated by either a TraceMemoryManagerStats object or a TraceCMSMemoryManagerStats object.

        To show the situation, I've made a screenshot of a VisualVM and a JConsole running side by side, both are monitoring the VisualVM's GC stats:
        http://dl.iteye.com/upload/attachment/524811/913cb0e1-7add-3ac0-a718-24ca705cad22.png
        (I'll upload the screenshot to somewhere else if anybody can't see it)
        The VisualVM instance is running on JDK6u26, with ParNew+CMS.
        In the screenshot, Visual GC reports that the old gen collection count is 20, while JConsole reports 10.

        I see that there was this bug:
        6580448: CMS: Full GC collection count mismatch between GarbageCollectorMXBean and jvmstat (VisualGC)
        I don't think the current implementation has a bug in the sense that the two counters don't report the same number.

        This behavior seems reasonable, but the naming of the value in these tools are confusing: both tools say "collections", but apparently the number in Visual GC means "number of pauses" where as the number in JConsole means "number of collection cycles". It'd be great if the difference could be documented somewhere, if that's the intended behavior.

        And then the buggy behavior. Code demo posted on gist: https://gist.github.com/1106263
        Starting from JDK6u23, when using CMS without ExplicitGCInvokesConcurrent, System.gc() (or Runtime.getRuntime().gc(), or MemoryMXBean.gc() via JMX) would make the JMM GC counter increment by 2 per invocation, while the jvmstat counter is only incremented by 1. I believe the latter is correct and the former needs some fixing.

        =====================================================

        My understanding of the behavior shown above:

        1. The concurrent GC part:

        There are 2 pauses in a CMS concurrent GC cycle, one in the initial mark phase, and one in the final remark phase.
        To trigger a concurrent GC cycle, the CMS thread wakes up periodically to see if it shouldConcurrentCollect(), and trigger a cycle when the predicate returned true, or goes back to sleep if the predicate returned false. The whole concurrent GC cycle doesn't go through GenCollectedHeap::do_collection().

        The jvmstat counter for old gen pauses is updated in CMSCollector::do_CMS_operation(CMS_op_type op), which exactly covers both pause phases.

        The JMM counter, however, is updated in the concurrent sweep phase, CMSCollector::sweep(bool asynch), if there was no concurrent mode failure; or it is updated in CMSCollector::do_compaction_work(bool clear_all_soft_refs) in case of a bailout due to concurrent mode failure (advertised as so in the code comments). So that's an increment by 1 per concurrent GC cycle, which does reflect the "number of collection cycles", fair enough.

        So far so good.

        2. The System.gc() part:

        Without ExplicitGCInvokesConcurrent set, System.gc() does a stop-the-world full GC, which consists of only one pause, so "number of pauses" would equal "number of collections" in this case.
        It will go into GenCollectedHeap::do_collection(); both the jvmstat and the JMM GC counter gets incremented by 1 here,

        TraceCollectorStats tcs(_gens[i]->counters());
        TraceMemoryManagerStats tmms(_gens[i]->kind());

        But, drilling down into:
        _gens[i]->collect(full, do_clear_all_soft_refs, size, is_tlab);

        That'll eventually go into:
        CMSCollector::acquire_control_and_collect(bool full, bool clear_all_soft_refs)

        System.gc() is user requested so that'll go further into mark-sweep-compact:
        CMSCollector::do_compaction_work(bool clear_all_soft_refs)
        And here, it increments the JMM GC counter again (remember it was in the concurrent GC path too, to handle bailouts), even though this is still in the same collection. This leads to the "buggy behavior" mentioned earlier.

        The JMM GC counter wasn't added to CMS until this fix got in:
        6581734: CMS Old Gen's collection usage is zero after GC which is incorrect

        The code added to CMSCollector::do_compaction_work() works fine in the concurrent GC path, but interacts badly with the original logic in GenCollectedHeap::do_collection().

        =====================================================

        I thought all concurrent mode failures/interrupts come from GenCollectedHeap::do_collection(). If that's the case, then it seems unnecessary to update the JMM GC counter in CMSCollector::do_compaction_work(), simply removing it should fix the problem.

        With that, I'd purpose the following (XS) change: (diff against HS20)

        diff -r f0f676c5a2c6 src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
        --- a/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp Tue Mar 15 19:30:16 2011 -0700
        +++ b/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp Thu Jul 28 00:02:41 2011 +0800
        @@ -2022,9 +2022,6 @@
                                                     _intra_sweep_estimate.padded_average());
           }

        - {
        - TraceCMSMemoryManagerStats();
        - }
           GenMarkSweep::invoke_at_safepoint(_cmsGen->level(),
             ref_processor(), clear_all_soft_refs);
           #ifdef ASSERT

        The same goes for the changes in:
        7036199: Adding a notification to the implementation of GarbageCollectorMXBeans

        =====================================================

        P.S. Is there an "official" name for the counters that I referred to as "jvmstat counters" above? Is it just jvmstat, or PerfData or HSPERFDATA?

              kevinw Kevin Walls
              ysr Y. Ramakrishna
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

                Created:
                Updated:
                Resolved:
                Imported:
                Indexed: