JDK-6433335

ParNewGC times spiking, eventually taking up 20 out of every 30 seconds


    • Type: Bug
    • Resolution: Fixed
    • Priority: P3
    • Fix Version: hs10
    • Affects Versions: 1.4.2, 5.0u8
    • Component: hotspot
    • Subcomponent: gc
    • Introduced In Version: 1.4.2
    • Resolved In Build: b03
    • CPU: x86, sparc
    • OS: solaris_9, windows_2003

        Customer is using an ER release of 5.0u6.
        The ER is 1.5.0_06-erdist-2006-02-01; the bug it addressed was 6367204.

        The hardware is:
               Dell 4xDual core Xeon
               32GB RAM
               10x RAID 10 HDD
           The OS is Server 2003.
           The java version and configuration:
               JRE v.1.5.0_06-erdist-20060201
               28GB heap
               GC options: -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
        -XX:ParallelGCThreads=7 -XX:NewSize=128M -XX:MaxNewSize=128M
        -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing
        -XX:CMSIncrementalDutyCycleMin=0 -XX:CMSIncrementalDutyCycle=10
        -XX:CMSMarkStackSize=8M -XX:CMSMarkStackSizeMax=32M -XX:+UseLargePages
        -XX:+DisableExplicitGC
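
        For reference, the full launch line implied by the configuration
        above would look roughly as follows. This is a sketch: the -Xms/-Xmx
        values are inferred from the stated 28GB heap, and the main class
        name is a placeholder; only the -XX options are quoted from the
        report.

```shell
# Reconstructed launch line (sketch). -Xms/-Xmx are inferred from the
# reported 28GB heap; the main class is hypothetical. The -XX flags are
# the GC options as reported.
java -Xms28g -Xmx28g \
     -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
     -XX:ParallelGCThreads=7 -XX:NewSize=128M -XX:MaxNewSize=128M \
     -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing \
     -XX:CMSIncrementalDutyCycleMin=0 -XX:CMSIncrementalDutyCycle=10 \
     -XX:CMSMarkStackSize=8M -XX:CMSMarkStackSizeMax=32M \
     -XX:+UseLargePages -XX:+DisableExplicitGC \
     com.example.DatabaseServer   # hypothetical main class
```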

           The application is a custom distributed database server based on
        TCP/IP and Sleepycat DBJE

           The symptoms:
               After running smoothly for ~1-4 days straight under
        constant but light load, the ParNew GCs jump from ~150 ms every 30
        seconds to 5-20 seconds out of every 30 seconds. The onset of the
        degenerate ParNew GCs seems mostly (but not always) to coincide with
        the start of a new CMS mark phase. The general pattern is to spend
        20-90% of the time in young GC, which eventually quiesces to
        acceptable levels after ~4 hours of GC pain (frequently only to
        restart after the next CMS sweep).
               The load was constant and unvaried on our side, so we
        see no application-level cause for the degenerate GC performance.
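
        The per-interval GC time described above can be measured from
        inside the JVM with the standard java.lang.management API
        (available since 5.0). A minimal sketch, assuming one simply wants
        the GC time spent in each 30-second window (class and method names
        here are illustrative, not from the report):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcPauseWatch {
    /** Sum of cumulative GC time (ms) across all collectors in this JVM. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean b : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = b.getCollectionTime(); // cumulative ms; -1 if unsupported
            if (t > 0) total += t;
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        // Sample every 30 s; the delta is the GC time spent in that window.
        // A healthy run here is ~150 ms per window, the degenerate case 5-20 s.
        long prev = totalGcMillis();
        for (;;) {
            Thread.sleep(30_000);
            long cur = totalGcMillis();
            System.out.printf("GC time in last 30 s: %d ms%n", cur - prev);
            prev = cur;
        }
    }
}
```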


        They ran a test with 5.0u8 and the problem seemed to be pushed out:
        the time to failure grew to 48 hours for the initial 5-second spikes,
        plus roughly another day to reach the ~20-second spikes.

        They were running with large pages, so they ran a test without them
        on their 5.0u6 ER; the problem seemed to go away but returned many
        days later. Turning off large pages thus also appears to extend the
        running time, but eventually they still see the problem.

              Assignee: Y. Ramakrishna (ysr)
              Reporter: Mark Susko (msusko, inactive)
              Votes: 0
              Watchers: 3
