- 
    Bug 
- 
    Resolution: Fixed
- 
     P2 P2
- 
    1.4.2_11
- 
        b03
- 
        sparc
- 
        solaris_8
                    Customer has a voice over IP application running 1.4.2_11 that after a period of time
(usually 14-24 hours)
They have a pre-clean that takes quite some time. We've seen from a few hundred seconds up to 6000+
GC options: -server -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
-XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled
-XX:MaxNewSize=128m -XX:NewSize=128m -Xms1G -Xmx1G -XX:SurvivorRatio=6
-XX:MaxTenuringThreshold=4 -XX:CMSInitiatingOccupancyFraction=60
-Dsun.rmi.dgc.server.gcInterval=0x7FFFFFFFFFFFFFFE
-Dsun.rmi.dgc.client.gcInterval=0x7FFFFFFFFFFFFFFE
-XX:+DisableExplicitGC -XX:+UseTLAB -verbose:gc
-XX:+PrintTenuringDistribution -XX:+PrintGCTimeStamps
-XX:+PrintGCDetails -XX:+PrintHeapAtGC -Xloggc:gc
an example of a 6000+ preclean:
90401.397: [GC {Heap before GC invocations=6436:
Heap
par new generation total 114688K, used 107474K [0xb5400000, 0xbd400000, 0xbd400000)
eden space 98304K, 100% used [0xb5400000, 0xbb400000, 0xbb400000)
from space 16384K, 55% used [0xbb400000, 0xbbcf4b70, 0xbc400000)
to space 16384K, 0% used [0xbc400000, 0xbc400000, 0xbd400000)
concurrent mark-sweep generation total 917504K, used 761505K [0xbd400000, 0xf5400000, 0xf5400000)
concurrent-mark-sweep perm gen total 51336K, used 30768K [0xf5400000, 0xf8622000, 0xf9400000)
90401.397: [ParNew (promotion failed): 107474K->107474K(114688K),
3.4757402 secs]90404.873: [CMS95947.013: [CMS-concurrent-preclean: 6187.143/6199.426 secs]
(concurrent mode failure): 763375K->507358K(917504K), 5560.5453822
secs] 868980K->507358K(1032192K) Heap after GC invocations=6437:
Heap
par new generation total 114688K, used 0K [0xb5400000, 0xbd400000, 0xbd400000)
eden space 98304K, 0% used [0xb5400000, 0xb5400000, 0xbb400000)
from space 16384K, 0% used [0xbb400000, 0xbb400000, 0xbc400000)
to space 16384K, 0% used [0xbc400000, 0xbc400000, 0xbd400000)
concurrent mark-sweep generation total 917504K, used 507358K
[0xbd400000, 0xf5400000, 0xf5400000)
concurrent-mark-sweep perm gen total 51336K, used 30758K [0xf5400000, 0xf8622000, 0xf9400000) } , 5564.0254377 secs]
The pattern in one of the gc logs running with -XX:PrintCMSStatistics=2
shows the re-scan timings spiking.
At the time of the long prelcean there are 833857 cards rescaned.
113684.645: [ParNew (promotion failed): 107519K->107519K(114688K), 1.0264022 secs]113685.671: [CMS (cardTable: 0 cards, re-scanned 833857 cards, 3 iterations)
Looks like a lot of cards re-scanned and just after it the number drops substantially.The one before it is ~4000.
I can see the ramp up where pre-clean was taking 2-8 seconds then starts to go up to the high teens and then some 40-60 seconds and then this one.
113684.644: [GC {Heap before GC invocations=5683:
Heap
par new generation total 114688K, used 107519K [0xb5400000, 0xbd400000, 0xbd400000)
eden space 98304K, 100% used [0xb5400000, 0xbb400000, 0xbb400000)
from space 16384K, 56% used [0xbc400000, 0xbccffed0, 0xbd400000)
to space 16384K, 0% used [0xbb400000, 0xbb400000, 0xbc400000)
concurrent mark-sweep generation total 917504K, used 809255K [0xbd400000, 0xf5400000, 0xf5400000)
concurrent-mark-sweep perm gen total 52888K, used 31705K [0xf5400000, 0xf87a6000, 0xf9400000)
113684.645: [ParNew (promotion failed): 107519K->107519K(114688K), 1.0264022 secs]113685.671: [CMS (cardTable: 0 cards, re-scanned 833857 cards, 3 iterations)
117418.012: [CMS-concurrent-preclean: 4482.648/4494.823 secs]
(CMS-concurrent-preclean yielded 0 times)
(concurrent mode failure): 815888K->451828K(917504K), 3748.8971119 secs] 916774K->451828K(1032192K) Heap after GC invocations=5684:
Heap
par new generation total 114688K, used 0K [0xb5400000, 0xbd400000, 0xbd400000)
eden space 98304K, 0% used [0xb5400000, 0xb5400000, 0xbb400000)
from space 16384K, 0% used [0xbc400000, 0xbc400000, 0xbd400000)
to space 16384K, 0% used [0xbb400000, 0xbb400000, 0xbc400000)
concurrent mark-sweep generation total 917504K, used 451828K [0xbd400000, 0xf5400000, 0xf5400000)
concurrent-mark-sweep perm gen total 52888K, used 31693K [0xf5400000, 0xf87a6000, 0xf9400000)} , 3749.9312617 secs]
There is a jump from ~15 seconds to 57 to 0 then back to 15 then 4482 seconds between the CMS events.
The cards re-scanned are 156931 for the 57 second one
Drop to 1209
then the ramp up to the eventual 833857
112761.630: [CMS-concurrent-preclean: 14.293/14.561 secs]
(CMS-concurrent-preclean yielded 0 times)
112776.001: [CMS-concurrent-preclean-start]
112834.097: [CMS-concurrent-preclean: 57.002/58.097 secs]
(CMS-concurrent-preclean yielded 0 times)
112848.226: [CMS-concurrent-preclean-start]
112849.126: [CMS-concurrent-preclean: 0.650/0.900 secs]
(CMS-concurrent-preclean yielded 0 times)
112863.932: [CMS-concurrent-preclean-start]
112878.684: [CMS-concurrent-preclean: 14.478/14.751 secs]
(CMS-concurrent-preclean yielded 0 times)
112892.655: [CMS-concurrent-preclean-start]
112908.947: [CMS-concurrent-preclean: 15.788/16.292 secs]
(CMS-concurrent-preclean yielded 0 times)
112923.189: [CMS-concurrent-preclean-start]
117418.012: [CMS-concurrent-preclean: 4482.648/4494.823 secs]
(CMS-concurrent-preclean yielded 0 times)
117442.657: [CMS-concurrent-preclean-start]
117497.231: [CMS-concurrent-preclean: 54.569/54.574 secs]
(CMS-concurrent-preclean yielded 0 times)
117509.801: [CMS-concurrent-preclean-start]
117510.480: [CMS-concurrent-preclean: 0.679/0.680 secs]
(CMS-concurrent-preclean yielded 0 times)
117521.990: [CMS-concurrent-preclean-start]
117522.484: [CMS-concurrent-preclean: 0.493/0.494 secs]
(CMS-concurrent-preclean yielded 0 times)
117533.450: [CMS-concurrent-preclean-start]
117533.943: [CMS-concurrent-preclean: 0.493/0.494 secs]
The rate of card rescans, there's a big drop
here. Before that, it's typically (less than) 1 second per
1000 dirty cards scanned. Then suddenly for this particular
event it jumps up to about 5 times that value, some 5 seconds
per 1000 cards scanned.
Customer has sent in SAR data and there doesn't appear to be
a burden on the system and paging/swap doesn't seem to
play a role in the impact of this behavior.
            
(usually 14-24 hours)
They have a pre-clean that takes quite some time. We've seen from a few hundred seconds up to 6000+
GC options: -server -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
-XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled
-XX:MaxNewSize=128m -XX:NewSize=128m -Xms1G -Xmx1G -XX:SurvivorRatio=6
-XX:MaxTenuringThreshold=4 -XX:CMSInitiatingOccupancyFraction=60
-Dsun.rmi.dgc.server.gcInterval=0x7FFFFFFFFFFFFFFE
-Dsun.rmi.dgc.client.gcInterval=0x7FFFFFFFFFFFFFFE
-XX:+DisableExplicitGC -XX:+UseTLAB -verbose:gc
-XX:+PrintTenuringDistribution -XX:+PrintGCTimeStamps
-XX:+PrintGCDetails -XX:+PrintHeapAtGC -Xloggc:gc
an example of a 6000+ preclean:
90401.397: [GC {Heap before GC invocations=6436:
Heap
par new generation total 114688K, used 107474K [0xb5400000, 0xbd400000, 0xbd400000)
eden space 98304K, 100% used [0xb5400000, 0xbb400000, 0xbb400000)
from space 16384K, 55% used [0xbb400000, 0xbbcf4b70, 0xbc400000)
to space 16384K, 0% used [0xbc400000, 0xbc400000, 0xbd400000)
concurrent mark-sweep generation total 917504K, used 761505K [0xbd400000, 0xf5400000, 0xf5400000)
concurrent-mark-sweep perm gen total 51336K, used 30768K [0xf5400000, 0xf8622000, 0xf9400000)
90401.397: [ParNew (promotion failed): 107474K->107474K(114688K),
3.4757402 secs]90404.873: [CMS95947.013: [CMS-concurrent-preclean: 6187.143/6199.426 secs]
(concurrent mode failure): 763375K->507358K(917504K), 5560.5453822
secs] 868980K->507358K(1032192K) Heap after GC invocations=6437:
Heap
par new generation total 114688K, used 0K [0xb5400000, 0xbd400000, 0xbd400000)
eden space 98304K, 0% used [0xb5400000, 0xb5400000, 0xbb400000)
from space 16384K, 0% used [0xbb400000, 0xbb400000, 0xbc400000)
to space 16384K, 0% used [0xbc400000, 0xbc400000, 0xbd400000)
concurrent mark-sweep generation total 917504K, used 507358K
[0xbd400000, 0xf5400000, 0xf5400000)
concurrent-mark-sweep perm gen total 51336K, used 30758K [0xf5400000, 0xf8622000, 0xf9400000) } , 5564.0254377 secs]
The pattern in one of the gc logs running with -XX:PrintCMSStatistics=2
shows the re-scan timings spiking.
At the time of the long prelcean there are 833857 cards rescaned.
113684.645: [ParNew (promotion failed): 107519K->107519K(114688K), 1.0264022 secs]113685.671: [CMS (cardTable: 0 cards, re-scanned 833857 cards, 3 iterations)
Looks like a lot of cards re-scanned and just after it the number drops substantially.The one before it is ~4000.
I can see the ramp up where pre-clean was taking 2-8 seconds then starts to go up to the high teens and then some 40-60 seconds and then this one.
113684.644: [GC {Heap before GC invocations=5683:
Heap
par new generation total 114688K, used 107519K [0xb5400000, 0xbd400000, 0xbd400000)
eden space 98304K, 100% used [0xb5400000, 0xbb400000, 0xbb400000)
from space 16384K, 56% used [0xbc400000, 0xbccffed0, 0xbd400000)
to space 16384K, 0% used [0xbb400000, 0xbb400000, 0xbc400000)
concurrent mark-sweep generation total 917504K, used 809255K [0xbd400000, 0xf5400000, 0xf5400000)
concurrent-mark-sweep perm gen total 52888K, used 31705K [0xf5400000, 0xf87a6000, 0xf9400000)
113684.645: [ParNew (promotion failed): 107519K->107519K(114688K), 1.0264022 secs]113685.671: [CMS (cardTable: 0 cards, re-scanned 833857 cards, 3 iterations)
117418.012: [CMS-concurrent-preclean: 4482.648/4494.823 secs]
(CMS-concurrent-preclean yielded 0 times)
(concurrent mode failure): 815888K->451828K(917504K), 3748.8971119 secs] 916774K->451828K(1032192K) Heap after GC invocations=5684:
Heap
par new generation total 114688K, used 0K [0xb5400000, 0xbd400000, 0xbd400000)
eden space 98304K, 0% used [0xb5400000, 0xb5400000, 0xbb400000)
from space 16384K, 0% used [0xbc400000, 0xbc400000, 0xbd400000)
to space 16384K, 0% used [0xbb400000, 0xbb400000, 0xbc400000)
concurrent mark-sweep generation total 917504K, used 451828K [0xbd400000, 0xf5400000, 0xf5400000)
concurrent-mark-sweep perm gen total 52888K, used 31693K [0xf5400000, 0xf87a6000, 0xf9400000)} , 3749.9312617 secs]
There is a jump from ~15 seconds to 57 to 0 then back to 15 then 4482 seconds between the CMS events.
The cards re-scanned are 156931 for the 57 second one
Drop to 1209
then the ramp up to the eventual 833857
112761.630: [CMS-concurrent-preclean: 14.293/14.561 secs]
(CMS-concurrent-preclean yielded 0 times)
112776.001: [CMS-concurrent-preclean-start]
112834.097: [CMS-concurrent-preclean: 57.002/58.097 secs]
(CMS-concurrent-preclean yielded 0 times)
112848.226: [CMS-concurrent-preclean-start]
112849.126: [CMS-concurrent-preclean: 0.650/0.900 secs]
(CMS-concurrent-preclean yielded 0 times)
112863.932: [CMS-concurrent-preclean-start]
112878.684: [CMS-concurrent-preclean: 14.478/14.751 secs]
(CMS-concurrent-preclean yielded 0 times)
112892.655: [CMS-concurrent-preclean-start]
112908.947: [CMS-concurrent-preclean: 15.788/16.292 secs]
(CMS-concurrent-preclean yielded 0 times)
112923.189: [CMS-concurrent-preclean-start]
117418.012: [CMS-concurrent-preclean: 4482.648/4494.823 secs]
(CMS-concurrent-preclean yielded 0 times)
117442.657: [CMS-concurrent-preclean-start]
117497.231: [CMS-concurrent-preclean: 54.569/54.574 secs]
(CMS-concurrent-preclean yielded 0 times)
117509.801: [CMS-concurrent-preclean-start]
117510.480: [CMS-concurrent-preclean: 0.679/0.680 secs]
(CMS-concurrent-preclean yielded 0 times)
117521.990: [CMS-concurrent-preclean-start]
117522.484: [CMS-concurrent-preclean: 0.493/0.494 secs]
(CMS-concurrent-preclean yielded 0 times)
117533.450: [CMS-concurrent-preclean-start]
117533.943: [CMS-concurrent-preclean: 0.493/0.494 secs]
The rate of card rescans, there's a big drop
here. Before that, it's typically (less than) 1 second per
1000 dirty cards scanned. Then suddenly for this particular
event it jumps up to about 5 times that value, some 5 seconds
per 1000 cards scanned.
Customer has sent in SAR data and there doesn't appear to be
a burden on the system and paging/swap doesn't seem to
play a role in the impact of this behavior.