Loading...

Type: Bug
Resolution: Duplicate
Priority: P4
Fix Version/s: None
Affects Version/s: 1.4.1
Component/s: hotspot
Labels:
- webbug

Subcomponent:
gc
CPU:

x86
OS:

linux

Name: jl125535 Date: 01/14/2003

FULL PRODUCT VERSION :
java version "1.4.1_01"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.1_01-b01)
Java HotSpot(TM) Client VM (build 1.4.1_01-b01, mixed mode)

FULL OPERATING SYSTEM VERSION :
Redhat 7.3, glibc 2.2.5, kernel 2.4.18

ADDITIONAL OPERATING SYSTEMS :

EXTRA RELEVANT SYSTEM CONFIGURATION :
This happens on an Dual CPU Intel Xeon 2.4GHz with
Hyperthreading turned on. We have seen the problem in
other machines also, but it is easiest to reproduce on a
machine with Hyperthreading.

A DESCRIPTION OF THE PROBLEM :
When running multithreaded programs which create a lot of
garbage with the -XX:+UseConcMarkSweepGC flag, the JVM
locks up (becomes very very slow, but the program still
seems to make progress). This happens most frequently with
a Hyperthreaded Dual processor machine, but we have seen
this in other machines also.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1. Run the program with the following options:
java -server -Xmn16M -Xms80M -Xmx80M
-XX:+UseConcMarkSweepGC -verbose:gc -XX:+PrintGCDetails
-XX:+PrintTenuringDistribution -XX:+PrintGCTimeStamps
ConcurrentGCProblem

2. Monitor the CPU usage using "vmstat 1"
Wait for 1 or 2 CMS garbage collections to occur, and
suddenly the CPU usage becomes very low (less than 5%).


EXPECTED VERSUS ACTUAL BEHAVIOR :
Wait for 1 or 2 CMS garbage collections to occur, and
suddenly the CPU usage becomes very low (less than 5%).
This happens only with the Concurrent GC turned on.

ERROR MESSAGES/STACK TRACES THAT OCCUR :
Here is the output I get. After the last printout (which looks a little
abnormal because there is not CMS-start etc.), the CPU usage goes to less than
5%
===========================================================
2e-06: [GC 0.000223: [DefNew
Desired survivor size 819200 bytes, new threshold 1 (max 31)
- age 1: 1638392 bytes, 1638392 total
: 13183K->1599K(14784K), 0.2473340 secs] 13183K->12065K(80320K), 0.2475070
secs]
0.605896: [GC 0.606014: [DefNew
Desired survivor size 819200 bytes, new threshold 1 (max 31)
- age 1: 1638400 bytes, 1638400 total
: 14783K->1600K(14784K), 0.2707170 secs] 25249K->24091K(80320K), 0.2708460
secs]
1.28198: [GC 1.28209: [DefNew
Desired survivor size 819200 bytes, new threshold 1 (max 31)
- age 1: 1638400 bytes, 1638400 total
: 14783K->1600K(14784K), 0.2862180 secs] 37275K->36563K(80320K), 0.2863430
secs]
1.89358: [GC 1.89366: [DefNew
Desired survivor size 819200 bytes, new threshold 1 (max 31)
- age 1: 1638400 bytes, 1638400 total
: 14783K->1600K(14784K), 0.2903200 secs] 49747K->49136K(80320K), 0.2904670
secs]
2.18609: [GC [1 CMS-initial-mark: 47536K(65536K)] 49164K(80320K), 0.0095590
secs]
2.19582: [CMS-concurrent-mark-start]
======++++ size = 432343
2.69469: [CMS-concurrent-mark: 0.498/0.499 secs]
2.69478: [CMS-concurrent-preclean-start]
2.69481: [CMS-concurrent-preclean: 0.000/0.000 secs]
2.71154: [GC2.71158: [dirty card accumulation, 0.0003120 secs]2.71193: [dirty
card rescan, 0.0103110 secs]2.72229: [remark from roots, 0.0865510
secs]2.80891: [weak refs processing, 0.0000310 secs] [1 CMS-remark:
47536K(65536K)] 59224K(80320K), 0.0974280 secs]
2.80918: [CMS-concurrent-sweep-start]
2.97609: [CMS-concurrent-sweep: 0.164/0.167 secs]
2.97647: [CMS-concurrent-reset-start]
2.99376: [CMS-concurrent-reset: 0.017/0.017 secs]
3.39866: [GC 3.39881: [DefNew
Desired survivor size 819200 bytes, new threshold 1 (max 31)
- age 1: 1638400 bytes, 1638400 total
: 14784K->1600K(14784K), 0.1437860 secs] 59597K->51252K(80320K), 0.1439460
secs]
3.54648: [GC [1 CMS-initial-mark: 49652K(65536K)] 51266K(80320K), 0.0063560
secs]
3.55297: [CMS-concurrent-mark-start]
4.06014: [CMS-concurrent-mark: 0.505/0.507 secs]
4.06023: [CMS-concurrent-preclean-start]
4.06025: [CMS-concurrent-preclean: 0.000/0.000 secs]
4.08755: [GC4.0876: [dirty card accumulation, 0.0002860 secs]4.08791: [dirty
card rescan, 0.0113610 secs]4.09933: [remark from roots, 0.0358710
secs]4.13527: [weak refs processing, 0.0000400 secs] [1 CMS-remark:
49652K(65536K)] 53036K(80320K), 0.0477920 secs]
4.13552: [CMS-concurrent-sweep-start]
4.54434: [CMS-concurrent-sweep: 0.405/0.409 secs]
4.54446: [CMS-concurrent-reset-start]
4.55817: [CMS-concurrent-reset: 0.014/0.014 secs]
======++++ size = 106
7.48048: [GC 7.48057: [DefNew: 14784K->14784K(14784K), 0.0000400 secs]7.48064:
[CMS7.77193: [dirty card accumulation, 0.0003070 secs]7.77231: [dirty card
rescan, 0.0028040 secs]7.77517: [remark from roots, 0.0729060 secs]7.84815:
[weak refs processing, 0.0000400 secs]: 33821K->33821K(65536K), 0.4708740
secs] 48605K->48605K(80320K), 0.4711200 secs]

==========================================================
Here is the output I obtain when I send a SIGQUIT to the process. All threads
are waiting for monitors!
==========================================================

Full thread dump Java HotSpot(TM) Server VM (1.4.1_01-b01 mixed mode):

"DestroyJavaVM" prio=1 tid=0x0x39407940 nid=0x4d05 waiting on condition
[0..7fffd2c0]

"Thread-20" prio=1 tid=0x0x39406e48 nid=0x4d24 waiting for monitor entry
[39b0b000..39b0b840]
        at java.util.HashMap.addEntry(HashMap.java:719)
        at java.util.HashMap.put(HashMap.java:388)
        at ConcurrentGCProblem$Worker.run(ConcurrentGCProblem.java:27)
        - locked <0x3078a110> (a java.util.HashMap)

"Thread-19" prio=1 tid=0x0x39406350 nid=0x4d23 waiting for monitor entry
[39a8a000..39a8a840]
        at ConcurrentGCProblem$Worker.run(ConcurrentGCProblem.java:27)
        - waiting to lock <0x30796e08> (a java.util.HashMap)

"Thread-18" prio=1 tid=0x0x39405858 nid=0x4d22 waiting for monitor entry
[39a09000..39a09840]
        at ConcurrentGCProblem$Worker.run(ConcurrentGCProblem.java:27)
        - waiting to lock <0x30797f60> (a java.util.HashMap)

"Thread-17" prio=1 tid=0x0x39404d60 nid=0x4d21 waiting for monitor entry
[39988000..39988840]
        at ConcurrentGCProblem$Worker.run(ConcurrentGCProblem.java:27)
        - waiting to lock <0x30797240> (a java.util.HashMap)

"Thread-16" prio=1 tid=0x0x39404268 nid=0x4d20 waiting for monitor entry
[39907000..39907840]
        at java.util.HashMap.addEntry(HashMap.java:719)
        at java.util.HashMap.put(HashMap.java:388)
        at ConcurrentGCProblem$Worker.run(ConcurrentGCProblem.java:27)
        - locked <0x30796ef8> (a java.util.HashMap)

"Thread-15" prio=1 tid=0x0x39403770 nid=0x4d1f waiting for monitor entry
[39886000..39886840]
        at ConcurrentGCProblem$Worker.run(ConcurrentGCProblem.java:27)
        - waiting to lock <0x3078a160> (a java.util.HashMap)

"Thread-14" prio=1 tid=0x0x39402c78 nid=0x4d1e waiting for monitor entry
[39805000..39805840]
        at java.util.HashMap.addEntry(HashMap.java:719)
        at java.util.HashMap.put(HashMap.java:388)
        at ConcurrentGCProblem$Worker.run(ConcurrentGCProblem.java:27)
        - locked <0x30796ca0> (a java.util.HashMap)

"Thread-13" prio=1 tid=0x0x39402180 nid=0x4d1d waiting for monitor entry
[39784000..39784840]
        at ConcurrentGCProblem$Worker.run(ConcurrentGCProblem.java:25)
        - waiting to lock <0x3078a0e8> (a java.util.HashMap)

"Thread-12" prio=1 tid=0x0x39401688 nid=0x4d1c runnable [39703000..39703840]
        at java.util.HashMap.addEntry(HashMap.java:719)
        at java.util.HashMap.put(HashMap.java:388)
        at ConcurrentGCProblem$Worker.run(ConcurrentGCProblem.java:27)
        - locked <0x3078a0e8> (a java.util.HashMap)

"Thread-11" prio=1 tid=0x0x39400b90 nid=0x4d1b waiting for monitor entry
[39682000..39682840]
        at ConcurrentGCProblem$Worker.run(ConcurrentGCProblem.java:27)
        - waiting to lock <0x307973d0> (a java.util.HashMap)

"Thread-10" prio=1 tid=0x0x80f3c98 nid=0x4d1a waiting for monitor entry
[39601000..39601840]
        at ConcurrentGCProblem$Worker.run(ConcurrentGCProblem.java:27)
        - waiting to lock <0x30797e48> (a java.util.HashMap)

"Thread-9" prio=1 tid=0x0x80f2fd8 nid=0x4d19 waiting for monitor entry
[39580000..39580840]
        at ConcurrentGCProblem$Worker.run(ConcurrentGCProblem.java:25)
        - waiting to lock <0x30796ef8> (a java.util.HashMap)

"Thread-8" prio=1 tid=0x0x80f2318 nid=0x4d18 waiting for monitor entry
[393b8000..393b8840]
        at ConcurrentGCProblem$Worker.run(ConcurrentGCProblem.java:27)
        - waiting to lock <0x30796c00> (a java.util.HashMap)

"Thread-7" prio=1 tid=0x0x80f1418 nid=0x4d17 waiting for monitor entry
[39337000..39337840]
        at ConcurrentGCProblem$Worker.run(ConcurrentGCProblem.java:27)
        - waiting to lock <0x30796e58> (a java.util.HashMap)

"Thread-6" prio=1 tid=0x0x80f0798 nid=0x4d16 waiting for monitor entry
[392b6000..392b6840]
        at java.util.HashMap.addEntry(HashMap.java:719)
        at java.util.HashMap.put(HashMap.java:388)
        at ConcurrentGCProblem$Worker.run(ConcurrentGCProblem.java:27)
        - locked <0x307971f0> (a java.util.HashMap)

"Thread-5" prio=1 tid=0x0x80efc18 nid=0x4d15 waiting for monitor entry
[39235000..39235840]
        at ConcurrentGCProblem$Worker.run(ConcurrentGCProblem.java:27)
        - waiting to lock <0x30797218> (a java.util.HashMap)

"Thread-4" prio=1 tid=0x0x80ef098 nid=0x4d14 waiting for monitor entry
[391b4000..391b4840]
        at ConcurrentGCProblem$Worker.run(ConcurrentGCProblem.java:27)
        - waiting to lock <0x30796db8> (a java.util.HashMap)

"Thread-3" prio=1 tid=0x0x80ee500 nid=0x4d13 waiting for monitor entry
[39133000..39133840]
        at ConcurrentGCProblem$Worker.run(ConcurrentGCProblem.java:27)
        - waiting to lock <0x30796fe8> (a java.util.HashMap)

"Thread-2" prio=1 tid=0x0x80ed998 nid=0x4d12 waiting for monitor entry
[390b2000..390b2840]
        at ConcurrentGCProblem$Worker.run(ConcurrentGCProblem.java:27)
        - waiting to lock <0x30797d80> (a java.util.HashMap)

"Thread-1" prio=1 tid=0x0x80ecde8 nid=0x4d11 waiting for monitor entry
[39031000..39031840]
        at ConcurrentGCProblem$Worker.run(ConcurrentGCProblem.java:27)
        - waiting to lock <0x30796d90> (a java.util.HashMap)

"Signal Dispatcher" daemon prio=1 tid=0x0x80bb468 nid=0x4d0e waiting on
condition [0..0]

"Surrogate Locker Thread (CMS)" daemon prio=1 tid=0x0x80ba7b0 nid=0x4d0b
runnable [0..2b1939f8]

"Finalizer" daemon prio=1 tid=0x0x80b56b8 nid=0x4d0a in Object.wait()
[388c1000..388c1840]
        at java.lang.Object.wait(Native Method)
        - waiting on <0x3078c0b0> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:111)
        - locked <0x3078c0b0> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:127)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=1 tid=0x0x80b43a0 nid=0x4d09 in Object.wait()
[38840000..38840840]
        at java.lang.Object.wait(Native Method)
        - waiting on <0x3078c050> (a java.lang.ref.Reference$Lock)
        at java.lang.Object.wait(Object.java:426)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:113)
        - locked <0x3078c050> (a java.lang.ref.Reference$Lock)

"VM Thread" prio=1 tid=0x0x80b3948 nid=0x4d08 runnable

"VM Periodic Task Thread" prio=1 tid=0x0x80bb2c8 nid=0x4d0c waiting on
condition
"Suspend Checker Thread" prio=1 tid=0x0x80bb360 nid=0x4d0d runnable

REPRODUCIBILITY :
This bug can be reproduced often.

---------- BEGIN SOURCE ----------

import java.util.*;

public class ConcurrentGCProblem
{
public static final int TARGET_SIZE = 100000;

static final class Worker extends Thread {
Worker(boolean print) { print_ = print; }
private final boolean print_;
public void run() {
int dataSize = 0;
int count = 0;
while (true) {
Map m = maps_[(int)(Math.random() * maps_.length)];
boolean add;
double diff = (double)(dataSize-TARGET_SIZE);
if (diff < 0) {
diff = -diff;
add = (2 * Math.random() + diff/TARGET_SIZE > 1);
}
else {
add = (2 * Math.random() + diff/TARGET_SIZE < 1);
}
synchronized (m) {
if (add) {
m.put(new Double(Math.random()),
                              new int[(int)(Math.random()*32)]);
dataSize++;
}
else
if (!m.isEmpty()) {
Iterator iter =
m.keySet().iterator();
iter.next();
iter.remove();
dataSize--;
}
}
if (++count % 25000 == 0) {
dataSize = 0;
for (int i = 0; i<maps_.length; i++) {
synchronized (maps_[i]) {
dataSize += maps_[i].size();
}
}
if (print_)
System.out.println("======++++size = " + dataSize);
}
}
}
}

static final Map[] maps_ = new Map[100];

public static void main(String[] args) {
for (int i = 0; i<maps_.length; i++)
maps_[i] = new HashMap();

for (int i = 0; i<20; i++)
(new Worker(i==0)).start();
}
}

---------- END SOURCE ----------

CUSTOMER WORKAROUND :
By playing with the memory allocations, you can reduce how
often the bug happens. But I have not systematic way the
work around this, other than not using
-XX:+UseConcMarkSweepGC
(Review ID: 178997)
======================================================================

duplicates

JDK-4758307 Concurrent Garbage Collector doesn't scale

Closed

Details

Description

Attachments

Issue Links

Activity

People

Dates