-
Bug
-
Resolution: Not an Issue
-
P3
-
None
-
1.4.2_09
-
sparc
-
solaris_9
In WS6.0 with 1.4.2_09 an HashMap is loaded with 200 items at startup (in syncronized code) and then accessed by multiple threads with high ratio (formatCell is called 600 times/sec) in unsyncronized code.
See source code in http://balabiot.italy/calls/37570730_borsait/0304/src/FieldFormatter.java
After a few week it happens that some threads use all the cpu running get():
http://balabiot.italy/calls/37570730_borsait/1004/prstat.out:
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/LWPID
4384 www 2911M 2604M cpu0 20 10 0:16:51 27% webservd/112
4384 www 2911M 2604M run 20 10 0:16:51 24% webservd/157
4384 www 2911M 2604M run 21 10 0:44:56 23% webservd/111
4384 www 2911M 2604M sleep 29 10 0:00:02 1.3% webservd/177
We have asked to collect when it happens prstat, multiple pstack, pmap and a gcore and then we extracted via SA scripts java dump, heap stats, class histogram, java flags.
No truss available.
/net/balabiot.italy/calls/37570730_borsait/0304
/net/balabiot.italy/calls/37570730_borsait/1004
http://balabiot.italy/calls/37570730_borsait/1004/core.4384.mwsbit03.jstack
Deadlock Detection:
No deadlocks found.
....
Thread t@111: (state = IN_JAVA, current Java SP = null
)
- java.util.HashMap.get(java.lang.Object) @bci=72, line=325,
pc=0xf8d32788, methodOop=0xf0063b10 (Compiled frame; information may be
imprecise)
- com.inferentiadnm.display.dao.FieldFormatter.formatCell(java.lang.String,
java.lang.String, java.util.Locale) @bci=125, line=248, pc=0xf8903ac8,
methodOop=0xf0b0dcc0 (Compiled frame)
...
Thread t@112: (state = IN_JAVA, current Java SP = null
)
- java.util.HashMap.get(java.lang.Object) @bci=6, line=317,
pc=0xf8d3264c, methodOop=0xf0063b10 (Compiled frame; information may be
imprecise)
- com.inferentiadnm.display.dao.FieldFormatter.formatCell(java.lang.String,
java.lang.String, java.util.Locale) @bci=125, line=248, pc=0xf8903ac8,
methodOop=0xf0b0dcc0 (Compiled frame)
...
Thread t@157: (state = IN_JAVA, current Java SP = null
)
- java.util.HashMap.get(java.lang.Object) @bci=6, line=317,
pc=0xf8d3264c, methodOop=0xf0063b10 (Compiled frame; information may be
imprecise)
- com.inferentiadnm.display.dao.FieldFormatter.formatCell(java.lang.String,
java.lang.String, java.util.Locale) @bci=125, line=248, pc=0xf8903ac8,
methodOop=0xf0b0dcc0 (Compiled frame)
The program counter are:
Thread t@111: pc=0xf8d32788
Thread t@112: pc=0xf8d3264c
Thread t@157: pc=0xf8d3264c
We see a couple of threads (112 e 157) running the same optimized native code, and exactly using the same amount of cpu time
From the sequence of 3 pstack, and looking at the first arguments only we see all the threads are running the same code with different arguments (e.g:f8d32790)
pstack1.out thread# 111: f8d32648 (4055b100, 4054a990, 4055b100, 72127600, 0, 4055b0d0)
pstack2.out thread# 111: f8d32790 (4055b100, 4054a990, 4055b100, 72127600, 0, 4055b0d0)
pstack3.out thread# 111: f8d3264c (4055b100, 4054a990, 4055b100, 72127600, 0, 4055b0d0)
pstack1.out thread# 112: f8d32790 (4e8401b0, 4e6bb890, 4e8401b0, 7212bd40, 0, 4e840180)
pstack2.out thread# 112: f8d3264c (4e8401b0, 4e6bb890, 4e8401b0, 7212bd40, 0, 4e840180)
pstack3.out thread# 112: f8d3264c (4e8401b0, 4e6bb890, 4e8401b0, 7212bd40, 0, 4e840180)
pstack1.out thread# 157: f8d32790 (4e19a448, 4e11aef0, 4e19a448, 7162c638, 0, 4e19a418)
pstack2.out thread# 157: f8d3279c (4e19a448, 4e11aef0, 4e19a448, 7162c638, 0, 4e19a418)
pstack3.out thread# 157: f8d32658 (4e19a448, 4e11aef0, 4e19a448, 7162c638, 0, 4e19a418)
In the previous set of data (http://balabiot.italy/calls/37570730_borsait/0304) 2 threads where using all the cpu running HashMap.get() and eq() which is in get implementation
Total: 69 processes, 474 lwps, load averages: 2.47, 2.77, 3.00
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/LWPID
8090 www 2501M 2385M run 0 10 64:43:10 48% webservd/139
8090 www 2501M 2385M cpu1 0 10 0:55:06 48% webservd/109
4037 bitsys02 2848K 1704K sleep 59 0 0:00:03 0.2% top/1
----------------- t@139 -----------------
0xf8a1e8d0 0xf8a1e8d0 * java.util.HashMap.get(java.lang.Object) bci:72 line:325 methodOop:0xf0063b10 (Compiled frame; information may be imprecise)
0xf88652c8 0xf88652c8 * com.inferentiadnm.display.dao.FieldFormatter.formatCell(java.lang.String, java.lang.String, java.util.Locale) bci:125 line:248 methodOop:0xf0b10c80 (Compiled frame)
...
----------------- t@109 -----------------
0xf8a1e8c8 0xf8a1e8c8 * java.util.HashMap.eq(java.lang.Object, java.lang.Object) bci:7 line:274 methodOop:0xf0063dd8 (Compiled frame; information may be imprecise)
0xf88652c8 0xf88652c8 * com.inferentiadnm.display.dao.FieldFormatter.formatCell(java.lang.String, java.lang.String, java.util.Locale) bci:125 line:248 methodOop:0xf0b10c80 (Compiled frame)
...
http://balabiot.italy/calls/37570730_borsait/0304/trace/pstack1_dem.out
----------------- lwp# 109 / thread# 109 --------------------
f8a1e884 ???????? (f0009018, 93ecd750, fffbe4e7, 71483fe8, f0592648, 331d1c68)
----------------- lwp# 139 / thread# 139 --------------------
f8a1e8d4 ???????? (f0009018, 73f42118, fffbe4e7, 7180c7f8, f059b9f8, 36b199a0)
http://balabiot.italy/calls/37570730_borsait/0304/trace/pstack2_dem.out
----------------- lwp# 109 / thread# 109 --------------------
f8a1e884 ???????? (f0009018, 93ecd750, fffbe4e7, 71483fe8, f0592648, 331d1c68)
----------------- lwp# 139 / thread# 139 --------------------
f8a1e8c8 ???????? (f0009018, 73f42118, fffbe4e7, 7180c7f8, f059b9f8, 36b199a0)
http://balabiot.italy/calls/37570730_borsait/0304/trace/pstack3_dem.out
----------------- lwp# 109 / thread# 109 --------------------
f8a1e884 ???????? (f0009018, 93ecd750, fffbe4e7, 71483fe8, f0592648, 331d1c68)
----------------- lwp# 139 / thread# 139 --------------------
f8a1e8d4 ???????? (f0009018, 73f42118, fffbe4e7, 7180c7f8, f059b9f8, 36b199a0)
See source code in http://balabiot.italy/calls/37570730_borsait/0304/src/FieldFormatter.java
After a few week it happens that some threads use all the cpu running get():
http://balabiot.italy/calls/37570730_borsait/1004/prstat.out:
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/LWPID
4384 www 2911M 2604M cpu0 20 10 0:16:51 27% webservd/112
4384 www 2911M 2604M run 20 10 0:16:51 24% webservd/157
4384 www 2911M 2604M run 21 10 0:44:56 23% webservd/111
4384 www 2911M 2604M sleep 29 10 0:00:02 1.3% webservd/177
We have asked to collect when it happens prstat, multiple pstack, pmap and a gcore and then we extracted via SA scripts java dump, heap stats, class histogram, java flags.
No truss available.
/net/balabiot.italy/calls/37570730_borsait/0304
/net/balabiot.italy/calls/37570730_borsait/1004
http://balabiot.italy/calls/37570730_borsait/1004/core.4384.mwsbit03.jstack
Deadlock Detection:
No deadlocks found.
....
Thread t@111: (state = IN_JAVA, current Java SP = null
)
- java.util.HashMap.get(java.lang.Object) @bci=72, line=325,
pc=0xf8d32788, methodOop=0xf0063b10 (Compiled frame; information may be
imprecise)
- com.inferentiadnm.display.dao.FieldFormatter.formatCell(java.lang.String,
java.lang.String, java.util.Locale) @bci=125, line=248, pc=0xf8903ac8,
methodOop=0xf0b0dcc0 (Compiled frame)
...
Thread t@112: (state = IN_JAVA, current Java SP = null
)
- java.util.HashMap.get(java.lang.Object) @bci=6, line=317,
pc=0xf8d3264c, methodOop=0xf0063b10 (Compiled frame; information may be
imprecise)
- com.inferentiadnm.display.dao.FieldFormatter.formatCell(java.lang.String,
java.lang.String, java.util.Locale) @bci=125, line=248, pc=0xf8903ac8,
methodOop=0xf0b0dcc0 (Compiled frame)
...
Thread t@157: (state = IN_JAVA, current Java SP = null
)
- java.util.HashMap.get(java.lang.Object) @bci=6, line=317,
pc=0xf8d3264c, methodOop=0xf0063b10 (Compiled frame; information may be
imprecise)
- com.inferentiadnm.display.dao.FieldFormatter.formatCell(java.lang.String,
java.lang.String, java.util.Locale) @bci=125, line=248, pc=0xf8903ac8,
methodOop=0xf0b0dcc0 (Compiled frame)
The program counter are:
Thread t@111: pc=0xf8d32788
Thread t@112: pc=0xf8d3264c
Thread t@157: pc=0xf8d3264c
We see a couple of threads (112 e 157) running the same optimized native code, and exactly using the same amount of cpu time
From the sequence of 3 pstack, and looking at the first arguments only we see all the threads are running the same code with different arguments (e.g:f8d32790)
pstack1.out thread# 111: f8d32648 (4055b100, 4054a990, 4055b100, 72127600, 0, 4055b0d0)
pstack2.out thread# 111: f8d32790 (4055b100, 4054a990, 4055b100, 72127600, 0, 4055b0d0)
pstack3.out thread# 111: f8d3264c (4055b100, 4054a990, 4055b100, 72127600, 0, 4055b0d0)
pstack1.out thread# 112: f8d32790 (4e8401b0, 4e6bb890, 4e8401b0, 7212bd40, 0, 4e840180)
pstack2.out thread# 112: f8d3264c (4e8401b0, 4e6bb890, 4e8401b0, 7212bd40, 0, 4e840180)
pstack3.out thread# 112: f8d3264c (4e8401b0, 4e6bb890, 4e8401b0, 7212bd40, 0, 4e840180)
pstack1.out thread# 157: f8d32790 (4e19a448, 4e11aef0, 4e19a448, 7162c638, 0, 4e19a418)
pstack2.out thread# 157: f8d3279c (4e19a448, 4e11aef0, 4e19a448, 7162c638, 0, 4e19a418)
pstack3.out thread# 157: f8d32658 (4e19a448, 4e11aef0, 4e19a448, 7162c638, 0, 4e19a418)
In the previous set of data (http://balabiot.italy/calls/37570730_borsait/0304) 2 threads where using all the cpu running HashMap.get() and eq() which is in get implementation
Total: 69 processes, 474 lwps, load averages: 2.47, 2.77, 3.00
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/LWPID
8090 www 2501M 2385M run 0 10 64:43:10 48% webservd/139
8090 www 2501M 2385M cpu1 0 10 0:55:06 48% webservd/109
4037 bitsys02 2848K 1704K sleep 59 0 0:00:03 0.2% top/1
----------------- t@139 -----------------
0xf8a1e8d0 0xf8a1e8d0 * java.util.HashMap.get(java.lang.Object) bci:72 line:325 methodOop:0xf0063b10 (Compiled frame; information may be imprecise)
0xf88652c8 0xf88652c8 * com.inferentiadnm.display.dao.FieldFormatter.formatCell(java.lang.String, java.lang.String, java.util.Locale) bci:125 line:248 methodOop:0xf0b10c80 (Compiled frame)
...
----------------- t@109 -----------------
0xf8a1e8c8 0xf8a1e8c8 * java.util.HashMap.eq(java.lang.Object, java.lang.Object) bci:7 line:274 methodOop:0xf0063dd8 (Compiled frame; information may be imprecise)
0xf88652c8 0xf88652c8 * com.inferentiadnm.display.dao.FieldFormatter.formatCell(java.lang.String, java.lang.String, java.util.Locale) bci:125 line:248 methodOop:0xf0b10c80 (Compiled frame)
...
http://balabiot.italy/calls/37570730_borsait/0304/trace/pstack1_dem.out
----------------- lwp# 109 / thread# 109 --------------------
f8a1e884 ???????? (f0009018, 93ecd750, fffbe4e7, 71483fe8, f0592648, 331d1c68)
----------------- lwp# 139 / thread# 139 --------------------
f8a1e8d4 ???????? (f0009018, 73f42118, fffbe4e7, 7180c7f8, f059b9f8, 36b199a0)
http://balabiot.italy/calls/37570730_borsait/0304/trace/pstack2_dem.out
----------------- lwp# 109 / thread# 109 --------------------
f8a1e884 ???????? (f0009018, 93ecd750, fffbe4e7, 71483fe8, f0592648, 331d1c68)
----------------- lwp# 139 / thread# 139 --------------------
f8a1e8c8 ???????? (f0009018, 73f42118, fffbe4e7, 7180c7f8, f059b9f8, 36b199a0)
http://balabiot.italy/calls/37570730_borsait/0304/trace/pstack3_dem.out
----------------- lwp# 109 / thread# 109 --------------------
f8a1e884 ???????? (f0009018, 93ecd750, fffbe4e7, 71483fe8, f0592648, 331d1c68)
----------------- lwp# 139 / thread# 139 --------------------
f8a1e8d4 ???????? (f0009018, 73f42118, fffbe4e7, 7180c7f8, f059b9f8, 36b199a0)