When running our application, one of our processes periodically starts
consuming a large amount of CPU on the machine. In researching this problem,
we found that a single thread in the process consumes an entire CPU's
processing time. We are running the application on a 4-CPU machine.
We have reproduced this problem on:
1.2.2_07
1.2.2_08
1.2.2_09
We cannot reproduce the problem on any of the implementation releases.
We looked at two comments added to bug 4326537 in August by "reck" and
"sadananda". Our stack trace information is very similar to what reck
reported (I have included a subset of it again below, for lwp# 26). Reck's
comments are also interesting because they show that the problem is
occurring in 1.3.1. Also of note, we are using IONA's OrbixWeb 3.2
implementation of CORBA, as sadananda reported; however, I do not believe
that this is a CORBA issue.
Our application is used in 7x24 facilities around the world and must have
very high reliability. This bug would be a showstopper for our migration to 1.3.
Truss output:
17283/26: sigprocmask(SIG_SETMASK, 0xF3640240, 0x00000000) = 0
17283/26: sigprocmask(SIG_SETMASK, 0xFEF2ADE0, 0xF3640240) = 0
17283/26: lwp_kill(26, SIGUSR1) = 0
17283/26: sigprocmask(SIG_SETMASK, 0xF3640240, 0x00000000) = 0
17283/26: sigprocmask(SIG_SETMASK, 0xFEF2ADE0, 0xF3640240) = 0
17283/26: lwp_kill(26, SIGUSR1) = 0
17283/26: sigprocmask(SIG_SETMASK, 0xF3640240, 0x00000000) = 0
17283/26: sigprocmask(SIG_SETMASK, 0xFEF2ADE0, 0xF3640240) = 0
17283/26: lwp_kill(26, SIGUSR1) = 0
17283/26: sigprocmask(SIG_SETMASK, 0xF3640240, 0x00000000) = 0
17283/26: sigprocmask(SIG_SETMASK, 0xFEF2ADE0, 0xF3640240) = 0
17283/26: lwp_kill(26, SIGUSR1) = 0
17283/26: sigprocmask(SIG_SETMASK, 0xF3640240, 0x00000000) = 0
17283/26: sigprocmask(SIG_SETMASK, 0xFEF2ADE0, 0xF3640240) = 0
17283/26: lwp_kill(26, SIGUSR1) = 0
17283/26: sigprocmask(SIG_SETMASK, 0xF3640240, 0x00000000) = 0
17283/26: sigprocmask(SIG_SETMASK, 0xFEF2ADE0, 0xF3640240) = 0
17283/26: lwp_kill(26, SIGUSR1) = 0
This, of course, goes on forever.
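For anyone comparing their own truss capture against the one above, the repeating cycle can be confirmed mechanically rather than by eye. The following is only a rough sketch in Python: the function name `summarize_truss` and the regular expression are my own assumptions about the line layout shown above, not part of truss or of the JDK. It counts system calls per LWP and flags LWPs that send a signal to themselves via `lwp_kill`.

```python
import re
from collections import Counter

# Assumed line layout, taken from the capture above, e.g.
#   "17283/26: lwp_kill(26, SIGUSR1) = 0"
TRUSS_LINE = re.compile(r"^(\d+)/(\d+):\s+(\w+)\((.*)\)\s+=\s+(-?\w+)")


def summarize_truss(lines):
    """Count syscalls per (pid, lwp, name) and self-directed lwp_kill calls."""
    calls = Counter()
    self_signals = Counter()
    for line in lines:
        match = TRUSS_LINE.match(line.strip())
        if not match:
            continue  # skip anything that is not a syscall line
        pid, lwp, name, args, _ret = match.groups()
        calls[(pid, lwp, name)] += 1
        if name == "lwp_kill":
            # The first argument of lwp_kill is the target LWP id.
            target = args.split(",")[0].strip()
            if target == lwp:
                self_signals[(pid, lwp)] += 1
    return calls, self_signals


# Two iterations of the cycle from the capture above.
sample = [
    "17283/26: sigprocmask(SIG_SETMASK, 0xF3640240, 0x00000000) = 0",
    "17283/26: sigprocmask(SIG_SETMASK, 0xFEF2ADE0, 0xF3640240) = 0",
    "17283/26: lwp_kill(26, SIGUSR1) = 0",
    "17283/26: sigprocmask(SIG_SETMASK, 0xF3640240, 0x00000000) = 0",
    "17283/26: sigprocmask(SIG_SETMASK, 0xFEF2ADE0, 0xF3640240) = 0",
    "17283/26: lwp_kill(26, SIGUSR1) = 0",
]

if __name__ == "__main__":
    calls, self_signals = summarize_truss(sample)
    for (pid, lwp), count in self_signals.items():
        print(f"pid {pid} lwp {lwp}: {count} self-directed lwp_kill calls")
```

A steadily growing self-signal count for one LWP, with two `sigprocmask` calls per `lwp_kill`, would match the pattern reported here; it does not by itself identify the cause.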
pstack output:
fef09bb4 __sigprocmask (fef0c4d0, 0, ffbffeff, f3641e00, f3641d78, f3640388) + 8
fef00684 __thr_sigsetmask (1, f3640388, f3640378, fef1e000, 0, f3641e00) + f4
fef083d8 sigacthandler (21, f3641d78, 84, f3641df0, 0, fef1e000) + 628
--- called from signal handler with signal 33 (SIGLWP) ---
fef09bb4 __sigprocmask (10, f3641d78, 40, f3641df0, 50, fef1e000) + 8
fee9a744 _read (d, f3640d70, 4, d427f0, d42870, 1043fae8) + c
ff0c3a64 JVM_Read (d, f3640d70, 4, d427f0, fffffffe, d427f0) + 2c
fe2d86b4 socketReadWork (d429d8, 0, d, 0, 0, f3641628) + 25c
fe2d877c socketReadOnStack (d429d8, f3641624, f3641628, 0, 4, 0) + 18
f5c4f894 ???????? (f7188e98, f718add0, 0, 4, d41f68, 1)
f5c5592c ???????? (f81057b8, f81057d0, 0, 4, 0, 0)
f5cceaa8 ???????? (f81057e8, f81057d0, 0, 4, 0, 0)
f5cce1cc ???????? (f81056e0, f81057d0, 0, 0, 50, 2)
ff32c448 JITInvokeCompiledMethod (d48ca0, 649d30, d427f0, 1, 1, 4) + bc
ff06ca14 invokeCompiledMethod (f3641800, 649d30, d427f0, d48c8c, d48c8c, 12f678) + 98
ff123048 executeJava (ff364a38, d427f0, d429ac, 649d30, 1, 12f678) + 2ef0
ff099ce8 do_execute_java_method_vararg_SLOW (d427f0, 26bf9, 0, d48c74, 3, f3641c84) + 1f4
ff098f24 do_execute_java_method (d427f0, d428b8, 0, ebda8, ff31b000, d4faa5) + ac
ff0c4e1c ThreadRT0 (d428b8, 0, ff357000, d427f0, d427f0, 0) + 148
ff129f9c _start (0, d427f0, ff31cc00, f34, ff32d000, ff36387c) + 23c
fef0bbcc _thread_start (d427f0, 0, 0, 0, 0, 0) + 40
VITALS from our box are attached.
Relates to: JDK-4326537 Apparent livelock in signal delivery/handling code on Solaris 8 (Closed)