-
Bug
-
Resolution: Incomplete
-
P4
-
None
-
8u171
-
x86_64
-
linux
ADDITIONAL SYSTEM INFORMATION :
Hardware:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 40
On-line CPU(s) list: 0-39
Thread(s) per core: 2
Core(s) per socket: 10
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz
Stepping: 1
CPU MHz: 2729.906
CPU max MHz: 3400.0000
CPU min MHz: 1200.0000
BogoMIPS: 4802.95
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 25600K
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
OS:
PRETTY_NAME="Debian GNU/Linux 8 (jessie)"
NAME="Debian GNU/Linux"
VERSION_ID="8"
VERSION="8 (jessie)"
JAVA:
# JRE version: OpenJDK Runtime Environment (8.0_171-b11) (build 1.8.0_171-8u171-b11-1~bpo8+1-b11)
# Java VM: OpenJDK 64-Bit Server VM (25.171-b11 mixed mode linux-amd64 )
A DESCRIPTION OF THE PROBLEM :
A cron job runs smartctrl every 12 hours, which causes high I/O usage. While it runs, we see the JVM crash. The error log follows.
--------------- T H R E A D ---------------
Current thread (0x00007f9400160800): ConcurrentGCThread [stack: 0x00007f60f73de000,0x00007f60f74df000] [id=12085]
siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000000000000008
Registers:
RAX=0x00007f94065f47d9, RBX=0x00007f7ebd7f0018, RCX=0x0000000000000004, RDX=0x0000000000000000
RSP=0x00007f60f74ddc40, RBP=0x00007f60f74ddc70, RSI=0x00007f65e4000000, RDI=0x0000000000000000
R8 =0x00007f60fbffd000, R9 =0x0000000000000010, R10=0x000000000c6cbf80, R11=0x0000000030c898e5
R12=0x00007f60f74ddcd0, R13=0x00007f93e4000000, R14=0x00007f940004e0b0, R15=0x00007f60f74ddd80
RIP=0x00007f9405b46ef1, EFLAGS=0x0000000000010246, CSGSFS=0x0000000000000033, ERR=0x0000000000000004
TRAPNO=0x000000000000000e
Top of Stack: (sp=0x00007f60f74ddc40)
0x00007f60f74ddc40: 00007f940651e130 000000000000002c
0x00007f60f74ddc50: 42c72bef42c48af0 00007f7ebd7f0018
0x00007f60f74ddc60: 00007f60f74ddcd0 00007f93e4000000
0x00007f60f74ddc70: 00007f60f74ddca0 00007f9405af4cbd
0x00007f60f74ddc80: 00007f940004dd90 00007f94001540e0
0x00007f60f74ddc90: 00007f940004dd90 0000000000000001
0x00007f60f74ddca0: 00007f60f74ddd50 00007f9405b40a01
0x00007f60f74ddcb0: 00007f94065f457c 42c72bef4200c66d
0x00007f60f74ddcc0: 00007f60f74ddd00 00007f9400154770
0x00007f60f74ddcd0: 00007f9406527c70 00007f94001540e0
0x00007f60f74ddce0: 00007f940004dd90 00007f940004dfa0
0x00007f60f74ddcf0: 00007f93e4000000 00007f940004e0b0
0x00007f60f74ddd00: 00007f9400154428 00007f9401000000
0x00007f60f74ddd10: 00007f7ebd7e8eb8 00007f9400160800
0x00007f60f74ddd20: 00007f94001542a0 bcbf739fb51ed500
0x00007f60f74ddd30: 00007f60f74ddd50 00007f94001540e0
0x00007f60f74ddd40: 00007f9400154288 00007f94001542a0
0x00007f60f74ddd50: 00007f60f74dde30 00007f9405b4ac8d
0x00007f60f74ddd60: 00007f60f74ddd90 00007f60f74ddd90
0x00007f60f74ddd70: 00007f60f74dddc0 00007f9400154b90
0x00007f60f74ddd80: 0101000101000001 000000000000000c
0x00007f60f74ddd90: 00007f60f74d0101 41053ded70a3d70a
0x00007f60f74ddda0: 40ac6f051eb851ec 417af3a9ed1eb852
0x00007f60f74dddb0: 00007f9400009ce0 417af3a8b8000000
0x00007f60f74dddc0: 00007f94001540e0 00007f940615fa02
0x00007f60f74dddd0: 0000000000000000 00002653fddce139
0x00007f60f74ddde0: 00007f60f74dde01 0000029705f23e00
0x00007f60f74dddf0: 00007f940000f030 bcbf739fb51ed500
0x00007f60f74dde00: 00007f60f74dde30 00007f94001540e0
0x00007f60f74dde10: 00007f94065f9f20 00007f94001547e0
0x00007f60f74dde20: 00007f94065c575b 00007f94065f9c2c
0x00007f60f74dde30: 00007f60f74ddee0 00007f9405b507e6
Instructions: (pc=0x00007f9405b46ef1)
0x00007f9405b46ed1: c1 49 d3 e1 4f 85 0c d0 0f 85 91 00 00 00 48 8d
0x00007f9405b46ee1: 05 f3 d8 aa 00 0f b6 10 84 d2 75 4b 48 8b 7b 08
0x00007f9405b46ef1: 8b 47 08 83 f8 00 7e 5a a8 01 75 63 c1 f8 03 48
0x00007f9405b46f01: 8d 0d 51 c9 aa 00 48 8d 15 62 de a7 00 48 98 48
Register to memory mapping:
RAX=0x00007f94065f47d9: <offset 0xf757d9> in /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so at 0x00007f940567f000
The error message reports a "Segment Violation" (SIGSEGV) at address 0x0000000000000008. The full error log is available at https://code.uberinternal.com/P159571
The program counter (RIP) points at 0x00007f9405b46ef1.
Disassembling at address 0x00007f9405b46ef1:
(gdb) disas 0x00007f9405b46ef1
Dump of assembler code for function SweepClosure::do_live_chunk(FreeChunk*):
0x00007f9405b46e80 <+0>: push %rbp
0x00007f9405b46e81 <+1>: mov %rsp,%rbp
0x00007f9405b46e84 <+4>: push %r13
0x00007f9405b46e86 <+6>: push %r12
0x00007f9405b46e88 <+8>: push %rbx
0x00007f9405b46e89 <+9>: mov %rdi,%r12
0x00007f9405b46e8c <+12>: mov %rsi,%rbx
0x00007f9405b46e8f <+15>: sub $0x18,%rsp
0x00007f9405b46e93 <+19>: cmpb $0x0,0x38(%rdi)
0x00007f9405b46e97 <+23>: jne 0x7f9405b47040 <SweepClosure::do_live_chunk(FreeChunk*)+448>
0x00007f9405b46e9d <+29>: mov 0x30(%r12),%rdx
0x00007f9405b46ea2 <+34>: lea 0x8(%rbx),%rax
0x00007f9405b46ea6 <+38>: mov $0x1,%r9d
0x00007f9405b46eac <+44>: mov (%rdx),%rsi
0x00007f9405b46eaf <+47>: mov 0x10(%rdx),%edi
0x00007f9405b46eb2 <+50>: mov 0xa8(%rdx),%r8
0x00007f9405b46eb9 <+57>: sub %rsi,%rax
0x00007f9405b46ebc <+60>: mov %edi,%ecx
0x00007f9405b46ebe <+62>: shr $0x3,%rax
0x00007f9405b46ec2 <+66>: shr %cl,%rax
0x00007f9405b46ec5 <+69>: mov %rax,%r10
0x00007f9405b46ec8 <+72>: and $0x3f,%eax
0x00007f9405b46ecb <+75>: shr $0x6,%r10
0x00007f9405b46ecf <+79>: mov %rax,%rcx
0x00007f9405b46ed2 <+82>: shl %cl,%r9
0x00007f9405b46ed5 <+85>: test %r9,(%r8,%r10,8)
0x00007f9405b46ed9 <+89>: jne 0x7f9405b46f70 <SweepClosure::do_live_chunk(FreeChunk*)+240>
0x00007f9405b46edf <+95>: lea 0xaad8f3(%rip),%rax # 0x7f94065f47d9 <UseCompressedClassPointers>
0x00007f9405b46ee6 <+102>: movzbl (%rax),%edx
0x00007f9405b46ee9 <+105>: test %dl,%dl
0x00007f9405b46eeb <+107>: jne 0x7f9405b46f38 <SweepClosure::do_live_chunk(FreeChunk*)+184>
0x00007f9405b46eed <+109>: mov 0x8(%rbx),%rdi
0x00007f9405b46ef1 <+113>: mov 0x8(%rdi),%eax
We can see register rdi is 0.
(gdb) info registers
rax 0x0 0
rbx 0x7f7ebd7f0018 140182321823768
rcx 0x3 3
rdx 0x0 0
rsi 0x7f7ebd7f0018 140182321823768
rdi 0x0 0
rbp 0x7f60f74dcc30 0x7f60f74dcc30
From the assembly, rdi is loaded from the memory at 0x8(%rbx), and rbx is copied from rsi, which holds the passed-in parameter FreeChunk *fc.
#15 0x00007f9405b46ef1 in SweepClosure::do_live_chunk(FreeChunk*) (this=0x7f60f74ddcd0, fc=0x7f7ebd7f0018) at /srv/jdk/openjdk-8-8u171-b11/src/hotspot/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp:8602
But the memory that fc points to is all zeros.
(gdb) x /16b 0x7f7ebd7f0018
0x7f7ebd7f0018: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7f7ebd7f0020: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
In the source code, SweepClosure::do_live_chunk() calls the inline method oopDesc::size(), which in turn calls klass(). The crash happens when the result of klass() is used.
inline Klass* oopDesc::klass() const {
  if (UseCompressedClassPointers) {
    return Klass::decode_klass_not_null(_metadata._compressed_klass);
  } else {
    return _metadata._klass;
  }
}
(gdb) p *(struct oopDesc *) 0x7f7ebd7f0018
$5 = {_mark = 0x0, _metadata = {_klass = 0x0, _compressed_klass = 0}, static _bs = 0x7f940004a6c0}
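_metadata is the second field of oopDesc, so it sits at offset 8 (a word is 8 bytes on x64). The instruction mov 0x8(%rbx),%rdi therefore loads _metadata._klass, which is 0 here. The next instruction, mov 0x8(%rdi),%eax, reads at offset 8 from that null Klass pointer, which is address 0x8 and is not mapped. That is why the error message at the beginning reports the invalid address 0x0000000000000008.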
However, I have not been able to work out how smartctrl, which is just a batch tool, can cause this issue. Please help.
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
run smartctrl -x
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
The JVM does not crash.
ACTUAL -
The JVM crashes with SIGSEGV in the ConcurrentGCThread.
FREQUENCY : occasionally