  1. JDK
  2. JDK-8220216

JVM crash during GC when the utility smartctrl is enabled


    • gc
    • x86_64
    • linux

      ADDITIONAL SYSTEM INFORMATION :
      Hardware:
      Architecture: x86_64
      CPU op-mode(s): 32-bit, 64-bit
      Byte Order: Little Endian
      CPU(s): 40
      On-line CPU(s) list: 0-39
      Thread(s) per core: 2
      Core(s) per socket: 10
      Socket(s): 2
      NUMA node(s): 2
      Vendor ID: GenuineIntel
      CPU family: 6
      Model: 79
      Model name: Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz
      Stepping: 1
      CPU MHz: 2729.906
      CPU max MHz: 3400.0000
      CPU min MHz: 1200.0000
      BogoMIPS: 4802.95
      Virtualization: VT-x
      L1d cache: 32K
      L1i cache: 32K
      L2 cache: 256K
      L3 cache: 25600K
      NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
      NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39

      OS:
      PRETTY_NAME="Debian GNU/Linux 8 (jessie)"
      NAME="Debian GNU/Linux"
      VERSION_ID="8"
      VERSION="8 (jessie)"

      JAVA:
      # JRE version: OpenJDK Runtime Environment (8.0_171-b11) (build 1.8.0_171-8u171-b11-1~bpo8+1-b11)
      # Java VM: OpenJDK 64-Bit Server VM (25.171-b11 mixed mode linux-amd64 )



      A DESCRIPTION OF THE PROBLEM :
      When a cron job runs smartctrl every 12 hours, which causes high I/O usage, the JVM crashes. The error log follows.
      --------------- T H R E A D ---------------

      Current thread (0x00007f9400160800): ConcurrentGCThread [stack: 0x00007f60f73de000,0x00007f60f74df000] [id=12085]

      siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000000000000008

      Registers:
      RAX=0x00007f94065f47d9, RBX=0x00007f7ebd7f0018, RCX=0x0000000000000004, RDX=0x0000000000000000
      RSP=0x00007f60f74ddc40, RBP=0x00007f60f74ddc70, RSI=0x00007f65e4000000, RDI=0x0000000000000000
      R8 =0x00007f60fbffd000, R9 =0x0000000000000010, R10=0x000000000c6cbf80, R11=0x0000000030c898e5
      R12=0x00007f60f74ddcd0, R13=0x00007f93e4000000, R14=0x00007f940004e0b0, R15=0x00007f60f74ddd80
      RIP=0x00007f9405b46ef1, EFLAGS=0x0000000000010246, CSGSFS=0x0000000000000033, ERR=0x0000000000000004
        TRAPNO=0x000000000000000e

      Top of Stack: (sp=0x00007f60f74ddc40)
      0x00007f60f74ddc40: 00007f940651e130 000000000000002c
      0x00007f60f74ddc50: 42c72bef42c48af0 00007f7ebd7f0018
      0x00007f60f74ddc60: 00007f60f74ddcd0 00007f93e4000000
      0x00007f60f74ddc70: 00007f60f74ddca0 00007f9405af4cbd
      0x00007f60f74ddc80: 00007f940004dd90 00007f94001540e0
      0x00007f60f74ddc90: 00007f940004dd90 0000000000000001
      0x00007f60f74ddca0: 00007f60f74ddd50 00007f9405b40a01
      0x00007f60f74ddcb0: 00007f94065f457c 42c72bef4200c66d
      0x00007f60f74ddcc0: 00007f60f74ddd00 00007f9400154770
      0x00007f60f74ddcd0: 00007f9406527c70 00007f94001540e0
      0x00007f60f74ddce0: 00007f940004dd90 00007f940004dfa0
      0x00007f60f74ddcf0: 00007f93e4000000 00007f940004e0b0
      0x00007f60f74ddd00: 00007f9400154428 00007f9401000000
      0x00007f60f74ddd10: 00007f7ebd7e8eb8 00007f9400160800
      0x00007f60f74ddd20: 00007f94001542a0 bcbf739fb51ed500
      0x00007f60f74ddd30: 00007f60f74ddd50 00007f94001540e0
      0x00007f60f74ddd40: 00007f9400154288 00007f94001542a0
      0x00007f60f74ddd50: 00007f60f74dde30 00007f9405b4ac8d
      0x00007f60f74ddd60: 00007f60f74ddd90 00007f60f74ddd90
      0x00007f60f74ddd70: 00007f60f74dddc0 00007f9400154b90
      0x00007f60f74ddd80: 0101000101000001 000000000000000c
      0x00007f60f74ddd90: 00007f60f74d0101 41053ded70a3d70a
      0x00007f60f74ddda0: 40ac6f051eb851ec 417af3a9ed1eb852
      0x00007f60f74dddb0: 00007f9400009ce0 417af3a8b8000000
      0x00007f60f74dddc0: 00007f94001540e0 00007f940615fa02
      0x00007f60f74dddd0: 0000000000000000 00002653fddce139
      0x00007f60f74ddde0: 00007f60f74dde01 0000029705f23e00
      0x00007f60f74dddf0: 00007f940000f030 bcbf739fb51ed500
      0x00007f60f74dde00: 00007f60f74dde30 00007f94001540e0
      0x00007f60f74dde10: 00007f94065f9f20 00007f94001547e0
      0x00007f60f74dde20: 00007f94065c575b 00007f94065f9c2c
      0x00007f60f74dde30: 00007f60f74ddee0 00007f9405b507e6

      Instructions: (pc=0x00007f9405b46ef1)
      0x00007f9405b46ed1: c1 49 d3 e1 4f 85 0c d0 0f 85 91 00 00 00 48 8d
      0x00007f9405b46ee1: 05 f3 d8 aa 00 0f b6 10 84 d2 75 4b 48 8b 7b 08
      0x00007f9405b46ef1: 8b 47 08 83 f8 00 7e 5a a8 01 75 63 c1 f8 03 48
      0x00007f9405b46f01: 8d 0d 51 c9 aa 00 48 8d 15 62 de a7 00 48 98 48

      Register to memory mapping:

      RAX=0x00007f94065f47d9: <offset 0xf757d9> in /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so at 0x00007f940567f000

      The error message reports a segmentation violation (SIGSEGV) at address 0x0000000000000008. The full error log is at https://code.uberinternal.com/P159571

      The program counter (RIP) points at 0x00007f9405b46ef1.

      Disassembly at address 0x00007f9405b46ef1:

      (gdb) disas 0x00007f9405b46ef1
      Dump of assembler code for function SweepClosure::do_live_chunk(FreeChunk*):
         0x00007f9405b46e80 <+0>: push %rbp
         0x00007f9405b46e81 <+1>: mov %rsp,%rbp
         0x00007f9405b46e84 <+4>: push %r13
         0x00007f9405b46e86 <+6>: push %r12
         0x00007f9405b46e88 <+8>: push %rbx
         0x00007f9405b46e89 <+9>: mov %rdi,%r12
         0x00007f9405b46e8c <+12>: mov %rsi,%rbx
         0x00007f9405b46e8f <+15>: sub $0x18,%rsp
         0x00007f9405b46e93 <+19>: cmpb $0x0,0x38(%rdi)
         0x00007f9405b46e97 <+23>: jne 0x7f9405b47040 <SweepClosure::do_live_chunk(FreeChunk*)+448>
         0x00007f9405b46e9d <+29>: mov 0x30(%r12),%rdx
         0x00007f9405b46ea2 <+34>: lea 0x8(%rbx),%rax
         0x00007f9405b46ea6 <+38>: mov $0x1,%r9d
         0x00007f9405b46eac <+44>: mov (%rdx),%rsi
         0x00007f9405b46eaf <+47>: mov 0x10(%rdx),%edi
         0x00007f9405b46eb2 <+50>: mov 0xa8(%rdx),%r8
         0x00007f9405b46eb9 <+57>: sub %rsi,%rax
         0x00007f9405b46ebc <+60>: mov %edi,%ecx
         0x00007f9405b46ebe <+62>: shr $0x3,%rax
         0x00007f9405b46ec2 <+66>: shr %cl,%rax
         0x00007f9405b46ec5 <+69>: mov %rax,%r10
         0x00007f9405b46ec8 <+72>: and $0x3f,%eax
         0x00007f9405b46ecb <+75>: shr $0x6,%r10
         0x00007f9405b46ecf <+79>: mov %rax,%rcx
         0x00007f9405b46ed2 <+82>: shl %cl,%r9
         0x00007f9405b46ed5 <+85>: test %r9,(%r8,%r10,8)
         0x00007f9405b46ed9 <+89>: jne 0x7f9405b46f70 <SweepClosure::do_live_chunk(FreeChunk*)+240>
         0x00007f9405b46edf <+95>: lea 0xaad8f3(%rip),%rax # 0x7f94065f47d9 <UseCompressedClassPointers>
         0x00007f9405b46ee6 <+102>: movzbl (%rax),%edx
         0x00007f9405b46ee9 <+105>: test %dl,%dl
         0x00007f9405b46eeb <+107>: jne 0x7f9405b46f38 <SweepClosure::do_live_chunk(FreeChunk*)+184>
         0x00007f9405b46eed <+109>: mov 0x8(%rbx),%rdi
         0x00007f9405b46ef1 <+113>: mov 0x8(%rdi),%eax

      We can see register rdi is 0.

      (gdb) info registers
      rax 0x0 0
      rbx 0x7f7ebd7f0018 140182321823768
      rcx 0x3 3
      rdx 0x0 0
      rsi 0x7f7ebd7f0018 140182321823768
      rdi 0x0 0
      rbp 0x7f60f74dcc30 0x7f60f74dcc30
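
The two loads at the end of the disassembly can be cross-checked against the raw bytes in the "Instructions:" dump (`48 8b 7b 08` just before the pc, `8b 47 08` at the pc). The following is a minimal sketch of a decoder for this one instruction shape only (MOV load with an 8-bit displacement, optional REX.W prefix, no SIB byte). The struct and function names are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Result of decoding "mov disp8(base), dest". Register numbers follow the
// x86-64 encoding: 0=rax, 1=rcx, 2=rdx, 3=rbx, 4=rsp, 5=rbp, 6=rsi, 7=rdi.
struct DecodedLoad {
    bool ok;        // true if the bytes match the supported shape
    bool wide;      // true if a REX.W prefix (0x48) selects a 64-bit load
    int  dest_reg;  // ModRM.reg: destination register
    int  base_reg;  // ModRM.rm: base register of the memory operand
    int  disp;      // signed 8-bit displacement
};

// Hypothetical helper: decodes only opcode 0x8B with ModRM.mod == 01.
DecodedLoad decode_mov_load_disp8(const std::uint8_t* bytes) {
    DecodedLoad d{false, false, 0, 0, 0};
    std::size_t i = 0;
    if (bytes[i] == 0x48) {          // REX.W with no register-extension bits
        d.wide = true;
        ++i;
    }
    if (bytes[i] != 0x8B) return d;  // MOV r, r/m
    const std::uint8_t modrm = bytes[i + 1];
    const int mod = modrm >> 6;
    const int rm  = modrm & 7;
    if (mod != 1 || rm == 4) return d;  // need [reg + disp8] without SIB
    d.ok = true;
    d.dest_reg = (modrm >> 3) & 7;
    d.base_reg = rm;
    d.disp = static_cast<std::int8_t>(bytes[i + 2]);
    return d;
}
```

Decoding `8b 47 08` this way gives destination register 0 (eax), base register 7 (rdi) and displacement 8, so with rdi equal to 0 the load touches address 0x8, the same value as si_addr in the error log.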

      From the assembly, rdi is loaded from the memory at 0x8(%rbx), and rbx is copied from rsi, which holds the passed-in parameter FreeChunk *fc.

      #15 0x00007f9405b46ef1 in SweepClosure::do_live_chunk(FreeChunk*) (this=0x7f60f74ddcd0, fc=0x7f7ebd7f0018) at /srv/jdk/openjdk-8-8u171-b11/src/hotspot/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp:8602

      But the memory that fc points to contains only zeros.

      (gdb) x /16b 0x7f7ebd7f0018
      0x7f7ebd7f0018: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
      0x7f7ebd7f0020: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00

      In the source code, SweepClosure::do_live_chunk() calls the inline method oopDesc::size(), which in turn calls klass(). klass() returns a null pointer here, and the next load through that pointer raises the exception.

      inline Klass* oopDesc::klass() const {
       if (UseCompressedClassPointers) {
         return Klass::decode_klass_not_null(_metadata._compressed_klass);
       } else {
         return _metadata._klass;
       }
      }


      (gdb) p *(struct oopDesc *) 0x7f7ebd7f0018
      $5 = {_mark = 0x0, _metadata = {_klass = 0x0, _compressed_klass = 0}, static _bs = 0x7f940004a6c0}

      _metadata is the second field of oopDesc, at offset 8 (a word is 8 bytes on x86_64), and its _klass pointer is null. The faulting instruction reads offset 8 from that null pointer, which is invalid. That is why the error message at the beginning reports the invalid address 0x0000000000000008.
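
The offset arithmetic can be sketched with simplified stand-in types. These are not the HotSpot definitions: `MockOopDesc` only mirrors the two header fields that gdb printed, and the layout of `MockKlass` (a pointer-sized slot followed by a 32-bit field at offset 8) is an assumption chosen to match the 32-bit load `mov 0x8(%rdi),%eax`. A 64-bit build is assumed throughout.

```cpp
#include <cstddef>
#include <cstdint>

// Simplified stand-ins for the HotSpot types. These are NOT the real
// definitions; field names and positions in MockKlass are assumptions.
struct MockKlass {
    void*        vtable;         // offset 0 on a 64-bit build (assumed)
    std::int32_t layout_helper;  // offset 8 (assumed target of the 32-bit load)
};

struct MockOopDesc {
    void* mark;                  // mirrors _mark, offset 0
    union {
        MockKlass*    klass;             // mirrors _metadata._klass
        std::uint32_t compressed_klass;  // mirrors _metadata._compressed_klass
    } metadata;                  // offset 8 on a 64-bit build
};

// Address the CPU would touch when reading the 32-bit field through the
// klass pointer stored in the object header.
inline std::uintptr_t address_of_klass_field_read(const MockOopDesc& obj) {
    return reinterpret_cast<std::uintptr_t>(obj.metadata.klass)
         + offsetof(MockKlass, layout_helper);
}
```

For an all-zero header like the one shown by gdb, `address_of_klass_field_read` yields 0x8 on a 64-bit build, which matches si_addr in the error log.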

      However, I have not been able to find out how smartctrl, which is a batch tool, can cause this issue. Please help.


      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      run smartctrl -x

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      not crash
      ACTUAL -
      crash

      FREQUENCY : occasionally


            fmatte Fairoz Matte
            webbuggrp Webbug Group