JDK-8144498

aarch64: large code cache generates SEGV

    • Type: Bug
    • Resolution: Fixed
    • Priority: P4
    • 9
    • None
    • hotspot
    • None
    • b103
    • aarch64
    • linux

      Running jtreg/langtools with -XX:ReservedCodeCacheSize=512m generates a number of failures due to SEGVs, whereas running without this option passes all tests.

      The set of tests which fail is different each time. For example, on two back-to-back runs I get

      FAILED: tools/javac/classfiles/attributes/annotations/RuntimeAnnotationsForInnerAnnotationTest.java
      FAILED: tools/javac/T6410706.java
      FAILED: tools/jdeps/DotFileTest.java
      ed@arm64:~/jtreg/jtreg$ fgrep FAILED log_512m_2
      FAILED: com/sun/javadoc/testSimpleTag/TestSimpleTag.java
      FAILED: com/sun/javadoc/testWindowTitle/TestWindowTitle.java
      FAILED: jdk/jshell/CompletionSuggestionTest.java

      The command used to invoke jtreg was

      /home/ed/images/jdk9-orig/bin/java -jar lib/jtreg.jar -vmoption:-XX:ReservedCodeCacheSize=512m -nr -conc:48 -timeout:99 -othervm -jdk:/home/ed/images/jdk9-orig -v1 -a -ignore:quiet /home/ed/new_jdk9/hs-comp/langtools/test

      The problem can also be replicated with EEMBC GrinderBench, although it may require many hundreds of runs to trigger. The command I used to invoke GrinderBench is

      /home/ed/images/jdk9-orig/bin/java -XX:ReservedCodeCacheSize=512m -classpath dist/fullset/bench1.jar org.eembc.grinderbench.CmdlineWrapper -r 1 -m 1 -t 4

      For the purposes of the following I have chosen to investigate the GrinderBench failure because it is easier to debug than random failures in jtreg.

      The SEGV occurs in a method which is called from SharedRuntime::resolve_opt_virtual_call_C. The call backtrace is about 20 frames long. The following are the oldest few frames.

      ....
      #17 0x000003ff99717a44 in SharedRuntime::resolve_helper (thread=thread@entry=0x3ff94010000,
          is_virtual=is_virtual@entry=true, is_optimized=is_optimized@entry=true,
          __the_thread__=__the_thread__@entry=0x3ff94010000)
          at /home/ed/new_jdk9/hs-comp/hotspot/src/share/vm/runtime/sharedRuntime.cpp:1186
      #18 0x000003ff99718988 in SharedRuntime::resolve_opt_virtual_call_C (thread=0x3ff94010000)
          at /home/ed/new_jdk9/hs-comp/hotspot/src/share/vm/runtime/sharedRuntime.cpp:1441
      #19 0x000003ff70ab23a8 in ?? ()
      #20 0x000003fdd59228f0 in ?? ()

      Looking at frame #19
      (gdb) x/10i $pc-20
         0x3ff70ab2394: mov x0, x28
         0x3ff70ab2398: mov x8, #0x8950 // #35152
         0x3ff70ab239c: movk x8, #0x9971, lsl #16
         0x3ff70ab23a0: movk x8, #0x3ff, lsl #32
         0x3ff70ab23a4: blr x8
      => 0x3ff70ab23a8: isb
         0x3ff70ab23ac: str xzr, [x28,#440]
         0x3ff70ab23b0: str xzr, [x28,#448]
         0x3ff70ab23b4: ldr x8, [x28,#8]
         0x3ff70ab23b8: cbnz x8, 0x3ff70ab2454

      This is a stub for resolve_opt_virtual_call. Here it calls 0x3ff99718950, and disassembling that address gives

      (gdb) x/i 0x3ff99718950
         0x3ff99718950 <SharedRuntime::resolve_opt_virtual_call_C(JavaThread*)>:
          stp x29, x30, [sp,#-80]!
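
The target address can be cross-checked by hand from the three immediates in the stub. The following is a minimal sketch of that arithmetic; `compose_mov_sequence` is a made-up helper for illustration, not a HotSpot or gdb function.

```cpp
#include <cstdint>

// Compose the 48-bit value built by a movz/movk/movk sequence:
// imm0 lands in bits 0-15 (the "mov"/movz), imm16 in bits 16-31 and
// imm32 in bits 32-47 (the two movk instructions).
constexpr std::uint64_t compose_mov_sequence(std::uint16_t imm0,
                                             std::uint16_t imm16,
                                             std::uint16_t imm32) {
    return static_cast<std::uint64_t>(imm0)
         | (static_cast<std::uint64_t>(imm16) << 16)
         | (static_cast<std::uint64_t>(imm32) << 32);
}

// Immediates copied from the stub disassembly at frame #19.
static_assert(compose_mov_sequence(0x8950, 0x9971, 0x3ff) == 0x3ff99718950ULL,
              "the stub's blr x8 targets 0x3ff99718950");
```

The same helper applies to any other movz/movk/movk triple in the listings in this report.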

      So it is calling SharedRuntime::resolve_opt_virtual_call_C which is correct according to the above stack trace. However, looking at the previous frame

      (gdb) x/2g $fp
      0x3ff98dede60: 0x0000000000000138 0x000003ff7122469c
      (gdb) x/12i 0x000003ff7122469c-40
         0x3ff71224674: ret
         0x3ff71224678: mov x8, #0x28f0 // #10480
         0x3ff7122467c: movk x8, #0xd592, lsl #16
         0x3ff71224680: movk x8, #0x3fd, lsl #32
         0x3ff71224684: str x8, [sp,#8]
         0x3ff71224688: mov x8, #0xffffffffffffffff // #-1
         0x3ff7122468c: str x8, [sp]
         0x3ff71224690: adrp x8, 0x3ff70ab2000 <<< HERE
         0x3ff71224694: add x8, x8, #0x300 <<<
         0x3ff71224698: blr x8 <<<
         0x3ff7122469c: b 0x3ff712242f8 <<<
         0x3ff712246a0: adrp x8, 0x3ff70adf000

      The code marked HERE is an out-of-line stub which is calling the resolve_opt_virtual_call stub. So far so good.
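
As a sanity check on the listing above, the adrp/add pair can be resolved by hand and compared with the return address seen in frame #19. This is only a sketch of the arithmetic; `adrp_add_target` is a hypothetical helper name, and it relies on gdb printing the ADRP operand already resolved to a page address.

```cpp
#include <cstdint>

// Address produced by "adrp xN, <page>; add xN, xN, #lo12".
// gdb shows the ADRP operand as the resolved 4 KiB page address,
// so only the ADD immediate has to be added on top.
constexpr std::uint64_t adrp_add_target(std::uint64_t page, std::uint32_t lo12) {
    return (page & ~0xfffULL) + (lo12 & 0xfffu);
}

// Values copied from the out-of-line stub and from frame #19.
constexpr std::uint64_t kCallTarget = adrp_add_target(0x3ff70ab2000ULL, 0x300);
constexpr std::uint64_t kFrame19Pc  = 0x3ff70ab23a8ULL;

static_assert(kCallTarget == 0x3ff70ab2300ULL, "blr x8 target");
static_assert(kFrame19Pc - kCallTarget == 0xa8,
              "frame #19 returns 0xa8 bytes past the blr target");
```

The blr target lies 0xa8 bytes before the return address recorded in frame #19, which is consistent with the call landing in the resolve_opt_virtual_call stub.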

      *** But this is not the correct code to call resolve_opt_virtual_call ****

      This is in fact the code generated by the following from c1_CodeStubs_aarch64.cpp

      void CounterOverflowStub::emit_code(LIR_Assembler* ce) {
        __ bind(_entry);
        Metadata *m = _method->as_constant_ptr()->as_metadata();
        __ mov_metadata(rscratch1, m);
        ce->store_parameter(rscratch1, 1);
        ce->store_parameter(_bci, 0);
        __ far_call(RuntimeAddress(Runtime1::entry_for(Runtime1::counter_overflow_id)));
        ce->add_call_info_here(_info);
        ce->verify_oop_map(_info);
        __ b(_continuation);
      }

      So this code is supposed to be calling Runtime1::counter_overflow. The -1 for the BCI is InvocationEntryBci, because this is an invocation entry counter overflow. It is this -1 which eventually causes the SEGV, because it is used as a genuine index into the bytecode to get a constant pool index for the invoke.

      But it shouldn't be calling SharedRuntime::resolve_opt_virtual_call_C, it should be calling Runtime1::counter_overflow.

      Tracing back where this out of line stub is called from

      (gdb) x/10i 0x3ff712242f8-36
         0x3ff712242d4: mov x0, #0xc250 // #49744
         0x3ff712242d8: movk x0, #0xd592, lsl #16
         0x3ff712242dc: movk x0, #0x3fd, lsl #32
         0x3ff712242e0: ldr w6, [x0,#220]
         0x3ff712242e4: add w6, w6, #0x8
         0x3ff712242e8: str w6, [x0,#220]
         0x3ff712242ec: and w6, w6, #0x1ff8
         0x3ff712242f0: cmp w6, #0x0
         0x3ff712242f4: b.eq 0x3ff71224678 <<<< HERE is the b to the out of line stub
         0x3ff712242f8: str w5, [sp,#52]
      (gdb)

      So the above confirms that it is really doing a counter overflow but calling resolve_opt_virtual_call.
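
For readers less used to AArch64 assembly, the counter check in the listing above can be modelled roughly as follows. This is a sketch that mirrors only the add/and/cmp/b.eq instructions shown; the function names are invented, and starting the counter at zero is an assumption for illustration (C++14 or later is needed for the constexpr loop).

```cpp
#include <cstdint>

constexpr std::uint32_t kIncrement = 0x8;    // add w6, w6, #0x8
constexpr std::uint32_t kMask      = 0x1ff8; // and w6, w6, #0x1ff8

// Bump the counter and report whether the masked value is zero,
// i.e. whether the b.eq to the out-of-line stub would be taken.
constexpr bool bump_and_check(std::uint32_t& counter) {
    counter += kIncrement;
    return (counter & kMask) == 0;
}

// Number of invocations until the branch is first taken.
constexpr std::uint32_t calls_until_overflow(std::uint32_t counter) {
    std::uint32_t calls = 1;
    while (!bump_and_check(counter)) {
        ++calls;
    }
    return calls;
}

// With these constants the stub is entered once every 0x2000 / 8 calls.
static_assert(calls_until_overflow(0) == 1024, "overflow period");
```

So with this mask the out-of-line stub is only reached on every 1024th pass through the check, which may be part of why the failure takes many runs to show up.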

      So I tried changing the 'far_call' method in macroAssembler_aarch64.cpp to use movz/movk/movk instead of adrp/add.

      That is,
          // We can use ADRP here because we know that the total size of
          // the code cache cannot exceed 2Gb.
          adrp(tmp, entry, offset);
          add(tmp, tmp, offset);

      becomes

          // We can use ADRP here because we know that the total size of
          // the code cache cannot exceed 2Gb.
          movptr(tmp, (uintptr_t)entry.target());
          //adrp(tmp, entry, offset);
          //add(tmp, tmp, offset);

      This causes GrinderBench to start working (at least, no failures after about 5000 runs).

      So I changed this to read

          // We can use ADRP here because we know that the total size of
          // the code cache cannot exceed 2Gb.
          movptr(tmp, (uintptr_t)entry.target());
          adrp(tmp, entry, offset);
          add(tmp, tmp, offset);

      That is, it generates both the movz/movk/movk version and the adrp/add version, but uses the adrp version, discarding the result of the movz/movk/movk version.

      Now when I list the out of line stub in gdb I get

      (gdb) x/10i 0x000003ff5521d5dc-32
         0x3ff5521d5bc: mov x8, #0xffffffffffffffff // #-1
         0x3ff5521d5c0: str x8, [sp]
         0x3ff5521d5c4: mov x8, #0x9780 <<< movz/movk/movk -> 0x3ff54c89780
         0x3ff5521d5c8: movk x8, #0x54c8, lsl #16
         0x3ff5521d5cc: movk x8, #0x3ff, lsl #32
         0x3ff5521d5d0: adrp x8, 0x3ff54ab2000 <<< adrp/add -> 0x3ff54ab2300
         0x3ff5521d5d4: add x8, x8, #0x300
         0x3ff5521d5d8: blr x8
         0x3ff5521d5dc: b 0x3ff5521d308
         0x3ff5521d5e0: mov x8, #0xfc80 // #64640

      So the adrp/add and movz/movk/movk address different runtime stubs. Disassembling both shows that the adrp is addressing the resolve_opt_virtual_call stub and the movz/movk/movk is addressing the Runtime1::counter_overflow stub.

      So it looks like the adrp is either not being relocated, or is being relocated incorrectly.
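
To put numbers on this, the sketch below computes the page delta an ADRP at the stub's address would need for each of the two targets in the last listing. It assumes the movz/movk/movk value is the intended target; the helper names are made up for illustration and are not HotSpot functions.

```cpp
#include <cstdint>

// Signed 4 KiB page delta that an ADRP at `pc` must encode to reach `target`.
constexpr std::int64_t adrp_page_delta(std::uint64_t pc, std::uint64_t target) {
    return static_cast<std::int64_t>(target >> 12)
         - static_cast<std::int64_t>(pc >> 12);
}

// Page address an ADRP at `pc` produces for a given page delta.
constexpr std::uint64_t adrp_result(std::uint64_t pc, std::int64_t page_delta) {
    return (pc & ~0xfffULL) + (static_cast<std::uint64_t>(page_delta) << 12);
}

// ADRP carries a signed 21-bit page immediate (roughly +/-4 GiB of reach).
constexpr bool fits_adrp(std::int64_t page_delta) {
    return page_delta >= -(1LL << 20) && page_delta < (1LL << 20);
}

constexpr std::uint64_t kAdrpPc   = 0x3ff5521d5d0ULL; // address of the adrp
constexpr std::uint64_t kIntended = 0x3ff54c89780ULL; // movz/movk/movk value
constexpr std::uint64_t kObserved = 0x3ff54ab2300ULL; // adrp/add value

static_assert(adrp_page_delta(kAdrpPc, kIntended) == -0x594, "needed delta");
static_assert(adrp_page_delta(kAdrpPc, kObserved) == -0x76b, "emitted delta");
static_assert(fits_adrp(-0x594) && fits_adrp(-0x76b), "both are in range");
static_assert(adrp_result(kAdrpPc, -0x594) + 0x780 == kIntended, "round trip");
```

Both deltas fit comfortably in the ADRP immediate, so reach alone does not seem to explain the wrong target at this call site. Note also that the low 12 bits differ (0x780 intended, 0x300 emitted), so the add immediate is aimed at the other stub as well.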

            enevill Ed Nevill

                Original Estimate: 2 weeks
                Remaining Estimate: 2 weeks
                Time Spent: Not Specified