Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-8178287

AsyncGetCallTrace fails to traverse valid Java stacks

    XMLWordPrintable

Details

    • svc
    • x86_64

    Description

      There is a number of cases (observed in real applications) when profiling with AsyncGetCallTrace is useless due to HotSpot inability to walk through Java stack. Here is the analysis of such cases.

      #1. An application performs many System.arraycopy() and spends a lot of time inside generated arraycopy stubs.

      public class Prof1 {

          public static void main(String[] args) {
              StringBuilder sb = new StringBuilder();
              for (int i = 0; i < 1000000; i++) {
                  sb.append("ab");
                  sb.delete(0, 1);
              }
              System.out.println(sb.length());
          }
      }

      --- Collected profile ---
      Total samples: 500
      ticks_unknown_Java: 499 (99,80%)

      Samples: 1 (0,20%)
        [ 0] java.lang.AbstractStringBuilder.ensureCapacityInternal
        [ 1] java.lang.AbstractStringBuilder.append
        [ 2] java.lang.StringBuilder.append
        [ 3] Prof1.main

      Here AsyncGetCallTrace fails with error = -5 (ticks_unknown_Java) almost in 100% cases. The reason is that frame::safe_for_sender() does not check is_stub_frame() and therefore returns negative result after "if (_cb->frame_size() <= 0)" check. Such stacks could be perfectly traversed if stub frames were treated specially.

      --- Expected profile ---
      Total samples: 499

      Samples: 499 (100,00%)
        [ 0] jshort_disjoint_arraycopy
        [ 1] java.lang.AbstractStringBuilder.delete
        [ 2] java.lang.StringBuilder.delete
        [ 3] Prof1.main


      #2. Another case is a method with a megamorphic interface call.

      public class Prof2 {

          public static void main(String[] args) {
              Supplier[] suppliers = {
                      () -> 0,
                      () -> 1.0,
                      () -> "abc",
                      () -> true
              };

              for (int i = 0; i >= 0; i++) {
                  suppliers[i % suppliers.length].get();
              }
          }
      }

      --- Collected profile ---
      Total samples: 494
      ticks_unknown_Java: 331 (67,00%)

      Samples: 128 (25,91%)
        [ 0] Prof2.main

      Samples: 26 (5,26%)
        [ 0] java.lang.Double.valueOf
        [ 1] Prof2.lambda$main$1
        [ 2] Prof2$$Lambda$2.531885035.get
        [ 3] Prof2.main

      Samples: 4 (0,81%)
        [ 0] Prof2$$Lambda$4.303563356.get
        [ 1] Prof2.main

      Samples: 3 (0,61%)
        [ 0] Prof2$$Lambda$3.1418481495.get
        [ 1] Prof2.main

      In this example AsyncGetCallTrace fails to recognize 2/3 samples. Many of these samples occur during invocation of Supplier.get(). Since the call site is megamorphic, the method is invoked via itable stub. Just like in the first case, frame::safe_for_sender() fails here on "if (_cb->frame_size() <= 0)" check. VtableStub creates a BufferBlob with zero frame_size, so it is not walkable by AsyncGetCallTrace. However, if we explicitly set the correct frame size of VtableStub, it will become walkable. For instance, on x64 frame_size == 1, since only the return address is saved on the stack.


      #3. Even after VtableStub fix, the profilers based on AsyncGetCallTrace will still fail on the above code in 30% cases.

      Note that Suppliers are very simple, their compiled method bodies are short. However, each compiled method has the prologue and the epilogue like this:

          mov %eax,-0x14000(%rsp)
          push %rbp
          sub $0x30,%rsp
          
          [method body]
          
          add $0x30,%rsp
          pop %rbp
          test %eax,0x1234567(%rip)
          retq

      For the short methods sampling profilers often hit inside the prologue or the epilogue, while the stack frame is not fully constructed. Such inconsistent frames can be divided into two types:
       1) They have only a return address on the stack; that is, sender_pc = *(sp)
       2) They have a return address and fp on the stack; that is, sender_pc = *(sp+1); sender_fp = *(sp)
      Stack-walking code could recognize these two patterns and easily find the previous frame.

      --- Expected profile ---
      Total: 494

      Samples: 227 (45,95%)
        [ 0] itable stub
        [ 1] Prof2.main

      Samples: 130 (26,32%)
        [ 0] Prof2.main

      Samples: 32 (6,48%)
        [ 0] Prof2$$Lambda$1.791452441.get
        [ 1] Prof2.main

      Samples: 31 (6,28%)
        [ 0] Prof2$$Lambda$3.1418481495.get
        [ 1] Prof2.main

      Samples: 26 (5,26%)
        [ 0] java.lang.Double.valueOf
        [ 1] Prof2.lambda$main$1
        [ 2] Prof2$$Lambda$2.531885035.get
        [ 3] Prof2.main

      Samples: 25 (5,06%)
        [ 0] Prof2$$Lambda$4.303563356.get
        [ 1] Prof2.main

      Samples: 13 (2,63%)
        [ 0] Prof2$$Lambda$2.531885035.get
        [ 1] Prof2.main

      Samples: 6 (1,21%)
        [ 0] java.lang.Integer.valueOf
        [ 1] Prof2.lambda$main$0
        [ 2] Prof2$$Lambda$1.791452441.get
        [ 3] Prof2.main


      As a proof-of-concept I've implemented a workaround for the above cases in 'async-profiler' project, and the percentage of ticks_unknown_Java has become negligible. However, this is not a problem of the particular profiler, but rather a general bug in AsyncGetCallTrace mechanism. For example, a well-known 'honest-profiler' project also suffers from the same bug. So, the right way to solve the problem would be to fix AsyncGetCallTrace.

      Attachments

        Issue Links

          Activity

            People

              luhenry Ludovic Henry
              apangin Andrei Pangin
              Votes:
              1 Vote for this issue
              Watchers:
              11 Start watching this issue

              Dates

                Created:
                Updated: