-
Bug
-
Resolution: Unresolved
-
P4
-
8u121, 9, 10
-
x86_64
There is a number of cases (observed in real applications) when profiling with AsyncGetCallTrace is useless due to HotSpot inability to walk through Java stack. Here is the analysis of such cases.
#1. An application performs many System.arraycopy() and spends a lot of time inside generated arraycopy stubs.
public class Prof1 {
public static void main(String[] args) {
StringBuilder sb = new StringBuilder();
for (int i = 0; i < 1000000; i++) {
sb.append("ab");
sb.delete(0, 1);
}
System.out.println(sb.length());
}
}
--- Collected profile ---
Total samples: 500
ticks_unknown_Java: 499 (99,80%)
Samples: 1 (0,20%)
[ 0] java.lang.AbstractStringBuilder.ensureCapacityInternal
[ 1] java.lang.AbstractStringBuilder.append
[ 2] java.lang.StringBuilder.append
[ 3] Prof1.main
Here AsyncGetCallTrace fails with error = -5 (ticks_unknown_Java) almost in 100% cases. The reason is that frame::safe_for_sender() does not check is_stub_frame() and therefore returns negative result after "if (_cb->frame_size() <= 0)" check. Such stacks could be perfectly traversed if stub frames were treated specially.
--- Expected profile ---
Total samples: 499
Samples: 499 (100,00%)
[ 0] jshort_disjoint_arraycopy
[ 1] java.lang.AbstractStringBuilder.delete
[ 2] java.lang.StringBuilder.delete
[ 3] Prof1.main
#2. Another case is a method with a megamorphic interface call.
public class Prof2 {
public static void main(String[] args) {
Supplier[] suppliers = {
() -> 0,
() -> 1.0,
() -> "abc",
() -> true
};
for (int i = 0; i >= 0; i++) {
suppliers[i % suppliers.length].get();
}
}
}
--- Collected profile ---
Total samples: 494
ticks_unknown_Java: 331 (67,00%)
Samples: 128 (25,91%)
[ 0] Prof2.main
Samples: 26 (5,26%)
[ 0] java.lang.Double.valueOf
[ 1] Prof2.lambda$main$1
[ 2] Prof2$$Lambda$2.531885035.get
[ 3] Prof2.main
Samples: 4 (0,81%)
[ 0] Prof2$$Lambda$4.303563356.get
[ 1] Prof2.main
Samples: 3 (0,61%)
[ 0] Prof2$$Lambda$3.1418481495.get
[ 1] Prof2.main
In this example AsyncGetCallTrace fails to recognize 2/3 samples. Many of these samples occur during invocation of Supplier.get(). Since the call site is megamorphic, the method is invoked via itable stub. Just like in the first case, frame::safe_for_sender() fails here on "if (_cb->frame_size() <= 0)" check. VtableStub creates a BufferBlob with zero frame_size, so it is not walkable by AsyncGetCallTrace. However, if we explicitly set the correct frame size of VtableStub, it will become walkable. For instance, on x64 frame_size == 1, since only the return address is saved on the stack.
#3. Even after VtableStub fix, the profilers based on AsyncGetCallTrace will still fail on the above code in 30% cases.
Note that Suppliers are very simple, their compiled method bodies are short. However, each compiled method has the prologue and the epilogue like this:
mov %eax,-0x14000(%rsp)
push %rbp
sub $0x30,%rsp
[method body]
add $0x30,%rsp
pop %rbp
test %eax,0x1234567(%rip)
retq
For the short methods sampling profilers often hit inside the prologue or the epilogue, while the stack frame is not fully constructed. Such inconsistent frames can be divided into two types:
1) They have only a return address on the stack; that is, sender_pc = *(sp)
2) They have a return address and fp on the stack; that is, sender_pc = *(sp+1); sender_fp = *(sp)
Stack-walking code could recognize these two patterns and easily find the previous frame.
--- Expected profile ---
Total: 494
Samples: 227 (45,95%)
[ 0] itable stub
[ 1] Prof2.main
Samples: 130 (26,32%)
[ 0] Prof2.main
Samples: 32 (6,48%)
[ 0] Prof2$$Lambda$1.791452441.get
[ 1] Prof2.main
Samples: 31 (6,28%)
[ 0] Prof2$$Lambda$3.1418481495.get
[ 1] Prof2.main
Samples: 26 (5,26%)
[ 0] java.lang.Double.valueOf
[ 1] Prof2.lambda$main$1
[ 2] Prof2$$Lambda$2.531885035.get
[ 3] Prof2.main
Samples: 25 (5,06%)
[ 0] Prof2$$Lambda$4.303563356.get
[ 1] Prof2.main
Samples: 13 (2,63%)
[ 0] Prof2$$Lambda$2.531885035.get
[ 1] Prof2.main
Samples: 6 (1,21%)
[ 0] java.lang.Integer.valueOf
[ 1] Prof2.lambda$main$0
[ 2] Prof2$$Lambda$1.791452441.get
[ 3] Prof2.main
As a proof-of-concept I've implemented a workaround for the above cases in 'async-profiler' project, and the percentage of ticks_unknown_Java has become negligible. However, this is not a problem of the particular profiler, but rather a general bug in AsyncGetCallTrace mechanism. For example, a well-known 'honest-profiler' project also suffers from the same bug. So, the right way to solve the problem would be to fix AsyncGetCallTrace.
#1. An application performs many System.arraycopy() and spends a lot of time inside generated arraycopy stubs.
public class Prof1 {
public static void main(String[] args) {
StringBuilder sb = new StringBuilder();
for (int i = 0; i < 1000000; i++) {
sb.append("ab");
sb.delete(0, 1);
}
System.out.println(sb.length());
}
}
--- Collected profile ---
Total samples: 500
ticks_unknown_Java: 499 (99,80%)
Samples: 1 (0,20%)
[ 0] java.lang.AbstractStringBuilder.ensureCapacityInternal
[ 1] java.lang.AbstractStringBuilder.append
[ 2] java.lang.StringBuilder.append
[ 3] Prof1.main
Here AsyncGetCallTrace fails with error = -5 (ticks_unknown_Java) almost in 100% cases. The reason is that frame::safe_for_sender() does not check is_stub_frame() and therefore returns negative result after "if (_cb->frame_size() <= 0)" check. Such stacks could be perfectly traversed if stub frames were treated specially.
--- Expected profile ---
Total samples: 499
Samples: 499 (100,00%)
[ 0] jshort_disjoint_arraycopy
[ 1] java.lang.AbstractStringBuilder.delete
[ 2] java.lang.StringBuilder.delete
[ 3] Prof1.main
#2. Another case is a method with a megamorphic interface call.
public class Prof2 {
public static void main(String[] args) {
Supplier[] suppliers = {
() -> 0,
() -> 1.0,
() -> "abc",
() -> true
};
for (int i = 0; i >= 0; i++) {
suppliers[i % suppliers.length].get();
}
}
}
--- Collected profile ---
Total samples: 494
ticks_unknown_Java: 331 (67,00%)
Samples: 128 (25,91%)
[ 0] Prof2.main
Samples: 26 (5,26%)
[ 0] java.lang.Double.valueOf
[ 1] Prof2.lambda$main$1
[ 2] Prof2$$Lambda$2.531885035.get
[ 3] Prof2.main
Samples: 4 (0,81%)
[ 0] Prof2$$Lambda$4.303563356.get
[ 1] Prof2.main
Samples: 3 (0,61%)
[ 0] Prof2$$Lambda$3.1418481495.get
[ 1] Prof2.main
In this example AsyncGetCallTrace fails to recognize 2/3 samples. Many of these samples occur during invocation of Supplier.get(). Since the call site is megamorphic, the method is invoked via itable stub. Just like in the first case, frame::safe_for_sender() fails here on "if (_cb->frame_size() <= 0)" check. VtableStub creates a BufferBlob with zero frame_size, so it is not walkable by AsyncGetCallTrace. However, if we explicitly set the correct frame size of VtableStub, it will become walkable. For instance, on x64 frame_size == 1, since only the return address is saved on the stack.
#3. Even after VtableStub fix, the profilers based on AsyncGetCallTrace will still fail on the above code in 30% cases.
Note that Suppliers are very simple, their compiled method bodies are short. However, each compiled method has the prologue and the epilogue like this:
mov %eax,-0x14000(%rsp)
push %rbp
sub $0x30,%rsp
[method body]
add $0x30,%rsp
pop %rbp
test %eax,0x1234567(%rip)
retq
For the short methods sampling profilers often hit inside the prologue or the epilogue, while the stack frame is not fully constructed. Such inconsistent frames can be divided into two types:
1) They have only a return address on the stack; that is, sender_pc = *(sp)
2) They have a return address and fp on the stack; that is, sender_pc = *(sp+1); sender_fp = *(sp)
Stack-walking code could recognize these two patterns and easily find the previous frame.
--- Expected profile ---
Total: 494
Samples: 227 (45,95%)
[ 0] itable stub
[ 1] Prof2.main
Samples: 130 (26,32%)
[ 0] Prof2.main
Samples: 32 (6,48%)
[ 0] Prof2$$Lambda$1.791452441.get
[ 1] Prof2.main
Samples: 31 (6,28%)
[ 0] Prof2$$Lambda$3.1418481495.get
[ 1] Prof2.main
Samples: 26 (5,26%)
[ 0] java.lang.Double.valueOf
[ 1] Prof2.lambda$main$1
[ 2] Prof2$$Lambda$2.531885035.get
[ 3] Prof2.main
Samples: 25 (5,06%)
[ 0] Prof2$$Lambda$4.303563356.get
[ 1] Prof2.main
Samples: 13 (2,63%)
[ 0] Prof2$$Lambda$2.531885035.get
[ 1] Prof2.main
Samples: 6 (1,21%)
[ 0] java.lang.Integer.valueOf
[ 1] Prof2.lambda$main$0
[ 2] Prof2$$Lambda$1.791452441.get
[ 3] Prof2.main
As a proof-of-concept I've implemented a workaround for the above cases in 'async-profiler' project, and the percentage of ticks_unknown_Java has become negligible. However, this is not a problem of the particular profiler, but rather a general bug in AsyncGetCallTrace mechanism. For example, a well-known 'honest-profiler' project also suffers from the same bug. So, the right way to solve the problem would be to fix AsyncGetCallTrace.
- relates to
-
JDK-8153167 Allow stack walking pass through method handle intrinsic frames
-
- Open
-
-
JDK-8252417 JFR Execution Sample event fails to traverse valid Java stacks
-
- Open
-
- links to
-
Review openjdk/jdk/4436