We're not generating enough profile data in the template interpreter for C2 to use. When we run with -Xbatch -XX:-TieredCompilation, the profile counts are so low that they are of no use at all:
final BitTests::testLongMaskBranch(JJ)J
interpreter_invocation_count: 3301
invocation_counter: 3301
backedge_counter: 1
mdo size: 496 bytes
0 fast_aaccess_0
1 fast_agetfield 4 <BitTests.r/LXorShift;>
4 invokevirtual 11 <XorShift.nextLong()J>
0 bci: 4 VirtualCallData count(0) entries(1)
'XorShift'(2 1.00)
7 lload_3
8 land
9 lconst_0
10 lcmp
11 ifeq 18
48 bci: 11 BranchData taken(0) displacement(32)
not taken(2)
This should be something more like:
final BitTests::testLongMaskBranch(JJ)J
interpreter_invocation_count: 10000
invocation_counter: 5000
backedge_counter: 1
mdo size: 496 bytes
0 fast_aaccess_0
1 fast_agetfield 4 <BitTests.r/LXorShift;>
4 invokevirtual 11 <XorShift.nextLong()J>
0 bci: 4 VirtualCallData count(0) entries(1)
'XorShift'(6701 1.00)
7 lload_3
8 land
9 lconst_0
10 lcmp
11 ifeq 18
48 bci: 11 BranchData taken(3343) displacement(32)
not taken(3358)
... which is what we see on x86. Note that the counter overflow is also happening early: it should happen after 10000 invocations, but it is happening after only 3301.
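For anyone trying to reproduce this, here is a rough sketch of what the test method might look like, reconstructed from the bytecode in the dumps above. Only the names BitTests, XorShift, the field r, nextLong() and testLongMaskBranch(JJ)J come from the dump. The xorshift constants, the seed, the parameter names, the method bodies and the driver loop are all assumptions made for illustration, not the original test source.

```java
// Hypothetical reconstruction of the test; not the original source.
final class XorShift {
    // Arbitrary nonzero seed (assumption). A xorshift state must never be zero.
    private long x = 0x2545F4914F6CDD1DL;

    long nextLong() {
        // Common 13/7/17 xorshift step (assumption; the real generator is not shown).
        x ^= x << 13;
        x ^= x >>> 7;
        x ^= x << 17;
        return x;
    }
}

public class BitTests {
    // Matches "fast_agetfield 4 <BitTests.r/LXorShift;>" in the dump.
    private final XorShift r = new XorShift();

    // Signature (JJ)J: two long arguments, long result.
    // The second long lives in local slot 3, which matches "lload_3" at bci 7.
    final long testLongMaskBranch(long a, long mask) {
        // Compiles to roughly: invokevirtual nextLong, lload_3, land, lconst_0, lcmp, ifeq
        if ((r.nextLong() & mask) != 0) {
            return a + 1; // placeholder body (assumption)
        }
        return a;         // placeholder body (assumption)
    }

    public static void main(String[] args) {
        BitTests t = new BitTests();
        long sum = 0;
        // Call the method more than 10000 times so the invocation counter
        // overflows. A single-bit mask should make the branch go either way
        // about half the time.
        for (int i = 0; i < 20_000; i++) {
            sum += t.testLongMaskBranch(i, 1L);
        }
        System.out.println(sum); // keep the result live
    }
}
```

Run it with the flags from the description, for example java -Xbatch -XX:-TieredCompilation BitTests, and compare the branch and receiver counts in the method data between the two platforms.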