Type: Enhancement
Resolution: Fixed
Priority: P4
Fix Version: port-stage-aarch32-8
Component: None
CPU: aarch32
OS: linux
Now every constant that may be changed concurrently is placed in the const
section, and the patchable load sequence looks like:
  b   stub
  dmb
  add rD, 0xff000       @ high part of the offset to the constant
  ldr rD, [rD, 0xfff]   @ low 12-bit part; loads the constant from the const section
The leading `b stub` is replaced with a nop once patching has been applied.
The `dmb` acts as a LoadLoad barrier to order the constant load with the
instruction modification `b stub` -> `nop`.
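For intuition, here is a minimal C++ sketch of the ordering involved, using
C++11 atomics as an analogy (the names `patched` and `constant` are
hypothetical stand-ins for the patched instruction slot and the const-section
entry; this is not HotSpot code):

  #include <atomic>
  #include <cstdint>

  // 'patched' stands in for the 'b stub' -> nop patch, 'constant' for the
  // slot in the const section.
  static std::atomic<bool>     patched{false};
  static std::atomic<uint32_t> constant{0};

  // Patcher thread: publish the constant before removing the branch to the stub.
  void patch(uint32_t value) {
      constant.store(value, std::memory_order_relaxed);
      patched.store(true, std::memory_order_release);   // ~ patching 'b stub' to nop
  }

  // Executing thread: once the nop ('patched') is observed, the load of the
  // constant must not return a stale value; the dmb between the patched
  // instruction and the ldr provides exactly this LoadLoad ordering.
  uint32_t load_constant() {
      while (!patched.load(std::memory_order_acquire)) {
          // not yet patched: the real code still branches to the stub
      }
      return constant.load(std::memory_order_relaxed);
  }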
The maximum offset is 0xfffff (0xff000 from the add plus 0xfff from the ldr,
i.e. ~1 MB), which is enough to reach the whole nmethod, which for C1 is at
most 1.1*4*64K (< 300 KB).
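As a quick sanity check on those numbers (a standalone sketch, not port code):

  #include <cstdio>

  int main() {
      unsigned add_imm = 0xff000;              // rotated 8-bit immediate of the add
      unsigned ldr_off = 0xfff;                // 12-bit offset of the ldr
      unsigned reach   = add_imm + ldr_off;    // 0xfffff, i.e. 1 MB - 1
      double   nmethod = 1.1 * 4 * 64 * 1024;  // ~282 KB, the C1 nmethod bound quoted above
      std::printf("reach = %#x (%u KB), C1 nmethod <= %.0f KB\n",
                  reach, reach >> 10, nmethod / 1024);
      return 0;
  }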
Generated code looks more straightforward, and ARMv6 could benefit from a
smaller 32-bit constant load: add+ldr instead of mov + 3*orr (not implemented
yet).
In terms of performance, SPECjvm2008 shows a minor improvement of about
0.01 ops/m on average (~1-2%).
Also, the number of relocations grows: every such load now needs an additional
relocation to fix its offset when the const section moves. On the other hand,
oop and metadata relocations no longer hold the respective values themselves,
so it is not a big loss.
Changing the relocations implies changes in the shared C1 code that deals with them.
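To illustrate what such a relocation has to do, here is a minimal sketch of
re-splitting the code-to-constant distance into the add/ldr immediates when the
const section moves (hypothetical types and names, not HotSpot's relocation
API; cache maintenance and relocation bookkeeping are omitted):

  #include <cassert>
  #include <cstdint>

  struct PatchableLoad {
      uint32_t* add_insn;   // 'add rD, ..., #hi' at the load site
      uint32_t* ldr_insn;   // 'ldr rD, [rD, #lo]'
  };

  void fix_const_offset(PatchableLoad site, uint32_t new_offset) {
      assert(new_offset <= 0xfffffu);        // must stay within the ~1 MB reach
      uint32_t hi = new_offset & 0xff000u;   // part carried by the add
      uint32_t lo = new_offset & 0x00fffu;   // part carried by the ldr

      // ARM data-processing immediate: 8-bit value rotated right by 2*rot.
      // hi equals (hi >> 12) << 12, i.e. a rotate-right by 20, so rot = 10.
      *site.add_insn = (*site.add_insn & ~0xfffu) | (10u << 8) | (hi >> 12);

      // ldr (immediate): plain 12-bit offset in instruction bits 11:0.
      *site.ldr_insn = (*site.ldr_insn & ~0xfffu) | lo;
  }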