-
Enhancement
-
Resolution: Fixed
-
P4
-
15
-
b15
-
x86_64
On 64-bit x86, emitting Assembler::pusha and Assembler::popa costs roughly 1.2M instructions during startup. These assembler routines emit only non-relocating instructions and do not vary with any environment settings (UseAVX, etc.), so the instruction stream could be precomputed and streamlined considerably.
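A minimal sketch of the idea, not the actual HotSpot change: when a routine always produces the same non-relocating bytes, the per-instruction encoding work can be replaced by a bulk copy of a precomputed table. The names CodeBuffer, emit_push, push_all_slow, push_all_fast and kPushAll are hypothetical, and the toy sequence below (a plain `push` of each of the 16 general-purpose registers) is an assumption chosen for illustration; it is not the sequence Assembler::pusha emits.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for an assembler code buffer (hypothetical, not HotSpot's).
struct CodeBuffer {
    std::vector<std::uint8_t> bytes;
    void emit(std::uint8_t b) { bytes.push_back(b); }
    void emit_all(const std::uint8_t* p, std::size_t n) {
        bytes.insert(bytes.end(), p, p + n);
    }
};

// Encode `push r64`: opcode 0x50 + low 3 bits of the register number,
// preceded by a REX.B prefix (0x41) for r8..r15.
void emit_push(CodeBuffer& cb, int reg) {
    if (reg >= 8) cb.emit(0x41);
    cb.emit(static_cast<std::uint8_t>(0x50 + (reg & 7)));
}

// Slow path: run the encoder once per register on every call.
void push_all_slow(CodeBuffer& cb) {
    for (int reg = 0; reg < 16; ++reg) emit_push(cb, reg);
}

// Fast path: the bytes never change, so they are computed once by hand
// and copied in bulk.
constexpr std::array<std::uint8_t, 24> kPushAll = {
    0x50, 0x51, 0x52, 0x53, 0x54, 0x55, 0x56, 0x57,  // rax..rdi
    0x41, 0x50, 0x41, 0x51, 0x41, 0x52, 0x41, 0x53,  // r8..r11
    0x41, 0x54, 0x41, 0x55, 0x41, 0x56, 0x41, 0x57   // r12..r15
};

void push_all_fast(CodeBuffer& cb) {
    cb.emit_all(kPushAll.data(), kPushAll.size());
}
```

Both paths fill the buffer with identical bytes; the fast path only skips the repeated encoding decisions. This is valid only because nothing in the sequence needs relocation or depends on runtime flags.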
PoC experiment: http://cr.openjdk.java.net/~redestad/scratch/pusha_popa.00/
Before:
111,118,300 instructions # 0.80 insns per cycle ( +- 0.07% )
21,941,371 branches # 414.614 M/sec ( +- 0.07% )
766,596 branch-misses # 3.49% of all branches ( +- 0.15% )
After:
110,039,451 instructions # 0.81 insns per cycle ( +- 0.07% )
21,792,132 branches # 419.041 M/sec ( +- 0.07% )
761,485 branch-misses # 3.49% of all branches ( +- 0.14% )