JDK-8240772

x86_64: Pre-generate Assembler::popa, pusha and vzeroupper

    • b15
    • x86_64

      On 64-bit x86, emitting Assembler::pusha and ::popa costs roughly 1.2M instructions during startup. These assembler routines emit only non-relocating instructions, and their output does not depend on any environment settings (UseAVX, etc.), so the instruction stream could be pre-generated once and then emitted far more cheaply.
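      The idea can be illustrated with a minimal C++ sketch. It is not the HotSpot implementation: ToyBuffer, push_all_slow and push_all_pregenerated are hypothetical names, and the byte sequence (a plain "push reg" for each of the 16 general-purpose registers) is only a stand-in for whatever Assembler::pusha really emits. The point is that an invariant, non-relocating sequence can be encoded once and block-copied afterwards.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy code buffer; NOT HotSpot's CodeBuffer/Assembler API.
struct ToyBuffer {
    std::vector<std::uint8_t> bytes;

    void emit_byte(std::uint8_t b) { bytes.push_back(b); }

    void emit_bytes(const std::uint8_t* p, std::size_t n) {
        bytes.insert(bytes.end(), p, p + n);
    }
};

// Slow path: encode one instruction at a time on every call.
// Illustrative sequence only: "push reg" is 0x50 + (reg & 7), and
// registers r8..r15 additionally need a REX.B prefix (0x41).
inline void push_all_slow(ToyBuffer& buf) {
    for (int reg = 0; reg < 16; ++reg) {
        if (reg >= 8) {
            buf.emit_byte(0x41);
        }
        buf.emit_byte(static_cast<std::uint8_t>(0x50 + (reg & 7)));
    }
}

// Fast path: the bytes are generated once (on first use) and then
// block-copied on every call. This is valid only because the sequence
// has no relocations and does not depend on runtime settings.
inline void push_all_pregenerated(ToyBuffer& buf) {
    static const std::vector<std::uint8_t> cached = [] {
        ToyBuffer tmp;
        push_all_slow(tmp);
        return tmp.bytes;
    }();
    buf.emit_bytes(cached.data(), cached.size());
}
```

      Both paths produce identical bytes; the pre-generated one replaces per-instruction encoding work with a single copy.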

      PoC experiment: http://cr.openjdk.java.net/~redestad/scratch/pusha_popa.00/

      Before:
             111,118,300 instructions # 0.80 insns per cycle ( +- 0.07% )
              21,941,371 branches # 414.614 M/sec ( +- 0.07% )
                 766,596 branch-misses # 3.49% of all branches ( +- 0.15% )

      After:
             110,039,451 instructions # 0.81 insns per cycle ( +- 0.07% )
              21,792,132 branches # 419.041 M/sec ( +- 0.07% )
                 761,485 branch-misses # 3.49% of all branches ( +- 0.14% )
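
      For reference, the differences implied by the figures above (simple subtraction, Before minus After):
             instructions:   111,118,300 - 110,039,451 = 1,078,849 (about 0.97% fewer)
             branches:        21,941,371 -  21,792,132 =   149,239 (about 0.68% fewer)
             branch-misses:      766,596 -     761,485 =     5,111 (about 0.67% fewer)
      The instruction-count reduction is well outside the reported +- 0.07% run-to-run variation.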

            redestad Claes Redestad