This fix minimizes the AVX-to-SSE and SSE-to-AVX transition penalty by generating the vzeroupper instruction. With this patch we see zero penalized transitions per SPECjbb2015 jOPS on Broadwell (BDW). On Skylake server (SKX), the vector-width-mismatch CPU event drops significantly, from 65 to 0.01 per SPECjbb2015 jOPS. We have also implemented an enhancement that disables vzeroupper generation on the Knights family, where the instruction carries a high penalty and is not recommended. The UseVzeroupper option controls generation of the vzeroupper instruction and is set to false on the Knights family.
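The mechanism can be illustrated with a deliberately simplified model. It is a sketch under stated assumptions, not HotSpot's actual code generation. The model treats the upper YMM halves as either dirty or clean: 256-bit AVX code dirties them, vzeroupper clears them, and a legacy-SSE instruction that runs while they are dirty counts as a penalized transition. The names `Op`, `count_dirty_sse`, `insert_vzeroupper` and `default_use_vzeroupper` are hypothetical. Placing vzeroupper immediately before the offending SSE op is also an assumption made for illustration, and HotSpot's real placement strategy is not reproduced here. The `use_vzeroupper` parameter stands in for the UseVzeroupper option, and the default turns it off only for the Knights family.

```python
from enum import Enum, auto


class Op(Enum):
    """Hypothetical instruction classes for a simplified model."""
    AVX256 = auto()      # 256-bit VEX-encoded op: dirties the upper YMM halves
    SSE = auto()         # legacy-SSE-encoded op
    VZEROUPPER = auto()  # clears the upper YMM halves


def count_dirty_sse(ops):
    """Count SSE ops that execute while the upper YMM halves are dirty."""
    dirty = False
    count = 0
    for op in ops:
        if op is Op.AVX256:
            dirty = True
        elif op is Op.VZEROUPPER:
            dirty = False
        elif dirty:  # an SSE op running with dirty upper state
            count += 1
    return count


def insert_vzeroupper(ops, use_vzeroupper=True):
    """Emit vzeroupper before any SSE op that would run with dirty upper state."""
    out = []
    dirty = False
    for op in ops:
        if op is Op.SSE and dirty and use_vzeroupper:
            out.append(Op.VZEROUPPER)
            dirty = False
        if op is Op.AVX256:
            dirty = True
        elif op is Op.VZEROUPPER:
            dirty = False
        out.append(op)
    return out


def default_use_vzeroupper(is_knights_family):
    """Hypothetical default: enabled everywhere except the Knights family."""
    return not is_knights_family


code = [Op.AVX256, Op.AVX256, Op.SSE, Op.SSE, Op.AVX256, Op.SSE]
print(count_dirty_sse(code))                     # prints 3
print(count_dirty_sse(insert_vzeroupper(code)))  # prints 0
```

With the option enabled, every penalized transition in the sample sequence disappears. Passing `use_vzeroupper=default_use_vzeroupper(True)` models a Knights-family CPU, and in that case the sequence is returned unchanged.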
We observed a ~3% gain in the SPECjvm2008 composite result on Skylake.
Relates to:
- JDK-8190934 Regressions on Haswell Xeon due to JDK-8178811 (Resolved)
- JDK-8279676 Dubious YMM register clearing in x86_64 arraycopy stubs (Resolved)