In the ZGC load barrier slow path, ZSaveLiveRegisters currently saves the whole NEON/SVE vector register for every live FP/vector register, ignoring how much of the register is actually live (e.g. only the floating-point part). Depending on the kind of live register (FP, NEON, or SVE), we should push/pop only the corresponding size instead of the whole vector register to get better performance, just as x86 does.
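
To illustrate the idea, here is a minimal standalone sketch of size-aware spilling. It is not HotSpot code: `LiveKind`, `save_size_bytes`, `sized_spill_bytes`, and `whole_spill_bytes` are hypothetical names introduced only for this example. It assumes a live FP value needs 8 bytes (the D view of the register), a NEON value needs 16 bytes (the Q view), and an SVE value needs the hardware vector length, and it rounds the total up to 16 bytes because AArch64 requires a 16-byte-aligned stack pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical classification of how much of an FP/vector register is live.
enum class LiveKind { Float, Neon, Sve };

constexpr std::size_t kDoubleBytes = 8;     // D view: scalar floating point
constexpr std::size_t kNeonBytes = 16;      // Q view: full NEON register
constexpr std::size_t kStackAlignment = 16; // AArch64 SP alignment

// Bytes to spill for one live register of the given kind.
// vector_bytes is the widest vector register size on the machine:
// 16 for NEON-only hardware, or the SVE vector length (16..256) with SVE.
inline std::size_t save_size_bytes(LiveKind kind, std::size_t vector_bytes) {
  assert(vector_bytes >= 16 && vector_bytes <= 256 && vector_bytes % 16 == 0);
  switch (kind) {
    case LiveKind::Float: return kDoubleBytes;
    case LiveKind::Neon:  return kNeonBytes;
    case LiveKind::Sve:   return vector_bytes;
  }
  return vector_bytes; // not reached; fall back to the conservative size
}

// Stack space needed when each register is saved at its live size.
inline std::size_t sized_spill_bytes(const std::vector<LiveKind>& live,
                                     std::size_t vector_bytes) {
  std::size_t total = 0;
  for (LiveKind kind : live) {
    total += save_size_bytes(kind, vector_bytes);
  }
  // Round up so the stack pointer stays 16-byte aligned.
  return (total + kStackAlignment - 1) / kStackAlignment * kStackAlignment;
}

// Stack space needed when every live register is saved as a whole vector.
inline std::size_t whole_spill_bytes(const std::vector<LiveKind>& live,
                                     std::size_t vector_bytes) {
  return live.size() * vector_bytes;
}
```

Comparing `sized_spill_bytes` with `whole_spill_bytes` for the same set of live registers shows how much stack traffic the slow path could avoid. The gap widens on SVE hardware with long vectors when most live values are scalar floating point.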