Details
Enhancement
Status: Resolved
P4
Resolution: Fixed
repo-panama
aarch64
Description
Currently, loadV_partial is implemented as:
mov $tmp1, 0
mov $tmp2, vector_length
sve_whilelo $pTmp, $tmp1, $tmp2
sve_ldr $dst, $pTmp, $mem
However, we can encode the zero register zr directly in the `sve_whilelo` instruction instead of reading a zero that the first mov loaded into a temporary.
The new implementation is:
mov $tmp, vector_length
sve_whilelo $pTmp, zr, $tmp
sve_ldr $dst, $pTmp, $mem
This change removes one mov instruction and one temporary register ($tmp2).
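
To illustrate why the two sequences are equivalent, here is a minimal Python sketch that models the predicate produced by `sve_whilelo` and the predicated load. It is not the JDK code: the function names (`whilelo`, `partial_load`, `load_old`, `load_new`), the list-based memory model, and the assumption that inactive lanes read as zero are introduced only for this illustration.

```python
MASK64 = (1 << 64) - 1  # operands are treated as unsigned 64-bit values
ZR = 0                  # the zero register always reads as 0


def whilelo(first, limit, lanes):
    """Model of WHILELO: a lane is active while the incrementing
    first operand is (unsigned) lower than limit, and false thereafter."""
    value = first & MASK64
    limit &= MASK64
    active = True
    pred = []
    for _ in range(lanes):
        active = active and value < limit
        pred.append(active)
        value = (value + 1) & MASK64
    return pred


def partial_load(memory, pred):
    """Model of the predicated load: inactive lanes are assumed to be zero."""
    return [m if p else 0 for m, p in zip(memory, pred)]


def load_old(memory, vector_length, lanes):
    tmp1 = 0                          # mov $tmp1, 0
    tmp2 = vector_length              # mov $tmp2, vector_length
    p_tmp = whilelo(tmp1, tmp2, lanes)
    return partial_load(memory, p_tmp)


def load_new(memory, vector_length, lanes):
    tmp = vector_length               # mov $tmp, vector_length
    p_tmp = whilelo(ZR, tmp, lanes)   # zr replaces the zeroed temporary
    return partial_load(memory, p_tmp)
```

In this model, both functions return the same lanes for any `vector_length`, because the first operand of `whilelo` is zero in either case. The only difference is that the new sequence needs no instruction to materialize that zero.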