-
Enhancement
-
Resolution: Unresolved
-
P4
-
18
-
generic
When performing GCM/LCM, we should consider not only the height (latency) of nodes but also whether there is a data dependency between them. When two nodes are data-dependent and the earlier node has a high latency, an independent node can be scheduled between them to hide that latency. For example:
sequence 1:
ldrd V16, [R15, #16] # double
fmuld V18, V16, V17
faddd V16, V18, V16
strd V16, [R15, #16] # double
ldrw R2, [R13, #16] # int
addw R1, R2, R2
addw R1, R1, #2
strw R1, [R13, #16] # int
sequence 2:
ldrd V16, [R15, #16] # double
ldrw R2, [R13, #16] # int
fmuld V18, V16, V17
addw R1, R2, R2
faddd V16, V18, V16
strd V16, [R15, #16] # double
addw R1, R1, #2
strw R1, [R13, #16] # int
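Sequence 2 is more efficient than sequence 1 on AArch64 and MIPS architectures, because the independent integer instructions fill the slots in which the dependent floating-point instructions would otherwise wait for their operands.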
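The difference between the two sequences can be illustrated with a toy cost model. The sketch below is not C2 code: it assumes a single-issue, in-order pipeline, and the `Insn` type, the `cycles` function, and all latency values (4 cycles for loads and floating-point operations, 1 cycle for integer adds) are hypothetical choices made only for illustration. Memory dependencies are ignored, since the two stores and loads use different base registers.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Insn:
    name: str
    dst: Optional[str]      # register written, None for stores
    srcs: Tuple[str, ...]   # registers read
    latency: int            # ASSUMED cycles until the result is usable


def cycles(seq):
    """Cycles needed to issue seq on a simplified single-issue, in-order pipeline."""
    ready = {}   # register -> cycle at which its value becomes available
    cycle = 0
    for insn in seq:
        # An instruction stalls until all of its source operands are available.
        start = max([cycle] + [ready.get(r, 0) for r in insn.srcs])
        if insn.dst is not None:
            ready[insn.dst] = start + insn.latency
        cycle = start + 1
    return cycle


# Hypothetical latencies, chosen only to make the effect visible.
LOAD, FP, INT, STORE = 4, 4, 1, 1

ldrd = Insn("ldrd V16, [R15, #16]", "V16", ("R15",), LOAD)
fmuld = Insn("fmuld V18, V16, V17", "V18", ("V16", "V17"), FP)
faddd = Insn("faddd V16, V18, V16", "V16", ("V18", "V16"), FP)
strd = Insn("strd V16, [R15, #16]", None, ("V16", "R15"), STORE)
ldrw = Insn("ldrw R2, [R13, #16]", "R2", ("R13",), LOAD)
addw1 = Insn("addw R1, R2, R2", "R1", ("R2", "R2"), INT)
addw2 = Insn("addw R1, R1, #2", "R1", ("R1",), INT)
strw = Insn("strw R1, [R13, #16]", None, ("R1", "R13"), STORE)

# Sequence 1: the two dependency chains back to back.
seq1 = [ldrd, fmuld, faddd, strd, ldrw, addw1, addw2, strw]
# Sequence 2: the integer chain interleaved with the floating-point chain.
seq2 = [ldrd, ldrw, fmuld, addw1, faddd, strd, addw2, strw]

print("sequence 1:", cycles(seq1), "cycles")
print("sequence 2:", cycles(seq2), "cycles")
```

Under these assumed latencies the interleaved order needs fewer cycles, because the integer instructions issue while the floating-point chain is waiting for its operands. Real hardware (out-of-order execution, multiple issue ports) will behave differently; the model only shows why dependency information matters in addition to node height.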
- links to
-
Review openjdk/jdk/6407