Enhancement
Resolution: Fixed
P4
17, 20, 21
b23
aarch64
The following two patterns in the MD5 intrinsic are inefficient.
1) Reloading the updated hash values that were just written to memory.
md5_loop:
__ ldrw(a, Address(state, 0)); // b) read the updated hash values which were just written
... // loop body
__ ldrw(rscratch1, Address(state, 0));
__ addw(rscratch1, rscratch1, a);
__ strw(rscratch1, Address(state, 0)); // a) write the updated hash values into the memory
...
__ br(Assembler::LE, md5_loop);
2) Redundant loads generated by md5_FF, md5_GG, md5_HH, and md5_II.
For example, the loads generated by two md5_FF calls and two md5_GG calls:
__ ldrw(rscratch1, Address(buf, 0)); // from md5_FF(.., k = 0, ..)
...
__ ldrw(rscratch1, Address(buf, 4)); // from md5_FF(.., k = 1, ..)
...
__ ldrw(rscratch1, Address(buf, 4)); // from md5_GG(.., k = 1, ..)
...
__ ldrw(rscratch1, Address(buf, 0)); // from md5_GG(.., k = 0, ..)
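The two patterns above can be illustrated outside the stub generator with a small plain C++ model. This is only a sketch under stated assumptions: `mix`, `process_naive`, and `process_cached` are hypothetical names, the state is cut down to two words, and `mix` is a stand-in rather than the real MD5 round function. It is not the actual fix applied to the intrinsic; it only shows the idea of keeping the hash state in locals (registers) across loop iterations and loading each input word once per block.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for one mixing step; NOT the real MD5 round function.
static inline uint32_t mix(uint32_t a, uint32_t b, uint32_t x) {
    uint32_t t = a + (b ^ x) + 0x9e3779b9u;
    return (t << 7) | (t >> 25);  // rotate left by 7
}

// Mirrors the inefficient shape: the state is stored at the end of each
// iteration and read back at the top of the next, and each input word is
// loaded from the buffer twice.
void process_naive(uint32_t* state, const uint32_t* buf, size_t blocks) {
    for (size_t i = 0; i < blocks; ++i, buf += 2) {
        uint32_t a = state[0];  // reads back what the previous iteration wrote
        uint32_t b = state[1];
        a = mix(a, b, buf[0]);  // first load of buf[0]
        b = mix(b, a, buf[1]);  // first load of buf[1]
        b = mix(b, a, buf[1]);  // second load of buf[1]
        a = mix(a, b, buf[0]);  // second load of buf[0]
        state[0] += a;          // written to memory every iteration
        state[1] += b;
    }
}

// Same computation, with the state held in locals for the whole loop and
// each input word loaded once per block.
void process_cached(uint32_t* state, const uint32_t* buf, size_t blocks) {
    uint32_t s0 = state[0];  // load the state once, before the loop
    uint32_t s1 = state[1];
    for (size_t i = 0; i < blocks; ++i, buf += 2) {
        const uint32_t x0 = buf[0];  // single load per input word
        const uint32_t x1 = buf[1];
        uint32_t a = s0;
        uint32_t b = s1;
        a = mix(a, b, x0);
        b = mix(b, a, x1);
        b = mix(b, a, x1);
        a = mix(a, b, x0);
        s0 += a;
        s1 += b;
    }
    state[0] = s0;  // store the state once, after the loop
    state[1] = s1;
}
```

A C++ compiler may already perform this caching on its own when it can prove that `state` and `buf` do not alias. Hand-written stub code emitted through the assembler gets no such optimization, so redundant loads there have to be removed by hand.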