For char/short array accesses in an unrolled loop, AArch64 OpenJDK currently generates code like the following:
0x0000ffff6fb247a0: ldrsh w10, [x2,w11,sxtw #1]
0x0000ffff6fb247a4: ldrsh w13, [x17,w11,sxtw #1]
0x0000ffff6fb247a8: add w14, w0, w10
0x0000ffff6fb247ac: ldrsh w12, [x15,w11,sxtw #1]
0x0000ffff6fb247b0: add w14, w14, w13
0x0000ffff6fb247b4: ldrsh w10, [x16,w11,sxtw #1]
According to [1], we prefer not to use the scale-by-2 register-offset form of the halfword loads (ldrh/ldrsh with sxtw #1), as seen above.
[1] http://infocenter.arm.com/help/topic/com.arm.doc.uan0016a/cortex_a72_software_optimization_guide_external.pdf
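The report does not include the Java source behind the listing, so the following is only a hypothetical reproducer. It assumes a simple summing loop over a short[] (and a char[] for comparison) that C2 unrolls. The class and method names are invented for illustration. Loads from a short[] are sign-extended (ldrsh), while loads from a char[] are zero-extended (ldrh). The exact registers and addressing in the generated code will vary by JDK build and CPU.

```java
// Hypothetical reproducer: the original report does not show its Java source.
public class ShortSum {
    // Summing a short[] in a counted loop. Once C2 compiles and unrolls it,
    // each a[i] is a sign-extending halfword load (ldrsh on AArch64).
    static int sum(short[] a) {
        int s = 0;
        for (int i = 0; i < a.length; i++) {
            s += a[i];
        }
        return s;
    }

    // The char[] variant uses zero-extending halfword loads (ldrh) instead.
    static int sum(char[] a) {
        int s = 0;
        for (int i = 0; i < a.length; i++) {
            s += a[i];
        }
        return s;
    }

    public static void main(String[] args) {
        short[] shorts = new short[1024];
        char[] chars = new char[1024];
        for (int i = 0; i < shorts.length; i++) {
            shorts[i] = (short) (i - 512);
            chars[i] = (char) i;
        }
        long total = 0;
        // Call the methods many times so they become hot and get JIT-compiled by C2.
        for (int iter = 0; iter < 20_000; iter++) {
            total += sum(shorts) + sum(chars);
        }
        System.out.println(total);
    }
}
```

To inspect the generated code, run with -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly. This requires the hsdis disassembler plugin to be installed.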