Details
- Enhancement
- Status: Resolved
- P4
- Resolution: Fixed
- 17, 18, 19
- b05
- aarch64
- generic
Backports
Issue | Fix Version | Assignee | Priority | Status | Resolution | Resolved In Build |
---|---|---|---|---|---|---|
JDK-8282793 | 17.0.4-oracle | Tobias Hartmann | P4 | Resolved | Fixed | b01 |
JDK-8282654 | 17.0.4 | Dmitry Chuyko | P4 | Resolved | Fixed | b01 |
Description
After JDK-8268231, the code has the following shape:
    if (SoftwarePrefetchHintDistance >= 0) {
      __ bind(LARGE_LOOP_PREFETCH);
        __ prfm(Address(str1, SoftwarePrefetchHintDistance));
        __ prfm(Address(str2, SoftwarePrefetchHintDistance));
        __ align(OptoLoopAlignment);
        for (int i = 0; i < 4; i++) {
          __ ldp(tmp1, tmp1h, Address(str1, i * 16));
          __ ldp(tmp2, tmp2h, Address(str2, i * 16));
          __ cmp(tmp1, tmp2);
          __ ccmp(tmp1h, tmp2h, 0, Assembler::EQ);
          __ br(Assembler::NE, DIFF);
        }
        __ sub(cnt2, cnt2, isLL ? 64 : 32);
        __ add(str1, str1, 64);
        __ add(str2, str2, 64);
        __ subs(rscratch2, cnt2, largeLoopExitCondition);
        __ br(Assembler::GE, LARGE_LOOP_PREFETCH);
        __ cbz(cnt2, LENGTH_DIFF); // no more chars left?
    }
I believe the intent is to make sure that the LARGE_LOOP_PREFETCH jump target is aligned. In other words, it should probably look like this:
    __ align(OptoLoopAlignment);
    __ bind(LARGE_LOOP_PREFETCH);
      __ prfm(Address(str1, SoftwarePrefetchHintDistance));
      __ prfm(Address(str2, SoftwarePrefetchHintDistance));
This is the form that every other use of align(OptoLoopAlignment) takes. The current shape probably carries a minor performance penalty.
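To make the difference between the two orderings concrete, here is a minimal sketch that models only the code-buffer offset. It is not HotSpot code: `ToyAssembler` and the two helper functions are hypothetical names invented for illustration, every instruction is assumed to be 4 bytes wide (as on AArch64), and the alignment value passed in is an illustrative stand-in for OptoLoopAlignment.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an assembler: it tracks only the current
// offset (in bytes) into the code buffer.
struct ToyAssembler {
    std::uint64_t offset;

    explicit ToyAssembler(std::uint64_t start) : offset(start) {}

    // Emit one fixed-width 4-byte instruction.
    void emit() { offset += 4; }

    // Pad with 4-byte nops until the offset is a multiple of `alignment`.
    // The start offset is assumed to be 4-byte aligned.
    void align(std::uint64_t alignment) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        while (offset % alignment != 0) emit();
    }

    // Binding a label records the current offset as the jump target.
    std::uint64_t bind() const { return offset; }
};

// Shape described in the report: bind, two prefetches, then align.
// The label keeps whatever offset the buffer happened to be at.
std::uint64_t label_offset_bind_then_align(std::uint64_t start,
                                           std::uint64_t alignment) {
    ToyAssembler masm(start);
    std::uint64_t label = masm.bind();
    masm.emit();             // prfm str1
    masm.emit();             // prfm str2
    masm.align(alignment);   // padding lands after the label, inside the loop
    return label;
}

// Proposed shape: align first, then bind, so the label itself is aligned.
std::uint64_t label_offset_align_then_bind(std::uint64_t start,
                                           std::uint64_t alignment) {
    ToyAssembler masm(start);
    masm.align(alignment);   // padding lands before the label, outside the loop
    std::uint64_t label = masm.bind();
    masm.emit();             // prfm str1
    masm.emit();             // prfm str2
    return label;
}
```

In this model, with a start offset of 4 and an alignment of 16, the first ordering leaves the label at offset 4, and the single padding nop sits inside the loop body where it is executed on every back-edge. The second ordering places the label at offset 16 and keeps the padding outside the loop.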
Tentatively assigning to Wang Huang, who did the original JDK-8268231.
Attachments
Issue Links
- backported by
  - JDK-8282654 AArch64: generate_compare_long_string_same_encoding and LARGE_LOOP_PREFETCH alignment (Resolved)
  - JDK-8282793 AArch64: generate_compare_long_string_same_encoding and LARGE_LOOP_PREFETCH alignment (Resolved)
- relates to
  - JDK-8268231 Aarch64: Use Ldp in intrinsics for String.compareTo (Resolved)
- links to
  - Commit openjdk/jdk17u-dev/a51a5f03
  - Commit openjdk/jdk/126328cb
  - Review openjdk/jdk17u-dev/191
  - Review openjdk/jdk/7007