- Bug
- Resolution: Fixed
- P2
- hs25
- b30
- generic
- generic
- Verified
Issue | Fix Version | Assignee | Priority | Status | Resolution | Resolved In Build |
---|---|---|---|---|---|---|
JDK-8013683 | 8 | Roland Westrelin | P2 | Closed | Fixed | b88 |
JDK-8018324 | 7u45 | Roland Westrelin | P2 | Closed | Fixed | b01 |
JDK-8013986 | 7u40 | Roland Westrelin | P2 | Closed | Fixed | b24 |
JDK-8013108 | hs24 | Roland Westrelin | P2 | Closed | Fixed | b42 |
The issue seems to be with 64-bit immediates in the instruction selection.
The attached test case clearly demonstrates the issue.
Running the test shows the following in OptoAssembly:
4ce B50: # B8 <- B49 B46 B45 Freq: 0.00189711
4ce MEMBAR-release ! (empty encoding)
4ce
4ce movq R10, [rsp + #8] # spill
4d3 ADDQ [[R10 + #152 (32-bit)]],#281474976710656
4dc
4dc MEMBAR-acquire ! (empty encoding)
Note the "proper" immediate "1 << 48" (281474976710656) in the ADDQ. However, it gets further lowered to:
;; B49: # B8 <- B48 B45 B44 Freq: 0.00189711
0x00007f1c991d4a5e: mov 0x8(%rsp),%r10
0x00007f1c991d4a63: lock addq $0x0,0x98(%r10)
So we are indeed adding zero instead of the properly offset bit.
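The arithmetic behind the `$0x0` can be checked directly. The following is only a sketch, assuming the immediate is squeezed into a sign-extended 32-bit field; the class and method names are hypothetical and are not taken from HotSpot or from the attached test.

```java
public class TruncatedImmediate {
    // Keep only the low 32 bits, then sign-extend back to 64 bits.
    // This models what a sign-extended 32-bit immediate field can carry
    // (an assumption about the encoding, for illustration only).
    static long asSimm32(long value) {
        return (long) (int) value;
    }

    public static void main(String[] args) {
        long delta = 1L << 48;
        System.out.println(delta);                                    // 281474976710656
        System.out.println("0x" + Long.toHexString(delta));           // 0x1000000000000
        System.out.println("0x" + Long.toHexString(asSimm32(delta))); // 0x0
    }
}
```

The low 32 bits of the delta are all zero, which matches the zero immediate seen in the disassembly.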
There are two matching rules for this node, xaddL and xaddL_no_res. The problematic rule seems to be xaddL_no_res. If we modify the code to force selection of xaddL, we get the register-based selection and the proper code:
0x00007fe76d15b703: mov $0x1000000000000,%r10
;; B52: # B8 <- B51 B53 Freq: 0.00189711
0x00007fe76d15b70d: mov 0x8(%rsp),%r11
0x00007fe76d15b712: lock xadd %r10,0x98(%r11)
This opens the way for a workaround: feed in $delta values that cannot be constant-folded. The simplest option would be to read the delta from a volatile field.
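A minimal sketch of that workaround, using AtomicLong.getAndAdd as a stand-in for the Unsafe call. The class name, the field name `delta`, and the iteration count are hypothetical; whether the loop is hot enough to be compiled by C2 on a given VM is not guaranteed.

```java
import java.util.concurrent.atomic.AtomicLong;

public class VolatileDeltaWorkaround {
    // Hypothetical field: because it is volatile and not final, the JIT
    // cannot treat its value as a compile-time constant at the call site.
    static volatile long delta = 1L << 48;

    static final AtomicLong counter = new AtomicLong();

    // The delta is loaded from the volatile field on every call, so it
    // reaches the atomic add as a runtime value rather than an immediate.
    static long addDelta() {
        return counter.getAndAdd(delta);
    }

    public static void main(String[] args) {
        int iterations = 20_000; // small enough that the sum does not overflow a long
        for (int i = 0; i < iterations; i++) {
            addDelta();
        }
        long expected = iterations * (1L << 48);
        System.out.println(counter.get() == expected ? "ok" : "MISMATCH");
    }
}
```

On an affected build, replacing the volatile read with the literal `1L << 48` would be the variant expected to hit the xaddL_no_res path described above.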
Speculation: the problem seems to be the *missing* overloaded macro addq(Address addr, int64_t); we are probably selecting the closest match, addq(Address addr, int32_t), which truncates the immediate.
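If that speculation holds, the immediate form is only safe when the delta survives a round trip through a signed 32-bit value. The sketch below illustrates such a guard; `fitsInSimm32` and `operandForm` are hypothetical helpers written for this note, not HotSpot code, and this is not a description of the actual fix.

```java
public class ImmediateFit {
    // True when the value is unchanged after narrowing to int and
    // sign-extending back to long, i.e. it fits a signed 32-bit immediate.
    static boolean fitsInSimm32(long value) {
        return value == (long) (int) value;
    }

    // Hypothetical selection: name the operand form that could carry the delta.
    static String operandForm(long delta) {
        return fitsInSimm32(delta) ? "imm32" : "register";
    }

    public static void main(String[] args) {
        long[] deltas = {1L, -1L, Integer.MAX_VALUE, 1L << 31, 1L << 48};
        for (long d : deltas) {
            System.out.println(d + " -> " + operandForm(d));
        }
    }
}
```

Under this check, the delta from the test case (1L << 48) falls into the "register" case, which corresponds to the xaddL selection that produced correct code.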
This issue affects jsr166 development and existing classes (AtomicLong), and has a potentially large impact.
I/L/W = H/M/M => P2
- backported by
  - JDK-8013108 Unsafe.getAndAddLong(obj, off, delta) does not work properly with long deltas (Closed)
  - JDK-8013683 Unsafe.getAndAddLong(obj, off, delta) does not work properly with long deltas (Closed)
  - JDK-8013986 Unsafe.getAndAddLong(obj, off, delta) does not work properly with long deltas (Closed)
  - JDK-8018324 Unsafe.getAndAddLong(obj, off, delta) does not work properly with long deltas (Closed)
- relates to
  - JDK-7023898 Intrinsify AtomicLongFieldUpdater.getAndIncrement() (Resolved)