Description
The current implementation of Long.remainderUnsigned appeals to Hackers Delight 93, but I think in part it falls short of using Warren's suggested technique, of halving the dividend and doubling the result (with a fixup).
Long.remainderUnsigned, as a standalone operation, performs an unsigned division, and then works backwards with a multiply to find out the remainder. It can instead use Warren's technique directly. Cut the dividend in half (`>>>1`), take the *remainder* (not the *quotient*, as in the current code), and then repair the approximate result by doubling it and adding the shiftedout bit. Overflow can be avoided by using available signednegative values. The existing flowfree coding style can still be used to perform the fixup.
Here is a sketch of the code which avoids the extra multiply by working directly toward the remainder:
```
assert(dividend >= 0); // negative dividend handled elsewhere as today
long r = ((dividend >>> 1) % divisor) << 1; // use % not / here
r += (dividend & 1); // reintroduce the shiftedout bit
r = divisor; // cancel impending overflow; set bit 63 only for a true negative
r += ((r >> (Long.SIZE  1)) & divisor); // increment by divisor if negative
assert(r >= 0 && r < dividend); // must be
return r;
```
The above technique benefits platforms which do not have AD file support for unsigned remainder (UModL).
For platforms which *do* support UModL directly, the assembly code performs a runtime sign check which could in principle be avoided if static type analysis proves it is never taken.
This can be done in part by enhancing `UModLNode::Ideal` to replace the current node by a *signed* mod node, when the inputs are both always signpositive. This would then unlock further optimizations on the signed mod IR node.
Likewise, `UDivLNode::Ideal` should convert itself to signed div node if the inputs are both signpositive.
Further more (on top of the previous suggestion) I suggest that the dynamic test in the assembly code be lifted out into IR. That way it can be folded more systematically, and also the Ideal methods just mentioned above will apply more frequently, on the relevant IR paths.
Ultimately, the test for a negative divisor should be represented fully in IR flow or cmoves (for both div and mod cases), so that the AD file can handle the case of only positive divisors. The fancy flowfree code in the assembly code for the negative divisor path should probably be handled by C2's logic for doing flowfree conditionals; that would be a better factoring of concerns than placing little bits of flowfree lore here and there in the assembler.
Long.remainderUnsigned, as a standalone operation, performs an unsigned division, and then works backwards with a multiply to find out the remainder. It can instead use Warren's technique directly. Cut the dividend in half (`>>>1`), take the *remainder* (not the *quotient*, as in the current code), and then repair the approximate result by doubling it and adding the shiftedout bit. Overflow can be avoided by using available signednegative values. The existing flowfree coding style can still be used to perform the fixup.
Here is a sketch of the code which avoids the extra multiply by working directly toward the remainder:
```
assert(dividend >= 0); // negative dividend handled elsewhere as today
long r = ((dividend >>> 1) % divisor) << 1; // use % not / here
r += (dividend & 1); // reintroduce the shiftedout bit
r = divisor; // cancel impending overflow; set bit 63 only for a true negative
r += ((r >> (Long.SIZE  1)) & divisor); // increment by divisor if negative
assert(r >= 0 && r < dividend); // must be
return r;
```
The above technique benefits platforms which do not have AD file support for unsigned remainder (UModL).
For platforms which *do* support UModL directly, the assembly code performs a runtime sign check which could in principle be avoided if static type analysis proves it is never taken.
This can be done in part by enhancing `UModLNode::Ideal` to replace the current node by a *signed* mod node, when the inputs are both always signpositive. This would then unlock further optimizations on the signed mod IR node.
Likewise, `UDivLNode::Ideal` should convert itself to signed div node if the inputs are both signpositive.
Further more (on top of the previous suggestion) I suggest that the dynamic test in the assembly code be lifted out into IR. That way it can be folded more systematically, and also the Ideal methods just mentioned above will apply more frequently, on the relevant IR paths.
Ultimately, the test for a negative divisor should be represented fully in IR flow or cmoves (for both div and mod cases), so that the AD file can handle the case of only positive divisors. The fancy flowfree code in the assembly code for the negative divisor path should probably be handled by C2's logic for doing flowfree conditionals; that would be a better factoring of concerns than placing little bits of flowfree lore here and there in the assembler.
Attachments
Issue Links
 relates to

JDK8298858 Better j.l.Long.remainderUnsigned()
 In Progress

JDK8286679 more range check elimination for hash tables
 Open