The current implementation of Long.remainderUnsigned appeals to Hackers Delight 9-3, but I think in part it falls short of using Warren's suggested technique, of halving the dividend and doubling the result (with a fixup).
Long.remainderUnsigned, as a stand-alone operation, performs an unsigned division, and then works backwards with a multiply to find out the remainder. It can instead use Warren's technique directly. Cut the dividend in half (`>>>1`), take the *remainder* (not the *quotient*, as in the current code), and then repair the approximate result by doubling it and adding the shifted-out bit. Overflow can be avoided by using available signed-negative values. The existing flow-free coding style can still be used to perform the fixup.
Here is a sketch of the code which avoids the extra multiply by working directly toward the remainder:
```
assert(dividend >= 0); // negative dividend handled elsewhere as today
long r = ((dividend >>> 1) % divisor) << 1; // use % not / here
r += (dividend & 1); // reintroduce the shifted-out bit
r -= divisor; // cancel impending overflow; set bit 63 only for a true negative
r += ((r >> (Long.SIZE - 1)) & divisor); // increment by divisor if negative
assert(r >= 0 && r < dividend); // must be
return r;
```
The above technique benefits platforms which do not have AD file support for unsigned remainder (UModL).
For platforms which *do* support UModL directly, the assembly code performs a runtime sign check which could in principle be avoided if static type analysis proves it is never taken.
This can be done in part by enhancing `UModLNode::Ideal` to replace the current node by a *signed* mod node, when the inputs are both always sign-positive. This would then unlock further optimizations on the signed mod IR node.
Likewise, `UDivLNode::Ideal` should convert itself to signed div node if the inputs are both sign-positive.
Further more (on top of the previous suggestion) I suggest that the dynamic test in the assembly code be lifted out into IR. That way it can be folded more systematically, and also the Ideal methods just mentioned above will apply more frequently, on the relevant IR paths.
Ultimately, the test for a negative divisor should be represented fully in IR flow or c-moves (for both div and mod cases), so that the AD file can handle the case of only positive divisors. The fancy flow-free code in the assembly code for the negative divisor path should probably be handled by C2's logic for doing flow-free conditionals; that would be a better factoring of concerns than placing little bits of flow-free lore here and there in the assembler.
Long.remainderUnsigned, as a stand-alone operation, performs an unsigned division, and then works backwards with a multiply to find out the remainder. It can instead use Warren's technique directly. Cut the dividend in half (`>>>1`), take the *remainder* (not the *quotient*, as in the current code), and then repair the approximate result by doubling it and adding the shifted-out bit. Overflow can be avoided by using available signed-negative values. The existing flow-free coding style can still be used to perform the fixup.
Here is a sketch of the code which avoids the extra multiply by working directly toward the remainder:
```
assert(dividend >= 0); // negative dividend handled elsewhere as today
long r = ((dividend >>> 1) % divisor) << 1; // use % not / here
r += (dividend & 1); // reintroduce the shifted-out bit
r -= divisor; // cancel impending overflow; set bit 63 only for a true negative
r += ((r >> (Long.SIZE - 1)) & divisor); // increment by divisor if negative
assert(r >= 0 && r < dividend); // must be
return r;
```
The above technique benefits platforms which do not have AD file support for unsigned remainder (UModL).
For platforms which *do* support UModL directly, the assembly code performs a runtime sign check which could in principle be avoided if static type analysis proves it is never taken.
This can be done in part by enhancing `UModLNode::Ideal` to replace the current node by a *signed* mod node, when the inputs are both always sign-positive. This would then unlock further optimizations on the signed mod IR node.
Likewise, `UDivLNode::Ideal` should convert itself to signed div node if the inputs are both sign-positive.
Further more (on top of the previous suggestion) I suggest that the dynamic test in the assembly code be lifted out into IR. That way it can be folded more systematically, and also the Ideal methods just mentioned above will apply more frequently, on the relevant IR paths.
Ultimately, the test for a negative divisor should be represented fully in IR flow or c-moves (for both div and mod cases), so that the AD file can handle the case of only positive divisors. The fancy flow-free code in the assembly code for the negative divisor path should probably be handled by C2's logic for doing flow-free conditionals; that would be a better factoring of concerns than placing little bits of flow-free lore here and there in the assembler.
- relates to
- 
                    JDK-8298858 Better j.l.Long.remainderUnsigned() -           
- In Progress
 
-         
- 
                    JDK-8286679 more range check elimination for hash tables -           
- Open
 
-