ModI/LNode::Ideal implements an optimization that (repeatedly) applies And, RShift, and Add operations if the divisor is a constant of the form 2^k - 1. The number of repetitions needed (unroll_factor) depends on the divisor and is currently limited to 5 repetitions (above, the optimization isn't applied).
In a fixup step, the optimization also produces multiple comparisons and conditional moves.
Depending on the hardware, the optimization can yield worse performance, so we should evaluate whether the unroll limit of 5 is appropriate or if the optimization is worth to keep around in general.
I'm attaching a microbenchmark that covers all Mersenne numbers for int and long. The optimization can be disabled using -XX:ConditionalMoveLimit=0.
I'm also attaching the results measured on my machine (Ryzen 9 3900X) where the optimization is always worse than the more general div by constant optimization. On aarch64 (Neoverse-N1), the results are less conclusive, i.e., a low unroll factor provides better results.
In a fixup step, the optimization also produces multiple comparisons and conditional moves.
Depending on the hardware, the optimization can yield worse performance, so we should evaluate whether the unroll limit of 5 is appropriate or if the optimization is worth to keep around in general.
I'm attaching a microbenchmark that covers all Mersenne numbers for int and long. The optimization can be disabled using -XX:ConditionalMoveLimit=0.
I'm also attaching the results measured on my machine (Ryzen 9 3900X) where the optimization is always worse than the more general div by constant optimization. On aarch64 (Neoverse-N1), the results are less conclusive, i.e., a low unroll factor provides better results.
- relates to
-
JDK-8366815 C2: Delay Mod/Div by constant transformation
-
- Open
-