Loading...

Type: Enhancement
Resolution: Unresolved
Priority: P4
Fix Version/s: tbd
Affects Version/s: 19
Component/s: hotspot
Labels:
- c2-intrinsic

Subcomponent:
compiler

There are several hash table lookup idioms which are easily provable NOT to need a range check. They should all be covered by C2.

Given an array A (or other indexed item such as a list or memory segment), and a quasi-random int or long value X, and an int or long array size L = A.length (or A.size(), etc.), the lookup idiom computes the value I = X mod L (using unsigned arithmetic if needed). It then immediately uses the reduced value I to index into the array A. Usually, no range check is needed to ensure that the array (or other aggregate) is correctly indexed.

Case 0. Native mod operator, int type: var x = A[X % A.length]. In this case, X must not be negative. This case is covered by the C2 function CmpUNode::is_index_range_check. We probably also cover the case of A being a list or other int-indexed aggregate (this should be checked).

Case 1. Native mod operator, long type: var x = A.getElement(X % A.elementCount()). Nowadays we also range-check longs in many places, and should do so here as well. An additional challenge may appear if A has an internally unscaled bound (such as a byte-wise size for a memory segment) but the array-like access is scald to a multi-byte element.

Case 2. Strength-reduced mod of power of two: var x = A[X & A.length-1]. It is not clear how well this is covered by existing C2 code. It is a hot path in HashMap, with a native array and int index. The compiler must (in general) insert a speculative predicate that the length of the array is a non-zero, since (only in this case) the hack will fail (and the range check must always fail). Oddly enough, the JIT does NOT need to check that the array length is a power of two: That can be left to the programmer, because of the mathematical identity that an arbitrary number masked by a non-negative number (such as A.length-1) is always numerically less than or equal to the mask, so we can conclude 0 <= (X&(L-1)) < L as long as L>0 (whether or not L is some 2^M). This should be covered for ints and longs, and all kinds of aggregates. Many modern hashing structures will benefit from this particular case being well optimized.

Case 3. Unsigned mod method: var x = A[Integer.remainderUnsigned(X, A.length)].In the case of explicit unsigned arithmetic, the RCE can be removed without any predication at all, as long the subsequent range check uses compatible unsigned arithmetic (int or long). The problem here is recognizing the unsigned remainder operation for what it is (since it is not an intrinsic, and has no special IR node). This can be done easily in the 32-bit case, since it is coded using 64-bit signed arithmetic. The implementation of Integer.remainderUnsigned(X,L) is problematic for X<0, so people are not likely to be using it. Still, some thought should be give to at least preparing to optimize this case.

Case 4. Fixed-point multiply by table size. var x = A[(R(X) * A.length) >>> S] for appropriate R and S. This is a close cousin to A[X & (A.length-1)]. The value R(X) is derived from X by some arbitrary means, but the range of R is constrained to be a number less than 1<<S. The multiply (and subsequent shift) can be either 32-bit or 64-bit. The result R(X)*A.length should be viewed as an unsigned fixed-point number with S bits of sign. Unless A.length is zero (that special case again!), and if the arithmetic is unaffected by sign bits, and there is no hidden rounding upwards, and R(X) is known to be strictly less than 1.0 (viewed as an unsigned fixed-point number), then R(X) * A.length is (as a fixed-point number) strictly less than the math value A.length (A.length << S, in fixed point). The final down-shift by S ensures that the array index expression is always in range. A pattern match in C2 for the above conditions can elide a range check. As a simple example, if A.length=10, and S=32, and R(X) is a 32-bit (probably randomized) int, and R(X) * (A.length=10) is performed using 64-bit multiply (after promoting R(X) without sign extension), then R(X) * 10 is strictly less than 10L<<32, and (R(X) * 10) >>> 32 (or >> 32) is a valid index into the 10-element array.

While it is not clear if anyone is using any of these techniques, they should be optimized if it WOULD be profitable for some code (like HashMap) to BEGIN using them, and/or if such code DID begin doing do.

Editorial: There is something of the Train Station Paradox here, where business party A says, "I'm not going to put a train station there because nobody is waiting for a train" while business party B says, "I'm not going to wait for a train at that place because the train never stops there", and despite the profitability of a train stopping there, nothing gets done.

relates to

JDK-8286795 better 64-bit unsigned remainder

Open

JDK-8003585 strength reduce or eliminate range checks for power-of-two sized arrays

Resolved

Details

Description

Attachments

Issue Links

Activity

People

Dates