C2: optimize mask checks in counted loops


    • Type: Enhancement
    • Resolution: Fixed
    • Priority: P4
    • Fix Version/s: 18
    • Affects Version/s: 17, 18
    • Component/s: hotspot
    • Resolved In Build: b28

        The memory access API supports custom alignment constraints, which are checked upon memory access, using the following formula:

        ((segmentBaseAddress + accessedOffset) & alignmentMask) == 0

        However, when accessing a segment using a var handle obtained from a layout featuring a non-trivial alignment mask, access performance is slower than in the case where the alignment mask is 0.
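        The check itself can be modeled in plain Java. The sketch below mirrors the formula above (using its names, segmentBaseAddress, accessedOffset, alignmentMask); it is an illustration of the check, not the actual VarHandle implementation:

        ```java
        // Plain-Java model of the per-access alignment check described above.
        // The names follow the formula in the text; this is an illustrative
        // sketch, not the real memory access API internals.
        class AlignmentCheck {
            static boolean isAligned(long segmentBaseAddress, long accessedOffset, long alignmentMask) {
                return ((segmentBaseAddress + accessedOffset) & alignmentMask) == 0;
            }

            public static void main(String[] args) {
                long base = 0x1000;   // a 4-byte aligned base address
                long mask = 4 - 1;    // mask enforcing 4-byte alignment
                System.out.println(isAligned(base, 8, mask));   // offset 8 is a multiple of 4
                System.out.println(isAligned(base, 6, mask));   // offset 6 is not
            }
        }
        ```

        For a 4-byte aligned layout the mask is 3, so the check passes exactly when the effective address is a multiple of 4.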

        The attached patch adds a benchmark which shows the problem; the benchmark compares accessing a segment using a 4-byte aligned vs. a 1-byte aligned layout:

        ```
        Benchmark                                                Mode  Cnt  Score    Error  Units
        LoopOverNonConstant.segment_loop_instance_index          avgt   30  0.229 ±  0.001  ms/op
        LoopOverNonConstant.segment_loop_instance_index_aligned  avgt   30  0.329 ±  0.005  ms/op
        ```

        As can be seen, access with a non-trivial alignment constraint is roughly 40% slower in this benchmark.

        This is mildly surprising: in the above formula, segmentBaseAddress is loop-invariant, whereas accessedOffset typically depends on the loop induction variable, so existing BCE-style logic (as used for bounds-check elimination) should kick in and detect that the offset is always aligned (given the loop stride).
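        The expected transformation can be sketched in plain Java. Assuming the base is checked once outside the loop, and the offset advances by a 4-byte stride, every per-iteration check is implied and can be removed (names and helpers below are hypothetical, for illustration only):

        ```java
        // Sketch of the optimization the report expects C2 to perform: the base
        // address is loop-invariant, so one alignment check before the loop
        // subsumes the per-access checks inside it. Hypothetical example code,
        // not the actual compiler transformation.
        class HoistedMaskCheck {
            static final long MASK = 4 - 1;  // 4-byte alignment mask

            // Per-access check on every iteration, as generated today.
            static long sumNaive(long base, int[] data) {
                long sum = 0;
                for (int i = 0; i < data.length; i++) {
                    long addr = base + (long) i * 4;
                    if ((addr & MASK) != 0) throw new IllegalStateException("misaligned");
                    sum += data[i];
                }
                return sum;
            }

            // Hoisted check: if base is 4-byte aligned, then (base + i*4) & 3 == 0
            // for every i, so the in-loop check disappears.
            static long sumHoisted(long base, int[] data) {
                if ((base & MASK) != 0) throw new IllegalStateException("misaligned");
                long sum = 0;
                for (int i = 0; i < data.length; i++) {
                    sum += data[i];
                }
                return sum;
            }

            public static void main(String[] args) {
                int[] data = {1, 2, 3, 4};
                long base = 0x2000;  // 4-byte aligned
                System.out.println(sumNaive(base, data) == sumHoisted(base, data));
            }
        }
        ```

        Both variants compute the same result; the hoisted form performs a single mask check regardless of the trip count.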

              Assignee:
              Roland Westrelin
              Reporter:
              Maurizio Cimadamore
      Votes:
      1
      Watchers:
      7
