Type: Enhancement
Resolution: Unresolved
Priority: P4
Fix Version/s: tbd
Affects Version/s: 25
Component/s: hotspot
Labels:
- c2
- performance

Subcomponent:
compiler

Multiplication by a constant, `x*C`, can be strength-reduced to cheaper IR operations such as add or shift. For example:
1. `x*8` can be optimized to `x<<3`.
2. `x*9` can be optimized to `x+(x<<3)`, and `x+(x<<3)` can be lowered to a single ADD-SHIFT instruction on some architectures, such as aarch64 and x86_64.
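
The two rewrites above can be checked at the Java source level. This is only a sketch of the arithmetic identities, not of how C2 transforms its IR; the class and method names are hypothetical. Note that in Java `+` binds tighter than `<<`, so the parentheses in `x + (x << 3)` are required.

```java
class MulByConstant {
    // x * 8 rewritten as a single left shift.
    static int mulBy8(int x) {
        return x << 3;
    }

    // x * 9 rewritten as x + (x << 3). Without the parentheses Java would
    // parse the expression as (x + x) << 3, which is x * 16.
    static int mulBy9(int x) {
        return x + (x << 3);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 7, 123456789, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : samples) {
            // int arithmetic wraps on overflow, so both forms agree for every input.
            if (mulBy8(x) != x * 8 || mulBy9(x) != x * 9) {
                throw new AssertionError("mismatch for x = " + x);
            }
        }
        System.out.println("all samples match");
    }
}
```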

Currently C2 implements a few such patterns in the mid-end, including:
1. |C| = 1<<n (n>0)
2. |C| = (1<<n) - 1 (n>0)
3. |C| = (1<<m) + (1<<n) (m>n, n>=0)
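
As an illustration, the three shapes of |C| listed above can be recognized with simple bit counting. This is a minimal sketch, not C2's actual code: the class, enum, and method names are hypothetical, the order in which overlapping patterns are tested is an assumption (for example, 3 is both (1<<2) - 1 and (1<<1) + (1<<0)), and the trivial constants 0 and ±1 are excluded by assumption.

```java
class MulPatternClassifier {
    enum Pattern { POWER_OF_TWO, SUM_OF_TWO_POWERS, POWER_OF_TWO_MINUS_ONE, NONE }

    // Classifies |c| against the three patterns described in the issue.
    static Pattern classify(int c) {
        // Widen to long so that |Integer.MIN_VALUE| is representable.
        long abs = Math.abs((long) c);
        if (abs < 2) {
            return Pattern.NONE;                    // 0 and +/-1 are not treated here
        }
        if (Long.bitCount(abs) == 1) {
            return Pattern.POWER_OF_TWO;            // |C| = 1<<n
        }
        if (Long.bitCount(abs) == 2) {
            return Pattern.SUM_OF_TWO_POWERS;       // |C| = (1<<m) + (1<<n)
        }
        if (Long.bitCount(abs + 1) == 1) {
            return Pattern.POWER_OF_TWO_MINUS_ONE;  // |C| = (1<<n) - 1
        }
        return Pattern.NONE;
    }

    public static void main(String[] args) {
        for (int c : new int[] {8, -8, 7, 9, 3, 11}) {
            System.out.println(c + " -> " + classify(c));
        }
    }
}
```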

The first two are fine, because on most architectures they are lowered to a single ADD, SUB, or SHIFT instruction.

The third pattern, however, does not always perform well on some architectures, such as AArch64. According to the Arm optimization guide, when the shift amount is greater than 4, the ADD instruction has the same latency and throughput as the MUL instruction. In that case, converting MUL to ADD is not profitable. Hence, applying this transformation at the mid-end IR level may cause performance regressions in some cases.

links to

Review(master) openjdk/jdk/22922

Assignee: Xiaohong Gong

Reporter: Xiaohong Gong

Votes: 0

Watchers: 3

Created: 2025-01-02 23:10

Updated: 2025-02-10 13:24
