Type: Enhancement
Resolution: Unresolved
Priority: P4
Fix Version/s: tbd
Affects Version/s: 24
Component/s: hotspot
Labels:

Subcomponent: compiler

(Apologies for the overly generic synopsis; we should sharpen it once we find a solution or mitigation.)

René Schwietzke noticed an interesting behavior in manual arraycopy benchmarks. The simplest reproducer is:

```
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(1)
public class LoopCounterBench {
    int increment;
    long[] src, dest;

    @Setup
    public void setup() {
        final int SIZE = 1000;
        src = new long[SIZE];
        dest = new long[SIZE];
        increment = 1;
    }

    @Benchmark
    public long[] field_ret() {
        for (int i = 0; i < src.length; i = i + increment) {
            dest[i] = src[i];
        }
        return dest;
    }

    @Benchmark
    public long[] localVar_ret() {
        final int inc = increment;
        for (int i = 0; i < src.length; i = i + inc) {
            dest[i] = src[i];
        }
        return dest;
    }
}
```

...it yields:

```
Benchmark                      Mode  Cnt     Score   Error  Units
LoopCounterBench.field_ret     avgt    5   604.758 ± 0.404  ns/op
LoopCounterBench.localVar_ret  avgt    5  1625.441 ± 0.503  ns/op
```

This result is counter-intuitive: caching the field value in a local variable is about 2.7x slower than using the field directly. `perfasm` shows that the slow version differs from the fast one by a spill/unspill pair inside the loop body:

```
Fast:
↗ 0x...6d00: cmp %edx,%r8d
│ 0x...6d03: jae 0x...6d27
│ 0x...6d05: mov 0x10(%r13,%r8,8),%rax
│ 0x...6d0a: cmp %esi,%r8d
│ 0x...6d0d: jae 0x...6d60
│ 0x...6d0f: mov %rax,0x10(%r14,%r8,8)
│ 0x...6d14: add %ecx,%r8d
│ 0x...6d17: mov 0x450(%r15),%rax
│ 0x...6d1e: test %eax,(%rax)
│ 0x...6d20: cmp %edx,%r8d
╰ 0x...6d23: jl 0x...6d00

Slow:
↗ 0x...7390: vmovq %xmm0,%rbp ; <--- UNSPILL
│ 0x...7395: cmp %r10d,%edi
│ 0x...7398: jae 0x...7412
│ 0x...739a: vmovq %rbp,%xmm0 ; <--- SPILL
│ 0x...739f: mov 0x10(%rax,%rdi,8),%rbp
│ 0x...73a4: cmp %esi,%edi
│ 0x...73a6: jae 0x...7450
│ 0x...73ac: mov %rbp,0x10(%r13,%rdi,8)
│ 0x...73b1: add %r9d,%edi
│ 0x...73b4: mov 0x450(%r15),%rbp ; <--- %rbp is used for thread-local poll
│ 0x...73bb: test %eax,0x0(%rbp)
│ 0x...73be: cmp %r10d,%edi
╰ 0x...73c1: jl 0x...7390
```
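For poking at the two loop shapes outside of JMH, a minimal standalone sketch might look like the one below. The class name `LoopShapes`, its constructor, the method names, and the 20,000-call warm-up count are assumptions made for illustration; they are not part of the original reproducer. It only checks that both variants copy the same data and gives the JIT a chance to compile them. It is not a benchmark, so it does not replace the JMH numbers above.

```java
import java.util.Arrays;

public class LoopShapes {
    int increment;
    long[] src, dest;

    // Hypothetical constructor standing in for the @Setup method of the reproducer.
    LoopShapes(long[] src, int increment) {
        this.src = src;
        this.dest = new long[src.length];
        this.increment = increment;
    }

    // Same shape as field_ret: the loop update reads the 'increment' field.
    long[] fieldLoop() {
        for (int i = 0; i < src.length; i = i + increment) {
            dest[i] = src[i];
        }
        return dest;
    }

    // Same shape as localVar_ret: the field is copied into a local before the loop.
    long[] localVarLoop() {
        final int inc = increment;
        for (int i = 0; i < src.length; i = i + inc) {
            dest[i] = src[i];
        }
        return dest;
    }

    public static void main(String[] args) {
        long[] data = new long[1000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i * 31L;
        }
        LoopShapes a = new LoopShapes(data, 1);
        LoopShapes b = new LoopShapes(data, 1);

        // Call both methods repeatedly so that JIT compilation becomes likely
        // (the call count is an arbitrary assumption, not a tuned value).
        long[] r1 = null, r2 = null;
        for (int n = 0; n < 20_000; n++) {
            r1 = a.fieldLoop();
            r2 = b.localVarLoop();
        }
        System.out.println("same result: " + Arrays.equals(r1, r2));
    }
}
```

With the hsdis disassembler installed, running this with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly` should dump the generated code for both methods. Whether the spill shows up in this setup exactly as it does under JMH is not something this sketch guarantees.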

Links to: Review (master) openjdk/jdk/21472

Assignee: Quan Anh Mai
Reporter: Aleksey Shipilev
Votes: 0
Watchers: 7
Created: 2024-10-08 02:30
Updated: 2025-05-28 06:33
