Type: Bug
Resolution: Fixed
Priority: P2
Fix Version/s: 22
Affects Version/s: 21, 22
Component/s: core-libs
Labels:

Subcomponent: java.lang.foreign
Resolved In Build: b25
Verification: Verified

The attached benchmark performs very poorly when using the MemorySegment API: memory segment access is 2x slower than semantically equivalent Unsafe code.
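For readers without the attachments at hand, the general shape of such code can be sketched as below. This is not the attached benchmark: the class name, the `RECORD_SIZE` constant, the record layout, and the helper `compareRecord` are all hypothetical, chosen only to illustrate a binary search whose inner loop reads a `MemorySegment` byte by byte next to a plain `byte[]` key. The Unsafe variant is omitted. The sketch assumes JDK 22, where `java.lang.foreign` is final (on JDK 21 it needs `--enable-preview`).

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class SegmentBinarySearchSketch {
    // Hypothetical layout: sorted, fixed-width records packed back to back.
    static final int RECORD_SIZE = 8;

    // Inner loop: compare record `index` of `data` with `key`, one signed byte at a time.
    static int compareRecord(MemorySegment data, long index, byte[] key) {
        long base = index * RECORD_SIZE;
        for (int i = 0; i < RECORD_SIZE; i++) {
            byte a = data.get(ValueLayout.JAVA_BYTE, base + i); // bounds-checked segment read
            byte b = key[i];                                    // plain array read
            if (a != b) {
                return Byte.compare(a, b);
            }
        }
        return 0;
    }

    // Outer loop: classic binary search over record indices; returns -1 if absent.
    static long binarySearch(MemorySegment data, byte[] key) {
        long low = 0;
        long high = data.byteSize() / RECORD_SIZE - 1;
        while (low <= high) {
            long mid = (low + high) >>> 1;
            int cmp = compareRecord(data, mid, key);
            if (cmp < 0) {
                low = mid + 1;
            } else if (cmp > 0) {
                high = mid - 1;
            } else {
                return mid;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Four records whose last byte is 0, 10, 20, 30 (already sorted).
        byte[] backing = new byte[4 * RECORD_SIZE];
        for (int r = 0; r < 4; r++) {
            backing[r * RECORD_SIZE + RECORD_SIZE - 1] = (byte) (r * 10);
        }
        MemorySegment data = MemorySegment.ofArray(backing);

        byte[] key = new byte[RECORD_SIZE];
        key[RECORD_SIZE - 1] = 20;
        System.out.println(binarySearch(data, key)); // prints 2
    }
}
```

A heap segment from `MemorySegment.ofArray` keeps the sketch self-contained; whether the attached benchmark uses heap or native segments is not stated here.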

While the generated code in the two cases is very similar, the memory segment version has the following sequence of instructions at the end of the inner loop:

```
  1.38% 0x00007fcaa44bbaca: mov $0x80,%r11d
            0x00007fcaa44bbad0: cmova %r11d,%r9d ; {no_reloc}
  4.66% 0x00007fcaa44bbad4: lea 0x1(%rsi),%rbp
            0x00007fcaa44bbad8: movslq %r9d,%r10
  4.03% 0x00007fcaa44bbadb: lea 0x1(%rsi,%r10,1),%r11
  4.42% 0x00007fcaa44bbae0: cmp %rbp,%r11
  1.69% 0x00007fcaa44bbae3: movabs $0x7fffffffffffffff,%rax
            0x00007fcaa44bbaed: cmovl %rax,%r11
  8.53% 0x00007fcaa44bbaf1: cmp %rax,%r11
  2.19% 0x00007fcaa44bbaf4: cmovge %rax,%r11
  4.52% 0x00007fcaa44bbaf8: mov %rbp,%rax
            0x00007fcaa44bbafb: xor %r13d,%r13d
            0x00007fcaa44bbafe: test %rbp,%rbp
            0x00007fcaa44bbb01: cmovl %r13,%rax
  0.02% 0x00007fcaa44bbb05: sub %rax,%rsi
            0x00007fcaa44bbb08: cmp %r11,%rax
  4.62% 0x00007fcaa44bbb0b: mov %rax,%r13
            0x00007fcaa44bbb0e: cmovl %r11,%r13
  4.20% 0x00007fcaa44bbb12: sub %rax,%r13
  5.38% 0x00007fcaa44bbb15: mov %esi,%eax
            0x00007fcaa44bbb17: mov %r13d,%esi
            0x00007fcaa44bbb1a: inc %eax
            0x00007fcaa44bbb1c: cmp %esi,%eax
  2.85% 0x00007fcaa44bbb1e: jae 0x00007fcaa44bbcec
            0x00007fcaa44bbb24: vmovq %xmm2,%r11
            0x00007fcaa44bbb29: add %r8,%r11
            0x00007fcaa44bbb2c: mov %r14,%rbp
            0x00007fcaa44bbb2f: add %r8,%rbp
            0x00007fcaa44bbb32: movsbl 0x10(%r11),%r8d ;*baload {reexecute=0 rethrow=0 return_oop=0}
                                                                      ; - org.openjdk.bench.java.lang.foreign.BinarySearch::binarySearch_panama@121 (line 98)
                                                                      ; - org.openjdk.bench.java.lang.foreign.jmh_generated.BinarySearch_binarySearch_panama_jmhTest::binarySearch_panama_avgt_jmhStub@17 (line 186)
            0x00007fcaa44bbb37: movsbl 0x1(%rbp),%r13d ;*invokevirtual getByte {reexecute=0 rethrow=0 return_oop=0}
                                                                      ; - jdk.internal.misc.ScopedMemoryAccess::getByteInternal@13 (line 528)
                                                                      ; - jdk.internal.misc.ScopedMemoryAccess::getByte@4 (line 516)
```

This sequence is the likely cause of the performance delta.

This issue was reported here:

https://git.openjdk.org/panama-foreign/pull/844

And then further discussed here:

https://mail.openjdk.org/pipermail/panama-dev/2023-July/019369.html

Attachments:
BinarySearch.java (25 kB, 2023-07-12 03:38)
BinarySearchInstance.java (29 kB, 2023-07-14 08:17)
BinarySearchMini.java (27 kB, 2023-07-14 10:07)

Links to:
Commit: openjdk/jdk/129c4708
Review: openjdk/jdk/16650

Assignee: Roland Westrelin
Reporter: Maurizio Cimadamore
Votes: 0
Watchers: 7

Created: 2023-07-12 03:31
Updated: 2024-01-22 06:43
Resolved: 2023-11-16 23:55
