Type: Bug
Resolution: Fixed
Priority: P2
Fix Version/s: 9
Affects Version/s: 9
Component/s: infrastructure
Labels: 8u-mach5
Subcomponent: build
Resolved In Build: b105

Issue	Fix Version	Assignee	Priority	Status	Resolution	Resolved In Build
JDK-8246850	emb-8u261	Erik Joelsson	P2	Resolved	Fixed	team

While implementing some performance-sensitive logic in HotSpot, I noticed that the generated C++ code on Solaris-x64 performed worse than expected. A deeper analysis showed that the compiler was failing to inline/intrinsify various memcpy calls. I tracked this down to whether or not the "sysroot" include path is explicitly passed when compiling the C++ file(s).

Specifically, here's a small reproducer:

----
#include <stdlib.h>
#include <stdint.h>
#include <string.h>

uint64_t read_unaligned(void* src) {
  uint64_t tmp;

  memcpy(&tmp, src, sizeof(uint64_t));

  return tmp;
}
----

When compiled without an explicit sysroot include path, like so:

SS12u4-Solaris11u1/SS12u4/bin/CC -m64 -G -xO4 -o libfoo.so unaligned_read.cpp

the resulting assembly looks like this:

0000000000000bf0 <__1cOread_unaligned6Fpv_L_>:
 bf0:  55                    push   %rbp
 bf1:  48 8b ec              mov    %rsp,%rbp
 bf4:  48 8b 07              mov    (%rdi),%rax
 bf7:  48 89 45 f8           mov    %rax,-0x8(%rbp)
 bfb:  48 8b 45 f8           mov    -0x8(%rbp),%rax
 bff:  c9                    leaveq
 c00:  c3                    retq

That is, the compiler has "inlined" memcpy and is just reading the value using a normal mov.

However, when the code is compiled *with* an explicit sysroot include path, like so:

SS12u4-Solaris11u1/SS12u4/bin/CC -m64 -G -I/opt/jprt/products/P1/SS12u4-Solaris11u1/SS12u4-Solaris11u1/sysroot/usr/include -xO4 -o libfoo.so unaligned_read.cpp

the resulting code looks like this:

0000000000000c40 <__1cOread_unaligned6Fpv_L_>:
 c40:  55                    push   %rbp
 c41:  48 8b ec              mov    %rsp,%rbp
 c44:  48 83 ec 10           sub    $0x10,%rsp
 c48:  48 8b f7              mov    %rdi,%rsi
 c4b:  48 8d 45 f8           lea    -0x8(%rbp),%rax
 c4f:  48 8b f8              mov    %rax,%rdi
 c52:  48 c7 c2 08 00 00 00  mov    $0x8,%rdx
 c59:  e8 8a ff ff ff        callq  be8 <memcpy@plt>
 c5e:  48 8b 45 f8           mov    -0x8(%rbp),%rax
 c62:  c9                    leaveq
 c63:  c3                    retq

That is, the call to memcpy (through the PLT) is still there.

The performance difference here is significant, especially if the code in question happens to be in a hot loop.
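
To illustrate the kind of hot loop where this matters, here is a minimal sketch that calls the same memcpy-based read once per byte offset of a buffer. The function name xor_all_offsets and the loop itself are hypothetical and not taken from HotSpot; the helper takes a const pointer, unlike the reproducer. With memcpy inlined, each iteration should reduce to a single load; with the call left in place, each iteration pays for a function call through the PLT.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Same idiom as the reproducer: a memcpy-based unaligned 64-bit load.
static uint64_t read_unaligned(const void* src) {
    uint64_t tmp;
    std::memcpy(&tmp, src, sizeof(tmp));
    return tmp;
}

// Hypothetical hot loop (illustration only): XOR together the 64-bit
// value found at every byte offset of the buffer. Most of these reads
// are unaligned, so each one goes through read_unaligned().
uint64_t xor_all_offsets(const std::vector<unsigned char>& buf) {
    uint64_t acc = 0;
    for (std::size_t i = 0; i + sizeof(uint64_t) <= buf.size(); ++i) {
        acc ^= read_unaligned(buf.data() + i);
    }
    return acc;
}
```

Building this once with and once without the explicit -I option, then comparing the disassembly of the loop body, should show whether the memcpy call survives.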

Attachments:
- unaligned-read-nosysroot.txt (5 kB, 2016-02-02 22:07, Mikael Vidstedt)
- unaligned-read-sysroot.txt (5 kB, 2016-02-02 22:07, Mikael Vidstedt)
- unaligned_read.cpp (0.2 kB, 2016-02-02 22:07, Mikael Vidstedt)

Backported by: JDK-8246850 Suboptimal code generated when setting sysroot include with Solaris Studio (Resolved)

Assignee: Erik Joelsson
Reporter: Mikael Vidstedt
Votes: 0
Watchers: 4
Created: 2016-02-02 22:01
Updated: 2023-01-24 00:48
Resolved: 2016-02-05 00:43
