Type: Bug
Resolution: Fixed
Priority: P4
Fix Version/s: hs25
Affects Version/s: hs25
Component/s: hotspot
Labels:
- hgupdate-sync
- sqe-8-noreglabel-backlog-startfresh

Subcomponent: compiler
Resolved In Build: b26
CPU: x86
OS: generic

Issue	Fix Version	Assignee	Priority	Status	Resolution	Resolved In Build
JDK-8011835	8	Vladimir Kozlov	P4	Resolved	Fixed	b85
JDK-8018316	7u45	Vladimir Kozlov	P4	Closed	Fixed	b01
JDK-8013978	7u40	Vladimir Kozlov	P4	Resolved	Fixed	b24
JDK-8013100	hs24	Vladimir Kozlov	P4	Resolved	Fixed	b42

A native library may use the wide 256-bit YMM registers and not clear them afterwards.
Add a vzeroupper instruction after the return from a JNI call to avoid the SSE <-> AVX transition penalty.
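
For illustration, the same idea can be applied on the native side. The following is a minimal sketch, not the HotSpot change itself: a hypothetical native routine uses 256-bit AVX instructions and then executes vzeroupper (through the `_mm256_zeroupper()` intrinsic) before returning, so that later SSE code does not pay the transition penalty. The function names are made up for this example, and the `target` attribute and `__builtin_cpu_supports` are GCC/Clang extensions assumed to be available; other compilers or non-x86 targets fall back to the scalar path.

```c
#include <stddef.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_AVX_PATH 1
#include <immintrin.h>

/* Hypothetical native routine: sums n floats using 256-bit AVX adds. */
__attribute__((target("avx")))
static float sum_floats_avx(const float *data, size_t n)
{
    __m256 acc = _mm256_setzero_ps();
    size_t i = 0;

    /* Process eight floats per iteration in one YMM register. */
    for (; i + 8 <= n; i += 8)
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(data + i));

    /* Reduce the eight lanes, then add the remaining tail elements. */
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    float total = 0.0f;
    for (int k = 0; k < 8; k++)
        total += lanes[k];
    for (; i < n; i++)
        total += data[i];

    /* Zero the upper 128 bits of all YMM registers before returning,
       so the caller's SSE code starts from a clean state. */
    _mm256_zeroupper();
    return total;
}
#endif

/* Portable fallback with no vector instructions. */
static float sum_floats_scalar(const float *data, size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n; i++)
        total += data[i];
    return total;
}

/* Entry point: uses the AVX path only when the CPU reports support. */
float sum_floats(const float *data, size_t n)
{
#ifdef HAVE_AVX_PATH
    if (__builtin_cpu_supports("avx"))
        return sum_floats_avx(data, n);
#endif
    return sum_floats_scalar(data, n);
}
```

Compilers that generate the AVX code themselves typically insert vzeroupper at such boundaries automatically; the explicit call matters mostly for hand-written assembly or intrinsics-heavy libraries. The fix described in this issue covers the case where the native code does not do this, by clearing the registers on the JVM side after the JNI call returns.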

From customer report:

Hello,
I've got a question related to my project. It is a Java wrapper for the Libav libraries, and I have some performance issues with it.

If I compile the libraries with AVX instructions enabled, the whole testing application uses approximately 130% of the CPU time of the same libraries with AVX disabled. The problem is definitely in "bad" transitions between SSE and AVX instructions. These transitions are costly when the upper part of the YMM registers is not zeroed with a VZEROUPPER or VZEROALL instruction before using SSE.

There is no problem with those libraries, if they are not used from Java. I used Intel's Software Developer Emulator to find those bad AVX <-> SSE transitions and I found thousands of them. The origin of almost all bad transitions from AVX -> SSE (I mean the code that uses AVX 256 instructions and does not call VZEROUPPER) is somewhere inside anonymous memory blocks (according to Intel's SDE and pmem).

Libav mixes SSE and AVX 128 instructions a lot. This cannot cause any trouble if the upper part of the YMM registers is zeroed. But if it is not zeroed, execution oscillates between the B and C states (in Agner's terminology). Both of these transitions cost quite a lot of CPU cycles.

So here is my question: Is it possible that the JIT compiler compiles some bytecode into native instructions, uses some AVX 256 instructions, does not use VZEROUPPER, and puts the result into some anonymous memory block?

Ondrej Perutka

backported by

JDK-8011835 Clear AVX registers after return from JNI call

Resolved

JDK-8013100 Clear AVX registers after return from JNI call

Resolved

JDK-8013978 Clear AVX registers after return from JNI call

Resolved

JDK-8018316 Clear AVX registers after return from JNI call

Closed

relates to

JDK-8078113 8011102 changes may cause incorrect results.

Resolved

JDK-8279676 Dubious YMM register clearing in x86_64 arraycopy stubs

Resolved

JDK-8020433 Crash when using -XX:+RestoreMXCSROnJNICalls

Closed


Assignee: Vladimir Kozlov
Reporter: Vladimir Kozlov
Votes: 0
Watchers: 2
Created: 2013-03-29 14:10
Updated: 2022-02-06 23:25
Resolved: 2013-04-03 14:51
