Type: Enhancement
Resolution: Fixed
Priority: P4
Fix Version/s: 25
Affects Version/s: None
Component/s: security-libs
Labels:
- noreg-perf
- security-performance

Subcomponent: javax.crypto
Resolved In Build: b09
CPU: aarch64
OS: generic

Issue	Fix Version	Assignee	Priority	Status	Resolution	Resolved In Build
JDK-8356138	21.0.10-oracle	Konanki Sreenath	P4	Open	Unresolved

On aarch64, the original implementation of the ChaCha20 block function used a block-parallel approach: a single 32-bit state integer was duplicated onto all lanes of a SIMD register, with one register per state element. The x86_64 implementation instead follows a quarter-round parallel approach, in which each of the four 128-bit segments of the 512-bit state is held in its own SIMD register, using 4 contiguous registers.

Profiling only the keystream generation function in assembly on aarch64 shows roughly an 11% speed gain for the quarter-round parallel version over the block-parallel version. When placed into an intrinsic and used for a complete ChaCha20 encryption or decryption operation, the gain is a more modest 2-4%, depending on the input size.

The plan is to move to the quarter-round parallel implementation in order to take advantage of this speed increase.
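As a rough illustration of the quarter-round parallel layout described above, the following Java sketch models each 128-bit segment of the state as a 4-lane int array and compares the result against a plain scalar block function. It is not the HotSpot intrinsic and uses no NEON or SIMD instructions; the class name `ChaChaLayoutSketch` and all helper names are hypothetical, and the lane rotations stand in for whatever vector permute the real assembly uses.

```java
import java.util.Arrays;

// Illustrative model only: plain int arrays stand in for 128-bit SIMD registers.
class ChaChaLayoutSketch {

    // Scalar ChaCha20 quarter round on state words a, b, c, d (RFC 8439, section 2.1).
    static void quarterRound(int[] s, int a, int b, int c, int d) {
        s[a] += s[b]; s[d] = Integer.rotateLeft(s[d] ^ s[a], 16);
        s[c] += s[d]; s[b] = Integer.rotateLeft(s[b] ^ s[c], 12);
        s[a] += s[b]; s[d] = Integer.rotateLeft(s[d] ^ s[a], 8);
        s[c] += s[d]; s[b] = Integer.rotateLeft(s[b] ^ s[c], 7);
    }

    // Scalar reference: 10 double rounds, then add the initial state.
    static int[] blockScalar(int[] init) {
        int[] s = init.clone();
        for (int round = 0; round < 10; round++) {
            quarterRound(s, 0, 4, 8, 12);  quarterRound(s, 1, 5, 9, 13);
            quarterRound(s, 2, 6, 10, 14); quarterRound(s, 3, 7, 11, 15);
            quarterRound(s, 0, 5, 10, 15); quarterRound(s, 1, 6, 11, 12);
            quarterRound(s, 2, 7, 8, 13);  quarterRound(s, 3, 4, 9, 14);
        }
        for (int i = 0; i < 16; i++) s[i] += init[i];
        return s;
    }

    // Lane-wise a += b, then d = rotl(d ^ a, n), applied to whole 4-lane rows.
    static void step(int[] a, int[] b, int[] d, int n) {
        for (int i = 0; i < 4; i++) {
            a[i] += b[i];
            d[i] = Integer.rotateLeft(d[i] ^ a[i], n);
        }
    }

    // Four quarter rounds at once: lane i of rows 0..3 forms one quarter round.
    static void rowQuarterRounds(int[][] r) {
        step(r[0], r[1], r[3], 16);
        step(r[2], r[3], r[1], 12);
        step(r[0], r[1], r[3], 8);
        step(r[2], r[3], r[1], 7);
    }

    // Rotate a row left by n lanes (lane i receives old lane i + n, modulo 4).
    static void rotateLanes(int[] row, int n) {
        int[] t = row.clone();
        for (int i = 0; i < 4; i++) row[i] = t[(i + n) & 3];
    }

    // Quarter-round parallel block function: one row of the state per "register".
    static int[] blockRowParallel(int[] init) {
        int[][] r = new int[4][];
        for (int i = 0; i < 4; i++) r[i] = Arrays.copyOfRange(init, 4 * i, 4 * i + 4);
        for (int round = 0; round < 10; round++) {
            rowQuarterRounds(r);                       // column round
            rotateLanes(r[1], 1); rotateLanes(r[2], 2); rotateLanes(r[3], 3);
            rowQuarterRounds(r);                       // diagonal round
            rotateLanes(r[1], 3); rotateLanes(r[2], 2); rotateLanes(r[3], 1);
        }
        int[] out = new int[16];
        for (int i = 0; i < 16; i++) out[i] = r[i / 4][i % 4] + init[i];
        return out;
    }

    public static void main(String[] args) {
        int[] init = new int[16];
        // "expand 32-byte k" constants from RFC 8439.
        init[0] = 0x61707865; init[1] = 0x3320646e;
        init[2] = 0x79622d32; init[3] = 0x6b206574;
        // Arbitrary illustrative key, counter, and nonce words (assumption).
        for (int i = 4; i < 16; i++) init[i] = i;
        System.out.println(Arrays.equals(blockScalar(init), blockRowParallel(init)));
    }
}
```

In this model, the lane rotations between the column and diagonal rounds are the only extra work the row-oriented layout needs, which is why all four quarter rounds of a round can share the same sequence of vector operations.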

backported by: JDK-8356138 Change ChaCha20 intrinsic to use quarter-round parallel implementation on aarch64 (Open)

causes: JDK-8350126 Regression ~3% on Crypto-ChaCha20Poly1305.encrypt for MacOSX aarch64 (Resolved)

links to:
Commit(master) openjdk/jdk/ee4caa41
Review(master) openjdk/jdk/23397

Assignee: Jamil Nimeh
Reporter: Jamil Nimeh
Votes: 0
Watchers: 3
Created: 2025-01-30 14:29
Updated: 2025-05-04 22:42
Resolved: 2025-02-04 08:31
