- Type: Enhancement
- Resolution: Fixed
- Priority: P4
- Affects Version/s: None
- Component/s: security-libs
- None
- master
- generic
- generic
Created on behalf of wuxinyang@hygon.cn
The current implementation of AES in ECB mode still uses a per-block intrinsic approach with loop invocation, incurring superfluous invocations and context-switching overhead. We suggest introducing a full plaintext/ciphertext intrinsic stub and further optimizing it with parallel RoundKey addition.
===========================
Dear Security group and members,
Hello,
I recently submitted a PR that introduces a parallel intrinsic implementation for AES/ECB operations, aiming to replace the current per-block processing approach and improve performance for multi-block encryption/decryption.
This work is motivated by several performance limitations in the existing AES/ECB implementation (except for AVX-512 support):
1. *Excessive stub call overhead*: each 16-byte block triggers a separate intrinsic call, leading to high invocation frequency.
2. *Limited instruction-level parallelism*: serialized block processing does not fully utilize available ILP.
3. *Redundant setup and teardown*: encryption state is repeatedly initialized for every block.
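The reason a multi-block call can replace the per-block loop is that ECB blocks are independent of one another. The following is a minimal sketch at the Java API level only, not the intrinsic stub itself; the class and method names (`EcbBlockDemo`, `encryptPerBlock`, `encryptBulk`) are hypothetical and do not come from the patch. It encrypts the same input one 16-byte block at a time and in a single call, then checks that the two ciphertexts match.

```java
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

// Illustrative only: names are hypothetical and not part of the proposed change.
public class EcbBlockDemo {
    static final int BLOCK = 16; // AES block size in bytes

    static Cipher newEncryptCipher(byte[] key) {
        try {
            Cipher c = Cipher.getInstance("AES/ECB/NoPadding");
            c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"));
            return c;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // One cipher call per 16-byte block, mimicking per-block processing.
    static byte[] encryptPerBlock(byte[] key, byte[] plain) {
        if (plain.length % BLOCK != 0) {
            throw new IllegalArgumentException("input must be a multiple of 16 bytes");
        }
        Cipher c = newEncryptCipher(key);
        byte[] out = new byte[plain.length];
        try {
            for (int off = 0; off < plain.length; off += BLOCK) {
                // doFinal resets the cipher to its initialized state after each block.
                c.doFinal(plain, off, BLOCK, out, off);
            }
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
        return out;
    }

    // A single cipher call over the whole input.
    static byte[] encryptBulk(byte[] key, byte[] plain) {
        try {
            return newEncryptCipher(key).doFinal(plain);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] key = new byte[16];   // all-zero demo key; never use in practice
        byte[] plain = new byte[64]; // four blocks
        for (int i = 0; i < plain.length; i++) {
            plain[i] = (byte) i;
        }
        boolean same = Arrays.equals(encryptPerBlock(key, plain), encryptBulk(key, plain));
        System.out.println("per-block and bulk ciphertexts match: " + same);
    }
}
```

Because the outputs are identical, any gain from a multi-block stub would come from fewer calls and more instruction-level parallelism, not from a change in the result.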
Summary of changes
- Added a parallel AES intrinsic implementation to process multiple blocks in a single native call.
- Reduced intrinsic invocation overhead.
- Improved utilization of instruction-level parallelism.
Performance results (JMH)
Test platform: Intel(R) Core(TM) i9-14900HX OpenJDK 17 baseline:
Benchmark     Mode  Cnt      Score     Error  Units
AesTest.test  avgt    5  13334.163 ± 220.891  ns/op
With optimized implementation:
Benchmark     Mode  Cnt      Score     Error  Units
AesTest.test  avgt    5  10391.371 ±  94.966  ns/op
This corresponds to approximately *28.3% higher throughput* (13334.163 / 10391.371 ≈ 1.283), or about 22.1% less time per operation.
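The two JMH scores can be summarized in two different ways, and the 28.3% figure matches the ratio of baseline time to optimized time. A small sketch of the arithmetic follows; the class and method names are hypothetical, and the inputs are the two scores reported above. The first method comes to about 28.3% and the second to about 22.1%.

```java
// Illustrative only: helper names are hypothetical.
public class SpeedupMath {
    // Throughput gain: how much more work fits in the same time.
    static double throughputGainPercent(double baselineNs, double optimizedNs) {
        return (baselineNs / optimizedNs - 1.0) * 100.0;
    }

    // Time reduction: share of per-operation time that was saved.
    static double timeReductionPercent(double baselineNs, double optimizedNs) {
        return (1.0 - optimizedNs / baselineNs) * 100.0;
    }

    public static void main(String[] args) {
        double baseline = 13334.163;  // ns/op, baseline score from the results above
        double optimized = 10391.371; // ns/op, optimized score from the results above
        System.out.println("throughput gain %: " + throughputGainPercent(baseline, optimized));
        System.out.println("time reduction %:  " + timeReductionPercent(baseline, optimized));
    }
}
```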
I would greatly appreciate your feedback on:
- The design of the parallel intrinsic approach
- Any potential correctness or portability concerns
- Suggestions for further optimization or alignment with HotSpot intrinsic conventions
JBS Issue: https://bugs.openjdk.org/browse/JDK-8376164
This issue tracks the performance improvement of AES/ECB operations by introducing a parallel intrinsic to reduce per-block overhead and enhance throughput.
I am very happy to revise or extend the patch based on your guidance.
Thank you for your time and for maintaining such a great platform.
Best regards,
Xinyang Wu
- links to
  - Commit(master): openjdk/jdk/3e9fc5d4
  - Review(master): openjdk/jdk/29385