-
Bug
-
Resolution: Duplicate
-
P3
-
20
-
x86
There is a work-in-progress RFE for ChaCha20 intrinsics (https://github.com/openjdk/jdk/pull/7702) which appears to be causing an issue with the C2 intermediate representation. This appears to happen under specific conditions:
* The system must be x86_64 and have avx512f support
* This has only been seen on Linux; the behavior on other x86 platforms is unknown at this time.
* -XX:UseAVX=3 must be employed (we have not seen the error without it or with -XX:UseAVX=1 or 2)
* It can be worked around with the inclusion of -XX:+UnlockDiagnosticVMOptions -XX:ArrayOperationPartialInlineSize=0
So far this has only been seen when running test/micro/org/openjdk/bench/javax/crypto/full/CipherBench.java, specifically the ChaCha20Poly1305.decrypt benchmark. The failing runs do not include the pending Poly1305 intrinsics, which are also in review at this time.
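For anyone scripting the reproduction, the flags and benchmark named above can be assembled as in the following minimal sketch. The class name ReproCommand and the build helper are hypothetical, and the java binary and benchmarks.jar paths are left to the caller; only the -XX options and the benchmark name come from this report.

```java
import java.util.ArrayList;
import java.util.List;

public class ReproCommand {
    // Hypothetical helper: assembles the java command line described in this report.
    // javaBin and benchmarksJar are supplied by the caller (paths are not assumed here).
    public static List<String> build(String javaBin, String benchmarksJar, boolean applyWorkaround) {
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        cmd.add("-XX:UseAVX=3"); // the failure has only been seen with UseAVX=3
        if (applyWorkaround) {
            // The diagnostic option must be unlocked before it is set.
            cmd.add("-XX:+UnlockDiagnosticVMOptions");
            cmd.add("-XX:ArrayOperationPartialInlineSize=0");
        }
        cmd.add("-jar");
        cmd.add(benchmarksJar);
        cmd.add("ChaCha20Poly1305.decrypt");
        return cmd;
    }

    public static void main(String[] args) {
        // Print both variants so they can be compared or pasted into a shell.
        System.out.println(String.join(" ", build("java", "benchmarks.jar", false)));
        System.out.println(String.join(" ", build("java", "benchmarks.jar", true)));
    }
}
```

Running the first variant is expected to hit the failure only on a machine that meets the conditions listed above; the second variant adds the reported workaround.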
Info for the system used to reproduce this issue:
Linux <hostname-omitted> 4.14.35-2047.515.3.el7uek.x86_64 #2 SMP Thu Jun 30 18:46:19 PDT 2022 x86_64 x86_64 x86_64 GNU/Linux
Intel(R) Xeon(R) Platinum 8167M CPU @ 2.00GHz
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat umip pku ospke md_clear
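To check whether another machine meets the avx512f condition, a flags line like the one above can be tested for an exact token. This is a minimal sketch: CpuFlagCheck and hasFlag are hypothetical names, and the sample string in main is an abbreviated stand-in for a real /proc/cpuinfo flags line.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class CpuFlagCheck {
    // Returns true if the flags line contains the given flag as a whole token.
    // Whole-token matching avoids treating "avx512bw" as a match for "avx512f".
    public static boolean hasFlag(String flagsLine, String flag) {
        int colon = flagsLine.indexOf(':');
        String list = colon >= 0 ? flagsLine.substring(colon + 1) : flagsLine;
        Set<String> flags = new HashSet<>(Arrays.asList(list.trim().split("\\s+")));
        return flags.contains(flag);
    }

    public static void main(String[] args) {
        // Abbreviated sample; a real line would come from /proc/cpuinfo.
        String sample = "flags : fpu sse sse2 avx avx2 avx512f avx512dq avx512bw avx512vl";
        System.out.println("avx512f present: " + hasFlag(sample, "avx512f"));
    }
}
```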
Sample invocation:
<OPEN-REPO-TOP>/build/linux-x64/jdk/bin/java -XX:UseAVX=3 -jar <OPEN-REPO-TOP>/build/linux-x64/images/test/micro/benchmarks.jar ChaCha20Poly1305.decrypt
# JMH version: 1.34
# VM version: JDK 20-internal, OpenJDK 64-Bit Server VM, 20-internal-2022-11-01-1746088.jjnimeh...
# VM invoker: /home/jjnimeh/workspace/jdk-intrin/build/linux-x64/jdk/bin/java
# VM options: -XX:UseAVX=3 -XX:+AlwaysPreTouch
# Blackhole mode: compiler (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)
# Warmup: 3 iterations, 3 s each
# Measurement: 8 iterations, 2 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: org.openjdk.bench.javax.crypto.full.CipherBench.ChaCha20Poly1305.decrypt
# Parameters: (dataSize = 256, keyLength = 256, mode = None, padding = NoPadding, permutation = ChaCha20-Poly1305, provider = )
# Run progress: 0.00% complete, ETA 00:10:25
# Fork: 1 of 5
# Warmup Iteration 1: !!! Unschedulable graph !!!
B156 idom=B153 depth=26 321 vmasked_store_evex === 1121 310 315 322 323 [[ 320 473 ]] memory Memory: @byte[int:32]:NotNull:exact[0] *,iid=1079, idx=39;
B59 idom=B52 depth=20 442 addP_rReg_imm === _ 311 311 [[ 443 415 447 473 478 479 ]] byte[int:32]:NotNull:exact[0] *,iid=1079
B144 idom=B143 depth=31 474 vmask_gen === _ 475 477 [[ 473 478 479 470 ]] vectormask[64]:{byte}
Failing node: 473 vmasked_load_evex === _ 321 442 474 [[ 472 ]] vectory[32]:{byte}
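The node lines above can be cross-referenced mechanically. The sketch below assumes, based only on the lines shown here, that each dump line has the shape "index name === inputs [[ outputs ]] type", optionally preceded by block information or a "Failing node:" label, and that "_" marks an absent input. NodeDumpLine and its members are hypothetical names, not part of HotSpot.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class NodeDumpLine {
    public static final class Node {
        public final int idx;
        public final String name;
        public final List<String> inputs;
        public final List<String> outputs;
        Node(int idx, String name, List<String> inputs, List<String> outputs) {
            this.idx = idx; this.name = name; this.inputs = inputs; this.outputs = outputs;
        }
    }

    private static List<String> tokens(String s) {
        String t = s.trim();
        return t.isEmpty() ? Collections.<String>emptyList() : Arrays.asList(t.split("\\s+"));
    }

    // Parses one dump line under the assumed layout described in the lead-in.
    public static Node parse(String line) {
        int eq = line.indexOf("===");
        if (eq < 0) throw new IllegalArgumentException("no '===' in line");
        int open = line.indexOf("[[", eq);
        int close = open < 0 ? -1 : line.indexOf("]]", open);
        if (open < 0 || close < 0) throw new IllegalArgumentException("no '[[ ... ]]' in line");
        // The two tokens just before "===" are taken to be the node index and name.
        String[] head = line.substring(0, eq).trim().split("\\s+");
        if (head.length < 2) throw new IllegalArgumentException("missing index or name");
        int idx = Integer.parseInt(head[head.length - 2]);
        String name = head[head.length - 1];
        return new Node(idx, name, tokens(line.substring(eq + 3, open)),
                        tokens(line.substring(open + 2, close)));
    }

    public static void main(String[] args) {
        Node failing = parse("Failing node: 473 vmasked_load_evex === _ 321 442 474 [[ 472 ]] vectory[32]:{byte}");
        Node store = parse("B156 idom=B153 depth=26 321 vmasked_store_evex === 1121 310 315 322 323 [[ 320 473 ]] memory");
        // Report whether the store feeds the failing load.
        System.out.println(failing.name + " takes input from " + store.name + ": "
                + failing.inputs.contains(String.valueOf(store.idx)));
    }
}
```

Read this way, the three nodes printed before the failing node (321, 442, 474) are exactly its non-empty inputs, and they sit in different blocks (B156, B59, B144).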
The hs_err_pid files from the above run for all four data sizes (256, 1024, 4096, 16384) have been attached to this issue.
This may be related to JDK-8252848 or JDK-8266951. Investigations thus far point away from the new stubs themselves.
- blocks
-
JDK-8247645 ChaCha20 Intrinsics
- Resolved
- duplicates
-
JDK-8292780 misc tests failed "assert(false) failed: graph should be schedulable"
- Resolved