- Bug
- Resolution: Unresolved
- P3
- 24
The newly implemented AES/GCM intrinsic with AVX-512 VAES (JDK-8337632) apparently has a subtle out-of-bounds access that may crash the JVM under limited conditions. It manifests as a crash in `StubRoutines::galoisCounterMode_AESCrypt`:
```
# SIGSEGV (0xb) at pc=0x00007f6e204dbceb, pid=7, tid=1254
#
# JRE version: OpenJDK Runtime Environment Corretto-24.0.2.12.1 (24.0.2+12) (build 24.0.2+12-FR)
# Java VM: OpenJDK 64-Bit Server VM Corretto-24.0.2.12.1 (24.0.2+12-FR, mixed mode, tiered, compressed class ptrs, z gc, linux-amd64)
# Problematic frame:
# v ~StubRoutines::galoisCounterMode_AESCrypt 0x00007f6e204dbceb
```
The crash happens here:
```
vaesenc %zmm31, %zmm3, %zmm3
vaesenc %zmm31, %zmm4, %zmm4
vaesenc %zmm31, %zmm5, %zmm5
>>> vmovdqu32 176(%r8), %xmm31
vpshufb 299219332(%rip), %xmm31, %xmm31
vshufi64x2 $0, %zmm31, %zmm31, %zmm31
cmpl $52, %r15d
jl 178
```
hs_err says R8 is:
```
R8 =0x00000400957fff48 is a zaddress: Unreliable Internal address
0x00000400956c3b00 is a zaddress: Unreliable Bad mark info/base
```
But `176(%r8)` is 0x400957FFFF8, so the 16-byte load runs 8 bytes past 0x40095800000, and we fail at:
```
siginfo: si_signo: 11 (SIGSEGV), si_code: 2 (SEGV_ACCERR), si_addr: 0x0000040095800000
```
...which is in uncommitted part of the heap:
```
40095600000-40095800000 rw-s 86600000 00:01 12291 /memfd:java_heap (deleted)
40095800000-40095a00000 ---p 00000000 00:00 0
```
So it superficially looks like JDK-8330611, but for AES-GCM. The symptoms are similar, and the cause is similar as well: there is an out-of-bounds access, which rarely hits uncommitted parts of the heap, at which point the JVM crashes.
I believe this code is from here: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp#L3207-L3211:
```
roundEncode(AESKEY2, B00_03, B04_07, B08_11, B12_15);
>>> ev_load_key(AESKEY2, key, 11 * 16, rbx);
//AES rounds up to 11 (AES192) or 13 (AES256)
//AES128 is done
__ cmpl(NROUNDS, 52);
__ jcc(Assembler::less, last_aes_rnd);
__ bind(aes_192);
roundEncode(AESKEY1, B00_03, B04_07, B08_11, B12_15);
ev_load_key(AESKEY1, key, 12 * 16, rbx);
roundEncode(AESKEY2, B00_03, B04_07, B08_11, B12_15);
```
So we are running out of bounds trying to access the key at offset 11*16 = 176 for the AESKEY2 load. But this load makes no sense for AES-128: we already exit to last_aes_rnd at the NROUNDS < 52 check, and AESKEY2 is not used on that path. So we end up loading AESKEY2 for no reason. This looks benign for correctness, as whatever garbage we read into AESKEY2 for AES-128 is not going to affect any computations. That is a fairly reasonable reading of the stub code, but I also verified it separately by summarily vpxor-ing AESKEY2 at the last_aes_rnd branch, without any test failures. So the ciphers actually work well either way, which also explains why there have been no test failures anywhere since the initial JDK-8337632 push.
The only observable effect is when the key array is at the edge of the committed heap: then we can SEGV accessing it. That is rather rare, which explains why we do not see it even with aggressive GC testing. I believe the fix is "just" moving the load down into the AES192/256 block; this would also match what the AVX2 version of the intrinsic does. I added a runtime bounds check that reproduces the issue cleanly, and there are no other out-of-bounds problems accessing key data, except for this single one.
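To see why only AES-128 is affected by the load at offset 176, here is a minimal sketch of the expanded-key sizes. It assumes the standard AES key schedule (10, 12, or 14 rounds, one 16-byte round key per round plus the initial one) and that the key array holds exactly that schedule as 32-bit words, which is consistent with the `cmpl $52` check above. The helper names are hypothetical, and array headers and object alignment are ignored.

```python
ROUND_KEY_BYTES = 16  # one AES round key is 128 bits


def expanded_key_bytes(key_bits):
    """Size of the expanded key schedule for a 128/192/256-bit AES key."""
    rounds = key_bits // 32 + 6          # 10, 12, 14 rounds
    return (rounds + 1) * ROUND_KEY_BYTES


def load_in_bounds(key_bits, offset):
    """True if a 16-byte load at offset stays inside the expanded key."""
    return offset + ROUND_KEY_BYTES <= expanded_key_bytes(key_bits)


for bits in (128, 192, 256):
    words = expanded_key_bytes(bits) // 4   # length in 32-bit words
    print(bits, words, load_in_bounds(bits, 11 * 16))
```

Under these assumptions the AES-128 schedule is 176 bytes (44 words), so a load at offset 176 starts exactly one past the end of the key data, while for AES-192 and AES-256 the same load is in bounds.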
- caused by: JDK-8337632 AES-GCM Algorithm optimization for x86_64 (Resolved)
- links to: Review(master) openjdk/jdk/27951