JDK-8370318

AES-GCM vector intrinsic may read out of bounds (x86_64, AVX-512)

      The newly implemented AES/GCM intrinsic using AVX-512 VAES (JDK-8337632) apparently has a subtle out-of-bounds access that can crash the JVM under limited conditions. It manifests as a crash in `StubRoutines::galoisCounterMode_AESCrypt`:

      ```
      # SIGSEGV (0xb) at pc=0x00007f6e204dbceb, pid=7, tid=1254
      #
      # JRE version: OpenJDK Runtime Environment Corretto-24.0.2.12.1 (24.0.2+12) (build 24.0.2+12-FR)
      # Java VM: OpenJDK 64-Bit Server VM Corretto-24.0.2.12.1 (24.0.2+12-FR, mixed mode, tiered, compressed class ptrs, z gc, linux-amd64)
      # Problematic frame:
      # v ~StubRoutines::galoisCounterMode_AESCrypt 0x00007f6e204dbceb
      ```

      The crash happens here:

      ```
              vaesenc %zmm31, %zmm3, %zmm3
              vaesenc %zmm31, %zmm4, %zmm4
              vaesenc %zmm31, %zmm5, %zmm5
         >>> vmovdqu32 176(%r8), %xmm31
              vpshufb 299219332(%rip), %xmm31, %xmm31
              vshufi64x2 $0, %zmm31, %zmm31, %zmm31
              cmpl $52, %r15d
              jl 178
      ```

      hs_err says R8 is:

      ```
      R8 =0x00000400957fff48 is a zaddress: Unreliable Internal address
      0x00000400956c3b00 is a zaddress: Unreliable Bad mark info/base
      ```

      But 176(%r8) is 0x400957ffff8, so the 16-byte load extends to 0x40095800007 and crosses into the next page, where we fail:

      ```
      siginfo: si_signo: 11 (SIGSEGV), si_code: 2 (SEGV_ACCERR), si_addr: 0x0000040095800000
      ```

      ...which is in the uncommitted part of the heap:

      ```
      40095600000-40095800000 rw-s 86600000 00:01 12291 /memfd:java_heap (deleted)
      40095800000-40095a00000 ---p 00000000 00:00 0
      ```
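      As a sanity check on the addresses above, the arithmetic can be redone in a few lines of C++. This is only an illustration: the constants are copied from the hs_err excerpts, `bytes_past` is a hypothetical helper, and the 16-byte width assumes the `vmovdqu32 176(%r8), %xmm31` load reads one full 128-bit register.

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <cstdio>

      // Hypothetical helper: how many bytes of the access [addr, addr + size)
      // lie at or above `limit` (here, the end of the committed heap).
      std::uint64_t bytes_past(std::uint64_t addr, std::uint64_t size, std::uint64_t limit) {
          const std::uint64_t end = addr + size;     // one past the last byte accessed
          if (end <= limit) return 0;                // access fits below the limit
          return addr >= limit ? size : end - limit; // fully or partially above it
      }

      int main() {
          const std::uint64_t r8         = 0x00000400957fff48ULL; // R8 from hs_err
          const std::uint64_t addr       = r8 + 11 * 16;          // 176(%r8), the AESKEY2 load
          const std::uint64_t commit_end = 0x0000040095800000ULL; // start of the ---p mapping
          assert(addr == 0x00000400957ffff8ULL);
          std::printf("load at 0x%llx: %llu byte(s) past committed heap\n",
                      static_cast<unsigned long long>(addr),
                      static_cast<unsigned long long>(bytes_past(addr, 16, commit_end)));
          return 0;
      }
      ```

      It reports a load starting at 0x400957ffff8 with 8 bytes beyond the committed region, which is consistent with `si_addr` being exactly the first address of the `---p` mapping.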

      So it superficially looks like JDK-8330611, but for AES-GCM. The symptoms are similar, and so is the cause: there is an out-of-bounds access, which rarely hits uncommitted parts of the heap, at which point the JVM crashes.

      I believe this code is from here: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp#L3207-L3211:

      ```
           roundEncode(AESKEY2, B00_03, B04_07, B08_11, B12_15);
      >>> ev_load_key(AESKEY2, key, 11 * 16, rbx);
           //AES rounds up to 11 (AES192) or 13 (AES256)
           //AES128 is done
           __ cmpl(NROUNDS, 52);
           __ jcc(Assembler::less, last_aes_rnd);
           __ bind(aes_192);
           roundEncode(AESKEY1, B00_03, B04_07, B08_11, B12_15);
           ev_load_key(AESKEY1, key, 12 * 16, rbx);
           roundEncode(AESKEY2, B00_03, B04_07, B08_11, B12_15);
      ```

      So we are running out of bounds trying to access the key at offset 11*16 = 176 for the AESKEY2 load. For AES-128 the expanded key is 176 bytes (11 round keys of 16 bytes each), so this 16-byte load starts at the first byte past the end of the key array. But the load makes no sense for AES-128: we already exit to last_aes_rnd at the NROUNDS=52 check, and AESKEY2 is not used on that path. So we end up loading AESKEY2 for no reason.

      This looks benign for correctness, as whatever garbage we read into AESKEY2 for AES-128 does not affect any computation. This is fairly evident from the stub code, but I also verified it separately by unconditionally clobbering AESKEY2 with vpxor at the last_aes_rnd branch, without any test failures. So the ciphers work correctly either way, which also explains why there have been no test failures anywhere since the initial JDK-8337632 push.
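      The bounds reasoning for the different key sizes can be modelled in a few lines. This sketch assumes that `NROUNDS` holds the expanded key length in 32-bit ints (44, 52 or 60), which is what the `cmpl(NROUNDS, 52)` comparison suggests; `round_key_in_bounds` and `takes_aes128_exit` are hypothetical helpers, not HotSpot code.

      ```cpp
      #include <cassert>
      #include <cstdio>

      // Assumption: NROUNDS holds the expanded key length in 32-bit ints:
      // 44 for AES-128, 52 for AES-192, 60 for AES-256.
      constexpr int kKeyLens[] = {44, 52, 60};

      // Hypothetical helper: does the 16-byte load of round key `index`
      // (byte offset index * 16) stay inside a key of key_len_ints ints?
      bool round_key_in_bounds(int key_len_ints, int index) {
          return (index + 1) * 16 <= key_len_ints * 4;
      }

      // Hypothetical helper: is the cmpl(NROUNDS, 52) / jcc(less) exit to
      // last_aes_rnd taken, i.e. is this AES-128?
      bool takes_aes128_exit(int key_len_ints) {
          return key_len_ints < 52;
      }

      int main() {
          assert(round_key_in_bounds(44, 10)); // last AES-128 round key, bytes 160..175
          for (int len : kKeyLens) {
              // The AESKEY2 load of round key 11 sits before the NROUNDS check,
              // so it runs for every key size.
              std::printf("%d ints: key 11 load %s, early exit %s\n", len,
                          round_key_in_bounds(len, 11) ? "in bounds" : "out of bounds",
                          takes_aes128_exit(len) ? "taken" : "not taken");
          }
          return 0;
      }
      ```

      For a 44-int key, the load of round key 11 falls outside the array while the early exit is taken. This is the combination described above: the loaded value is never used, but the access itself can fault.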

      The only observable effect occurs when the key array sits at the edge of the committed heap: then we can SEGV accessing it. That is rather rare, which explains why we do not see it even with aggressive GC testing. I believe the fix is "just" moving the load down into the AES-192/256 block; this would also match what the AVX2 version of the intrinsic does. I added a runtime bounds check that reproduces the issue cleanly, and it shows no other out-of-bounds accesses to key data besides this single one.
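      The proposed reordering can be checked against a simplified model of the load schedule. The model is an assumption, not the actual stub: it treats round keys 0 to 10 as loaded on every path, key 12 only for AES-192/256, and keys 13 and 14 only for AES-256 (that part is not in the excerpt above), and it varies only where key 11 is loaded. `loaded_round_keys` and `all_loads_in_bounds` are hypothetical names, and this is not the author's runtime bounds check.

      ```cpp
      #include <cassert>
      #include <cstdio>
      #include <vector>

      // Hypothetical model of which round keys the stub loads (byte offset index * 16).
      // Assumption: keys 0..10 on every path, 12 only for AES-192/256, 13..14 only
      // for AES-256; only the placement of the key 11 load differs.
      std::vector<int> loaded_round_keys(int key_len_ints, bool key11_moved) {
          std::vector<int> keys;
          for (int i = 0; i <= 10; i++) keys.push_back(i);
          if (!key11_moved) keys.push_back(11);    // current: loaded before cmpl(NROUNDS, 52)
          if (key_len_ints >= 52) {                // aes_192 block
              if (key11_moved) keys.push_back(11); // proposed: loaded only here
              keys.push_back(12);
          }
          if (key_len_ints >= 60) {                // AES-256 tail
              keys.push_back(13);
              keys.push_back(14);
          }
          return keys;
      }

      // Hypothetical check: every 16-byte load must end within key_len_ints * 4 bytes.
      bool all_loads_in_bounds(int key_len_ints, bool key11_moved) {
          for (int k : loaded_round_keys(key_len_ints, key11_moved)) {
              if ((k + 1) * 16 > key_len_ints * 4) return false;
          }
          return true;
      }

      int main() {
          for (int len : {44, 52, 60}) {
              assert(all_loads_in_bounds(len, true)); // moved load never leaves the key
              std::printf("%d ints: current %s, moved %s\n", len,
                          all_loads_in_bounds(len, false) ? "ok" : "out of bounds",
                          all_loads_in_bounds(len, true) ? "ok" : "out of bounds");
          }
          return 0;
      }
      ```

      In this model only the AES-128 case with the current placement goes out of bounds, and moving the key 11 load into the AES-192/256 block keeps every load within the key for all three sizes.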

            shade Aleksey Shipilev