Several debug assertion failures have been observed on RISC-V, on physical boards only.
Failure list: (the `hs_err` log is in the JBS issue)
```
compiler/vectorapi/TestVectorShiftImm.java
compiler/compilercontrol/jcmd/AddPrintAssemblyTest.java
compiler/intrinsics/math/TestFpMinMaxIntrinsics.java
compiler/compilercontrol/TestCompilerDirectivesCompatibilityFlag.java
compiler/compilercontrol/TestCompilerDirectivesCompatibilityCommandOn.java
compiler/runtime/TestConstantsInError.java
compiler/compilercontrol/jcmd/PrintDirectivesTest.java
```
When the failure occurs, hsdis is disassembling unrecognizable data at the end of a code blob, usually the data stored in trampolines. That data could in theory be any address inside the code cache, and binutils may decode it as a 2-byte, 4-byte, 6-byte, or 8-byte instruction, even though, as far as I know, no instruction longer than 4 bytes has landed. The computed instruction length can therefore make binutils read past the end of the buffer. When that happens, the RISC-V binutils returns `EIO` [1] (the `status`, which is always `5`), whereas the other platforms return `-1` [2][3][4][5]. So back in hsdis we get `size = 5` as the return value [6] rather than `-1`: the hsdis error handling is skipped, our variable `p` runs out of bounds, and we hit the crash.
To fix it, we should check whether the returned value is the special `EIO` on RISC-V. However, after fixing that, I found that binutils prints messages like "Address 0x%s is out of bounds." to the screen:
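The failure mode can be modeled in isolation. The following is only a sketch, not the hsdis code: `walk_blob`, `mock_riscv_decode`, and `mock_other_decode` are hypothetical names, the decoders are mocks rather than binutils calls, and the loop only imitates the "advance `p` by the returned size" pattern described above. The `check_eio` flag shows what treating `EIO` as an error would change.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a binutils print_insn_* function: returns the
   size of the decoded instruction in bytes, or an error indication. */
typedef int (*decode_fn)(size_t offset, size_t length);

/* Mock of the RISC-V behaviour described above: 4-byte instructions, except
   that the final 4 bytes look like the start of a longer instruction, so the
   read runs off the end and EIO is returned. */
int mock_riscv_decode(size_t offset, size_t length) {
  return (length - offset <= 4) ? EIO : 4;
}

/* Mock of the other ports: the same situation is reported as -1. */
int mock_other_decode(size_t offset, size_t length) {
  return (length - offset <= 4) ? -1 : 4;
}

/* Walks a blob of `length` bytes and returns the final value of the cursor.
   With check_eio == 0, a returned EIO is mistaken for an instruction size. */
size_t walk_blob(decode_fn decode, size_t length, int check_eio) {
  assert(decode != NULL);
  size_t p = 0;
  while (p < length) {
    int size = decode(p, length);
    if (size <= 0 || (check_eio && size == EIO)) {
      break;                /* error path: stop instead of advancing */
    }
    p += (size_t)size;      /* success path: step over the instruction */
  }
  return p;
}
```

With a 12-byte blob, `walk_blob(mock_riscv_decode, 12, 0)` ends with the cursor past the end of the blob, while passing `check_eio = 1` stops it at offset 8. RISC-V instruction lengths are multiples of 2 bytes, so a size of 5 cannot be a real instruction length on this platform.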
```
0x0000003f901a41b4: auipc t0,0x0 ; {trampoline_stub}
0x0000003f901a41b8: ld t0,12(t0) # 0x0000003f901a41c0
0x0000003f901a41bc: jr t0
0x0000003f901a41c0: .2byte 0x8ec0
0x0000003f901a41c2: srli s0,s0,0x21
0x0000003f901a41c4: Address 0x0000003f901a41c9 is out of bounds. <----------- But we want the real bytes here.
```
So we should also override the binutils callback `disassemble_info.memory_error_func` [7] to generate our own output:
```
0x0000003f901a41b4: auipc t0,0x0 ; {trampoline_stub}
0x0000003f901a41b8: ld t0,12(t0) # 0x0000003f901a41c0
0x0000003f901a41bc: jr t0
0x0000003f901a41c0: .2byte 0x8ec0
0x0000003f901a41c2: srli s0,s0,0x21
0x0000003f901a41c4: .4byte 0x0000003f
```
This mirrors the hsdis-llvm code, which prints just 4 bytes of data [8].
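A rough sketch of such a replacement error callback follows. It is not the actual patch: `struct mock_dinfo` is a hypothetical stand-in for the binutils `disassemble_info`, `vma_t` stands in for `bfd_vma`, the output goes to a small buffer instead of the hsdis print callback, and the sketch assumes the callback receives the address where the undecodable data starts. Only the callback shape `(status, memaddr, dinfo)` follows binutils.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef uint64_t vma_t;   /* stand-in for bfd_vma */

/* Hypothetical, reduced stand-in for struct disassemble_info. */
struct mock_dinfo {
  const uint8_t *buffer;    /* bytes being disassembled */
  vma_t buffer_vma;         /* address of buffer[0] */
  size_t buffer_length;     /* number of valid bytes */
  char out[32];             /* where this sketch writes its output */
  void (*memory_error_func)(int status, vma_t memaddr,
                            struct mock_dinfo *dinfo);
};

/* Instead of printing "Address ... is out of bounds.", emit the raw data as
   a 4-byte little-endian word, provided 4 bytes are actually available. */
void raw_data_memory_error(int status, vma_t memaddr,
                           struct mock_dinfo *dinfo) {
  (void)status;             /* the sketch handles every status the same way */
  assert(dinfo != NULL);
  memset(dinfo->out, 0, sizeof dinfo->out);
  if (memaddr < dinfo->buffer_vma) {
    return;                 /* address precedes the buffer: print nothing */
  }
  vma_t off = memaddr - dinfo->buffer_vma;
  if (off >= dinfo->buffer_length || dinfo->buffer_length - off < 4) {
    return;                 /* fewer than 4 bytes left: print nothing */
  }
  const uint8_t *b = dinfo->buffer + off;
  uint32_t word = (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
                  ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
  snprintf(dinfo->out, sizeof dinfo->out, ".4byte 0x%08x", (unsigned)word);
}
```

Installing it would amount to assigning the function to the `memory_error_func` field before disassembly starts. Which address binutils really passes as `memaddr` should be checked against the RISC-V disassembler source before relying on this.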
BTW, the crash happens only on physical boards because the boards support only the RISC-V sv39 address mode: a legal user-space address is at most 38 bits wide. So the code cache is always mmapped to an address like `0x3fe0000000`. When such an address is stored as data, its upper 32 bits are `0x0000003f`, and that `0x3f` is always recognized as the mark of an 8-byte instruction [9].
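The length rule can be sketched as follows. This is an illustration of the RISC-V instruction-length encoding for lengths up to 8 bytes, written from scratch rather than copied from binutils; `insn_length_sketch` and `next_insn_fits` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the instruction length implied by the low bits of the first
   halfword, or 0 for encodings this sketch does not model. */
unsigned insn_length_sketch(uint16_t first_halfword) {
  if ((first_halfword & 0x03) != 0x03) return 2;  /* compressed */
  if ((first_halfword & 0x1f) != 0x1f) return 4;  /* standard 32-bit */
  if ((first_halfword & 0x3f) == 0x1f) return 6;  /* 48-bit encoding */
  if ((first_halfword & 0x7f) == 0x3f) return 8;  /* 64-bit encoding */
  return 0;                                       /* longer or reserved */
}

/* Returns 1 if the instruction starting at p fits within the `remaining`
   bytes, 0 otherwise. The halfword is read in little-endian order. */
int next_insn_fits(const uint8_t *p, size_t remaining) {
  assert(p != NULL && remaining >= 2);
  uint16_t half = (uint16_t)(p[0] | (p[1] << 8));
  unsigned len = insn_length_sketch(half);
  return len != 0 && len <= remaining;
}
```

For the word `0x0000003f` stored little-endian, the first halfword is `0x003f`, so the sketch reports a length of 8. With only 4 bytes left in the blob, `next_insn_fits` returns 0, which corresponds to the out-of-bounds read described above.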
[1] https://github.com/bminor/binutils-gdb/blob/binutils-2_38-branch/opcodes/riscv-dis.c#L940
[2] https://github.com/bminor/binutils-gdb/blob/binutils-2_38-branch/opcodes/aarch64-dis.c#L3792
[3] https://github.com/bminor/binutils-gdb/blob/binutils-2_38-branch/opcodes/ppc-dis.c#L872
[4] https://github.com/bminor/binutils-gdb/blob/binutils-2_38-branch/opcodes/s390-dis.c#L305
[5] https://github.com/bminor/binutils-gdb/blob/binutils-2_38-branch/opcodes/i386-dis.c#L9466 (the i386 one uses `setjmp`/`longjmp` to handle the exception case, so the code might look different)
[6] https://github.com/openjdk/jdk/blob/94e7cc8587356988e713d23d1653bdd5c43fb3f1/src/utils/hsdis/binutils/hsdis-binutils.c#L198
[7] https://github.com/bminor/binutils-gdb/blob/binutils-2_38-branch/opcodes/dis-buf.c#L51-L72
[8] https://github.com/openjdk/jdk/blob/94e7cc8587356988e713d23d1653bdd5c43fb3f1/src/utils/hsdis/llvm/hsdis-llvm.cpp#L316-L317
[9] https://github.com/bminor/binutils-gdb/blob/binutils-2_38-branch/include/opcode/riscv.h#L30-L42
Links to: Review openjdk/jdk/12551