Type: Bug
Resolution: Unresolved
Priority: P4
Version: 23
CPU: aarch64
Found while working on JDK-8331185.
We use an abnormally high amount of memory (~1.03 GB) on aarch64 when compiling `compiler/c2/TestFindNode::test(()V)` with a lot of stress options.
The compiler memory statistic (`-XX:CompileCommand=memstat,*.*,print`) shows:
```
total NA RA result #nodes limit time type #rc thread method
1158047584 36241176 1108022744 err 66123 - 22.191 c2 2 0x0000007fb431ef50 compiler/c2/TestFindNode::test(()V)
669680 230080 208536 ok 224 - 1.191 c2 2 0x0000007fb431ef50 jdk/internal/util/ArraysSupport::signedHashCode((I[BII)I)
669680 230080 241264 ok 274 - 1.697 c2 2 0x0000007fb431ef50 java/lang/StringCoding::countPositives(([BII)I)
375128 99168 143080 ok 78 - 2.100 c2 1 0x0000007fb431ef50 java/lang/String::charAt((I)C)
```
Allocation happens here:
```
V [libjvm.so+0x62e004] Arena::Amalloc(unsigned long, AllocFailStrategy::AllocFailEnum)+0xb8 (arena.hpp:142)
V [libjvm.so+0x135dce0] ResourceArea::allocate_bytes(unsigned long, AllocFailStrategy::AllocFailEnum)+0x2c (resourceArea.inline.hpp:35)
V [libjvm.so+0x135a238] PhaseChaitin::Split(unsigned int, ResourceArea*)+0x358 (reg_split.cpp:555)
V [libjvm.so+0x8344f4] PhaseChaitin::Register_Allocate()+0x660 (chaitin.cpp:553)
V [libjvm.so+0x931a24] Compile::Code_Gen()+0x238 (compile.cpp:2988)
```
https://github.com/openjdk/jdk/blob/727293906430cfd95c0e2b2bd7a9df658f6fe94d/src/hotspot/share/opto/reg_split.cpp#L555C1-L556C60
There, we allocate arrays in the split arena (which is a resource area) in a loop whose iteration count depends on `_cfg.number_of_blocks()`.
On aarch64, `_cfg.number_of_blocks()` is *5984*.
For comparison, x64 takes 178 MB to compile the same test method, and `_cfg.number_of_blocks()` is *52*:
```
total NA RA result #nodes limit time type #rc thread method
186990336 26844800 146984024 ok 43763 1024M 2.806 c2 2 0x00007f7df028ef60 compiler/c2/TestFindNode::test(()V)
```
-------------------------------------
Reproduced on a Raspberry Pi 4 (also reproducible on macOS on an M1) with:
```
export ADD_OPTIONS='-XX:+UnlockDiagnosticVMOptions -XX:-TieredCompilation -XX:+StressArrayCopyMacroNode -XX:+StressLCM -XX:+StressGCM -XX:+StressIGVN -XX:+StressCCP -XX:+StressMacroExpansion -XX:+StressMethodHandleLinkerInlining -XX:+StressCompiledExceptionHandlers -XX:CompileCommand=memlimit,*.*,1G~crash -XX:CompileCommand=memstat,*::*,print'
jtreg "-vmoptions:${ADD_OPTIONS}" ... /shared/projects/openjdk/jdk-jdk/source/test/hotspot/jtreg/compiler/c2/TestFindNode.java
```
relates to
- JDK-8331185 Enable compiler memory limits in debug builds (Resolved)
- JDK-8331295 C2: Do not clone address computations that are indirect memory input to at least one load/store (Closed)