
Type: Enhancement
Resolution: Won't Fix
Priority: P3
Fix Version/s: 11
Affects Version/s: 8, 9, 10
Component/s: hotspot
Labels:

Subcomponent: compiler
Introduced In Version: 8

The C2 compiler's memory usage increased significantly starting with JDK8. The increased memory usage is most noticeable when executing JavaScript applications on top of the VM. For example, for an application that consists of a set of JavaScript scripts (the reproducer attached to ~~JDK-8129847~~), the performance of the VM and of the application is described by the following numbers:

JDK version | LiveNodeCountInliningCutoff | RSS (MB) | Total runtime | Compilation time | Application time
================================================================
7u80 | 20'000 (default) | 163 | 60s | 11s | 49s
8u60 | 20'000 | 522 | 166s | 127s | 39s
8u60 | 40'000 (default) | 976 | 414s | 371s | 43s

The measurements were taken on a Linux x86_64 machine with -Xbatch and a maximum heap size of 100 MB. The LiveNodeCountInliningCutoff column shows the value of the flag with the same name. As the numbers suggest, the VM's memory usage increases by around 3.2X from 7u80 to 8u60 when both run with a cutoff of 20'000. The most likely reason for the increase is that the Nashorn JavaScript engine is used by default in 8u60. The VM's memory usage increases further (by around 1.9X) when the 8u60 VM is executed with the default value for the LiveNodeCountInliningCutoff flag. The flag's value was increased from 20'000 to 40'000 by ~~JDK-8058148~~. Another likely reason for the increased memory usage is the change of the MaxNodeLimit flag's default value by ~~JDK-8014959~~ and ~~JDK-8058148~~.

In total, the VM's memory usage for the application considered increases by 6X from 7u80 to 8u60. JDK9 is similar to JDK8 and is also affected by this problem.
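The growth factors quoted above follow directly from the RSS column of the table. A minimal sketch of that arithmetic, where the constant and function names are made up for illustration and only the three RSS values come from the measurements:

```cpp
#include <cassert>
#include <cmath>

// RSS values in MB, copied from the measurement table above.
constexpr double kRss7u80         = 163.0;  // 7u80, cutoff 20'000 (default)
constexpr double kRss8u60Cutoff20 = 522.0;  // 8u60, cutoff 20'000
constexpr double kRss8u60Cutoff40 = 976.0;  // 8u60, cutoff 40'000 (default)

// Returns after_mb / before_mb rounded to one decimal place.
// Hypothetical helper, used only to reproduce the quoted factors.
inline double growth_factor(double before_mb, double after_mb) {
  assert(before_mb > 0.0);
  return std::round(after_mb / before_mb * 10.0) / 10.0;
}
```

With these inputs, growth_factor(kRss7u80, kRss8u60Cutoff20) gives 3.2, growth_factor(kRss8u60Cutoff20, kRss8u60Cutoff40) gives 1.9, and growth_factor(kRss7u80, kRss8u60Cutoff40) gives 6.0.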

A number of issues have targeted reducing the VM's memory usage (~~JDK-8011858~~, ~~JDK-8137160~~, ~~JDK-8129847~~). The patches for the first two bugs result in a slight reduction of memory usage; the patch for ~~JDK-8129847~~ reduces memory usage by 20-30%. However, the VM's memory usage should be further reduced.

The goal of this enhancement is to further reduce the memory usage of the compiler. This issue is meant to investigate the following ways in which the compiler's memory usage can be reduced.

(1) Change arrays directly addressed with node IDs (the _idx field of every compiler node) to use hash tables instead. This change should target arrays with a high impact on the compiler's memory usage.

(2) For compilations with a large number of nodes, introduce an additional chunk size (in addition to the existing sizes tiny, init, medium, size, non_pool_size). The new chunk size should be larger than the existing chunk sizes and should allow the reuse of large memory chunks that are currently allocated with the operating system's memory allocator.

(3) Incremental (or post-parse) inlining in C2 produces lots of dead nodes (observed on Octane/Nashorn). Multiple PhaseRenumberLive passes during incremental inlining can help further reduce peak memory usage in that scenario. Since the pass can be expensive, it can be triggered when the gap between unique and live node counts becomes too large and performed with PhaseIdealLoop (see Compile::inline_incrementally).

(4) PhaseRemoveUseless and PhaseIterGVN are performed too frequently (that problem is targeted by ~~JDK-8059241~~).
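The trigger condition described in (3) could be sketched roughly as follows. This is not HotSpot code: NodeCounts, RenumberPolicy and should_renumber_live are hypothetical names, and the thresholds are illustrative tuning knobs rather than existing VM flags. The two counters stand in for what Compile::unique() and Compile::live_nodes() report.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical snapshot of the two counters the text refers to.
struct NodeCounts {
  uint32_t unique;  // number of node indices handed out so far
  uint32_t live;    // number of nodes still alive
};

// Hypothetical policy; neither field corresponds to an existing flag.
struct RenumberPolicy {
  uint32_t min_unique;      // do not bother for small graphs
  double   max_dead_ratio;  // tolerated fraction of dead node indices
};

// Returns true when the gap between unique and live node counts is
// large enough that an (expensive) renumbering pass looks worthwhile.
inline bool should_renumber_live(const NodeCounts& c, const RenumberPolicy& p) {
  assert(c.live <= c.unique);
  if (c.unique < p.min_unique) {
    return false;
  }
  const uint32_t dead = c.unique - c.live;
  return static_cast<double>(dead) > p.max_dead_ratio * static_cast<double>(c.unique);
}
```

For example, with a policy of {10000, 0.5}, a graph with 40'000 unique and 15'000 live nodes would trigger the pass, while one with 40'000 unique and 30'000 live nodes would not. A ratio-based check is only one option; an absolute gap would be an equally plausible criterion.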

Here are some notes related to (1):

Code locations that use directly-referenced arrays:
- PhaseIdealLoop::Dominators -- allocates dfsorder and ntarjan arrays of size unique();
- PhaseIdealLoop::dom_depth and PhaseIdealLoop::_idom -- proportional to unique();
- PhaseCFG::global_code_motion -- recalc_pressure_nodes -- could be large, but size not necessarily proportional to unique();
- PhaseChaitin::stretch_base_pointer_live_ranges -- derived_base_map is allocated with malloc, size proportional to unique();
- PhaseIdealLoop::_preorders -- size proportional to unique();
- Compile::_node_bundling_base
- PhaseRegAlloc::_node_regs -- size proportional to unique();
- Scheduling::_node_bundling_base, _node_latency, _uses, _current_latency -- size most likely proportional to unique();
- Compile::fill_buffer -- allocates node_offsets array of size unique(), used only in fastdebug.

Data structures that use directly-referenced arrays:
- GrowableArray -- example usages: ConnectionGraph::nodes, DepGraph::_map, Compile::_node_note_array, LiveRangeMap::_names, LiveRangeMap::_uf_map, PhaseCFG::_node_latency
- Node_Array -- example usages: ConnectionGraph::_node_map, Matcher::_old2new_map (only debug), Matcher::_new2old_map (only debug), PhaseTransform::_nodes, Type_Array::_types
- Node_List -- example usages: Invariance::_old_new, PhaseCFG::schedule_local, Scheduling::_scheduled, Scheduling::_available
- Block_Array -- used in PhaseCFG::_node_to_block_mapping
- VectorSet -- uses _idx for checks -- already compressed, but it could perhaps be optimized further.
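To illustrate the trade-off behind (1), here is a minimal sketch that contrasts a side table indexed directly by node ID with one backed by a hash table. Both class names are hypothetical. std::unordered_map is used only to keep the example self-contained; HotSpot would presumably use its own arena-allocated containers instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Directly addressed table: one slot per node index, so its footprint
// is proportional to the number of node indices even if few are used.
template <typename T>
class DenseNodeTable {
 public:
  explicit DenseNodeTable(uint32_t unique, T default_value = T())
    : _slots(unique, default_value) {}
  void set(uint32_t idx, const T& v) { assert(idx < _slots.size()); _slots[idx] = v; }
  const T& get(uint32_t idx) const { assert(idx < _slots.size()); return _slots[idx]; }
  size_t slots() const { return _slots.size(); }
 private:
  std::vector<T> _slots;
};

// Hash-based table: stores only the indices that were actually set;
// lookups of other indices return the default value.
template <typename T>
class SparseNodeTable {
 public:
  explicit SparseNodeTable(T default_value = T()) : _default(default_value) {}
  void set(uint32_t idx, const T& v) { _entries[idx] = v; }
  const T& get(uint32_t idx) const {
    auto it = _entries.find(idx);
    return it == _entries.end() ? _default : it->second;
  }
  size_t slots() const { return _entries.size(); }
 private:
  T _default;
  std::unordered_map<uint32_t, T> _entries;
};
```

Setting a single entry at index 39'999 occupies 40'000 slots in the dense table but one entry in the sparse table. A hash table has a higher per-entry cost and slower lookups, so the replacement is only likely to pay off for tables where a small fraction of node indices is populated.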

relates to:
- JDK-8014959 assert(Compile::current()->live_nodes() < (uint)MaxNodeLimit) failed: Live Node limit exceeded limit (Closed)
- JDK-8163999 Workaround intermittent failures of TreePosTest.java due to C2 memory usage (Closed)
- JDK-8059241 C2: Excessive RemoveUseless passes during incremental inlining (Resolved)
- JDK-8129847 Compiling methods generated by Nashorn triggers high memory usage in C2 (Resolved)
- JDK-8058148 MaxNodeLimit and LiveNodeCountInliningCutoff should be increased (Closed)
- JDK-8165193 Workaround intermittent failures of JavacTreeScannerTest and SourceTreeScannerTest due to C2 memory usage (Closed)
- JDK-8011858 Use Compile::live_nodes() instead of Compile::unique() in appropriate places (Resolved)
- JDK-8137160 Use Compile::live_nodes instead of Compile::unique() in appropriate places -- followup (Resolved)
