JDK-8143321

Reduce the C2 compiler's memory usage


Details

    • 8

    Description

      The C2 compiler's memory usage increased significantly starting with JDK8. The increased memory usage is most noticeable when executing JavaScript applications on top of the VM. For example, for an application that consists of a set of JavaScript scripts (the reproducer attached to JDK-8129847), the performance of the VM and of the application is described by the following numbers:

      JDK version | LiveNodeCountInliningCutoff | RSS (MB) | Total runtime | Compilation time | Application time
      ================================================================
      7u80 | 20'000 (default) | 163 | 60s | 11s | 49s
      8u60 | 20'000 | 522 | 166s | 127s | 39s
      8u60 | 40'000 (default) | 976 | 414s | 371s | 43s


      The measurements were taken on a Linux x86_64 machine with -Xbatch and a maximum heap size of 100 MB. The LiveNodeCountInliningCutoff column shows the value of the flag with the same name. As the numbers suggest, the VM's memory usage increases by around 3.2X from 7u80 to 8u60 when both use the same flag value. The most likely reason for the increase is that the Nashorn JavaScript engine is used by default in 8u60. The VM's memory usage increases further (by around 1.9X) when the 8u60 VM is executed with the default value for the LiveNodeCountInliningCutoff flag. The flag's value has been increased from 20'000 to 40'000 by JDK-8058148. Another likely reason for the increased memory usage is the change of the MaxNodeLimit flag's default value (by JDK-8014959 and JDK-8058148).

      In total, the VM's memory usage for the application considered increases by 6X from 7u80 to 8u60. JDK9 is similar to JDK8 and is also affected by this problem.
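The growth factors quoted above follow directly from the RSS column of the table. As a minimal sketch of that arithmetic (the constant and function names are hypothetical, chosen only for this illustration):

```cpp
#include <cassert>

// RSS values in MB, copied from the table above.
constexpr double kRss7u80          = 163.0;  // 7u80, cutoff 20'000
constexpr double kRss8u60Cutoff20k = 522.0;  // 8u60, cutoff 20'000
constexpr double kRss8u60Default   = 976.0;  // 8u60, cutoff 40'000 (default)

// Returns how many times larger `after` is than `before`.
// Hypothetical helper, not part of HotSpot.
inline double growth_factor(double before, double after) {
  assert(before > 0.0);
  return after / before;
}
```

Calling growth_factor(kRss7u80, kRss8u60Cutoff20k) gives roughly 3.2, growth_factor(kRss8u60Cutoff20k, kRss8u60Default) roughly 1.9, and growth_factor(kRss7u80, kRss8u60Default) roughly 6.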

      A number of issues have targeted reducing the VM's memory usage (JDK-8011858, JDK-8137160, JDK-8129847). The patches for the first two bugs result in a slight reduction of memory usage; the patch for JDK-8129847 reduces memory usage by 20-30%. However, the VM's memory usage should be further reduced.

      The goal of this enhancement is to further reduce the memory usage of the compiler. This issue is supposed to investigate four ways the compiler's memory usage can be reduced.

      (1) Change arrays directly addressed with node IDs (the _idx field of every compiler node) to use hash tables instead. This change should target arrays with a high impact on the compiler's memory usage.

      (2) For compilations with a large number of nodes, introduce an additional chunk size (in addition to the existing sizes tiny, init, medium, size, non_pool_size). The new chunk size should be larger than the existing chunk sizes and should allow the reuse of large memory chunks that are currently allocated with the operating system's memory allocator.

      (3) Incremental (or post-parse) inlining in C2 produces lots of dead nodes (observed on Octane/Nashorn). Multiple PhaseRenumberLive passes during incremental inlining can help further reduce peak memory usage in that scenario. Since the pass can be expensive, it can be triggered when the gap between unique and live node counts becomes too large and performed with PhaseIdealLoop (see Compile::inline_incrementally).
       
      (4) PhaseRemoveUseless and PhaseIterGVN are performed too frequently (that problem is targeted by JDK-8059241).
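The triggering idea in (3) can be sketched in a few lines. This is only an illustration under stated assumptions: ToyNode, should_renumber, renumber_live and the max_dead_fraction parameter are hypothetical names invented here, and the code does not reflect how PhaseRenumberLive or Compile::inline_incrementally are actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a compiler node: only an ID and a liveness flag.
struct ToyNode {
  std::size_t idx;
  bool live;
};

// Hypothetical heuristic: renumber only when the gap between the number of
// IDs handed out (unique) and the number of live nodes exceeds a fraction
// of unique, because the renumbering pass itself is expensive.
inline bool should_renumber(std::size_t unique, std::size_t live,
                            double max_dead_fraction) {
  assert(live <= unique);
  if (unique == 0) return false;
  return static_cast<double>(unique - live) >
         max_dead_fraction * static_cast<double>(unique);
}

// Drops dead nodes and gives the survivors dense IDs 0..live-1, keeping
// their relative order. Returns the new unique count.
inline std::size_t renumber_live(std::vector<ToyNode>& nodes) {
  std::size_t next = 0;
  for (std::size_t i = 0; i < nodes.size(); ++i) {
    if (nodes[i].live) {
      nodes[next] = ToyNode{next, true};
      ++next;
    }
  }
  nodes.resize(next);
  return next;
}
```

After renumbering, every side array sized by the unique count shrinks to the live count, which is where the peak-memory saving would come from.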

      Here are some notes related to (1):

      Code locations that use directly-referenced arrays:
      - PhaseIdealLoop::Dominators -- allocates dfsorder and ntarjan arrays of size unique();
      - PhaseIdealLoop::dom_depth and PhaseIdealLoop::_idom -- proportional to unique();
      - PhaseCFG::global_code_motion -- recalc_pressure_nodes -- could be large, but size not necessarily proportional to unique();
      - PhaseChaitin::stretch_base_pointer_live_ranges -- derived_base_map is allocated with malloc, size proportional to unique();
      - PhaseIdealLoop::_preorders -- size proportional to unique();
      - Compile::_node_bundling_base
      - PhaseRegAlloc::_node_regs -- size proportional to unique();
      - Scheduling::_node_bundling_base, _node_latency, _uses, _current_latency -- size most likely proportional to unique();
      - Compile::fill_buffer -- allocates node_offsets array of size unique(), used only in fastdebug.
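To make the trade-off in (1) concrete, here is a minimal sketch that contrasts a side table indexed directly by node ID with a hash table keyed by node ID. DenseNodeTable and SparseNodeTable are hypothetical classes written for this illustration; they use the C++ standard library rather than HotSpot's own containers, and the stored int stands in for whatever per-node data a phase keeps.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Directly addressed table: one slot per node ID ever handed out,
// so its size is proportional to unique() even if few slots are used.
class DenseNodeTable {
 public:
  explicit DenseNodeTable(std::size_t unique) : slots_(unique, 0) {}
  void set(std::uint32_t idx, int value) {
    assert(idx < slots_.size());
    slots_[idx] = value;
  }
  int get(std::uint32_t idx) const {
    assert(idx < slots_.size());
    return slots_[idx];
  }
  std::size_t slot_count() const { return slots_.size(); }

 private:
  std::vector<int> slots_;
};

// Hash table keyed by node ID: size is proportional to the number of
// nodes that actually have an entry. Missing entries read as 0.
class SparseNodeTable {
 public:
  void set(std::uint32_t idx, int value) { entries_[idx] = value; }
  int get(std::uint32_t idx) const {
    auto it = entries_.find(idx);
    return it == entries_.end() ? 0 : it->second;
  }
  std::size_t slot_count() const { return entries_.size(); }

 private:
  std::unordered_map<std::uint32_t, int> entries_;
};
```

A hash table carries per-entry overhead and slower lookups, so it is likely to pay off only for tables where a small share of the node IDs is populated; that is presumably why the change should target the arrays with the highest memory impact.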

      Data structures that use directly-referenced arrays:
      - GrowableArray -- example usages ConnectionGraph::nodes, DepGraph::_map, Compile::_node_note_array, LiveRangeMap::_names, LiveRangeMap::_uf_map, PhaseCFG::_node_latency
      - Node_Array -- example usages ConnectionGraph::_node_map, Matcher::_old2new_map (only debug), Matcher::_new2old_map (only debug), PhaseTransform::_nodes, Type_Array::_types
      - Node_List -- example usages: Invariance::_old_new, PhaseCFG::schedule_local, Scheduling::_scheduled, Scheduling::_available
      - Block_Array -- used in PhaseCFG::_node_to_block_mapping
      - VectorSet -- uses _idx for checks -- already compressed, but it could perhaps be optimized further.
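As a rough illustration of why a bit set indexed by node ID is already compact, the sketch below stores one bit per ID in 32-bit words. ToyBitSet is a hypothetical class written for this note; it is not HotSpot's VectorSet and makes no claim about that class's actual layout.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bit set keyed by node ID: one bit per ID, 32 IDs per word.
class ToyBitSet {
 public:
  // Marks idx as a member, growing the word array if needed.
  void insert(std::uint32_t idx) {
    std::size_t word = idx / 32;
    if (word >= words_.size()) words_.resize(word + 1, 0);
    words_[word] |= (std::uint32_t{1} << (idx % 32));
  }

  // Returns true if idx was inserted; IDs beyond the array are not members.
  bool contains(std::uint32_t idx) const {
    std::size_t word = idx / 32;
    if (word >= words_.size()) return false;
    return ((words_[word] >> (idx % 32)) & 1u) != 0;
  }

  std::size_t word_count() const { return words_.size(); }

 private:
  std::vector<std::uint32_t> words_;
};
```

Even in this form the storage still grows with the largest ID inserted, not with the number of members, so a set holding only a few high-numbered nodes keeps many all-zero words. That is one place where further optimization might be possible.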

            People

              Unassigned Unassigned
              zmajo Zoltan Majo (Inactive)