JDK-8286220: Do not use CPU Shares to compute active processor count



    • Type: CSR
    • Resolution: Approved
    • Priority: P3
    • Fix Version: 11-pool
    • Component: hotspot
    • Labels: None
    • Compatibility Kind: behavioral
    • Compatibility Risk: minimal
    • Compatibility Risk Description: Please see discussion in "Description" section.
    • Interface Kind: add/remove/modify command line option, Other
    • Scope: Implementation


      Note: This backport CSR is copied verbatim from the original CSR JDK-8281571, with the exception that the affected VM flags are not deprecated.


      Summary

      Modify HotSpot's Linux-only container detection code so that it no longer uses CPU Shares (the "cpu.shares" file with cgroupv1 or the "cpu.weight" file with cgroupv2, exposed through the CgroupSubsystem::cpu_shares() API) to limit the number of active processors available to the JVM. Add a new flag, UseContainerCpuShares, to restore the old behaviour.


      Problem

      Since JDK-8146115, if the JVM is executed inside a container, it tries to respect the container's CPU resource limits. For example, even if /proc/cpuinfo states that the machine has 32 CPUs, os::active_processor_count() may return a smaller value because CgroupSubsystem::active_processor_count() returns the CPU limit configured via the cgroup pseudo file-system. As a result, thread pools in the JVM, such as the garbage collector worker threads or the ForkJoin common pool, are given a smaller size.
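
      The effect described above can be observed from Java itself. The following minimal probe uses only standard JDK APIs; running it inside and outside a container shows the CPU count the JVM derived (availableProcessors() is backed by os::active_processor_count() in HotSpot):

```java
import java.util.concurrent.ForkJoinPool;

// Minimal probe: prints the processor count the JVM computed and the
// resulting ForkJoin common pool parallelism, both of which shrink when
// container CPU limits are detected.
public class CpuCountProbe {
    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("availableProcessors   = " + cpus);
        System.out.println("commonPoolParallelism = "
                + ForkJoinPool.getCommonPoolParallelism());
    }
}
```

      For example, launching this class under docker run with a CPU quota (e.g. --cpus=2) would typically report a smaller availableProcessors value than the host has.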

      However, the current implementation of CgroupSubsystem::active_processor_count() also uses CgroupSubsystem::cpu_shares() to compute an upper limit for os::active_processor_count(). This is incorrect because:

      • In general, the amount of CPU resources given to a process running within a container is based on the ratio between (a) the CPU Shares of the current container, and (b) the total CPU Shares of all active containers running via the container engine on a host. Thus, the JVM process cannot know how much CPU it will be given by only looking at its own process's CPU Shares value.

      • The ratio between (a) and (b) varies over time, depending on how many other processes within containers are active during each scheduling period. A one-shot static value computed at JVM start-up cannot capture this dynamic behavior.

      • JDK-8216366 documents why the 1024 hard-coded constant is being used within the JVM. The referenced review thread uses Kubernetes as (one) justification for using CPU Shares as an upper bound for CPU resources. Yet, Kubernetes uses CPU Shares to implement its "CPU request" mechanism. It refuses to schedule a container on a node which would exceed the node's total CPU Shares capacity (number_of_cores * 1024). Hence, Kubernetes' notion of "CPU request" is a lower bound -- the container running the JVM process would be given at least the amount of CPU requested, potentially more. The JVM using CPU Shares as an upper bound is in conflict with how Kubernetes actually behaves.
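
      To make the ratio argument above concrete, here is an illustrative sketch of the arithmetic; the class and method names are hypothetical, and only the "1 CPU == 1024 shares" convention comes from cgroupv1/Kubernetes:

```java
// Illustrative sketch of the CPU Shares arithmetic described above.
// Names are hypothetical; SHARES_PER_CPU = 1024 is the cgroupv1 convention.
public class CpuShareMath {
    static final int SHARES_PER_CPU = 1024;

    // Kubernetes-style schedulable node capacity: number_of_cores * 1024.
    static int nodeCapacityShares(int cores) {
        return cores * SHARES_PER_CPU;
    }

    // A container's CPU-time fraction in a scheduling period is relative to
    // the total shares of all currently active containers -- a value the JVM
    // cannot know from its own shares alone, and one that changes over time.
    static double cpuFraction(int ownShares, int totalActiveShares) {
        return (double) ownShares / totalActiveShares;
    }

    public static void main(String[] args) {
        System.out.println(nodeCapacityShares(32));  // 32768
        System.out.println(cpuFraction(512, 1024));  // 0.5 with one equally weighted busy neighbour
        System.out.println(cpuFraction(512, 4096));  // 0.125 once busier neighbours appear
    }
}
```

      The same 512-share container gets half the CPU in one period and an eighth in another, which is why a one-shot value read at JVM start-up cannot serve as an upper bound.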

      The JVM's use of CPU Shares has led to CPU underutilization (e.g., JDK-8279484).

      Also, special-casing of the 1024 constant results in unintuitive behavior. For example, when running on a cgroupv1 system:

      • docker run ... --cpu-shares=512 java .... ==> os::active_processor_count() = 1
      • docker run ... --cpu-shares=1024 java ... ==> os::active_processor_count() = 32 (total CPUs on this system)
      • docker run ... --cpu-shares=2048 java ... ==> os::active_processor_count() = 2

      When the --cpu-shares option is set to 1024, the JVM cannot decide whether 1024 means "at least one CPU" (Kubernetes' interpretation) or "--cpu-shares is unset" (Docker's interpretation -- Docker sets CPU Shares to 1024 if the --cpu-shares flag is not specified on the command line).
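
      The unintuitive mapping above can be reproduced with a hypothetical reconstruction of the pre-change heuristic; the real code is C++ in HotSpot's container-detection layer, and the special case and rounding here are inferred from the three examples:

```java
// Hypothetical reconstruction of the old shares-to-CPU heuristic, for
// illustration only; the actual implementation lives in HotSpot's C++
// cgroup code.
public class OldShareHeuristic {
    static final int PER_CPU_SHARES = 1024;

    static int sharesToCpuCount(int shares, int hostCpus) {
        // 1024 is ambiguous: "unset" to Docker, "one CPU requested" to
        // Kubernetes. The old behavior treated it as "unset".
        if (shares == PER_CPU_SHARES) {
            return hostCpus;
        }
        // Round up, but never report fewer than one CPU.
        return Math.max(1, (shares + PER_CPU_SHARES - 1) / PER_CPU_SHARES);
    }

    public static void main(String[] args) {
        System.out.println(sharesToCpuCount(512, 32));   // 1
        System.out.println(sharesToCpuCount(1024, 32));  // 32
        System.out.println(sharesToCpuCount(2048, 32));  // 2
    }
}
```

      The discontinuity at 1024 (1 CPU at 512 shares, all 32 CPUs at 1024, back down to 2 at 2048) is exactly the behavior the bullet list above demonstrates.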


      Solution

      Out of the box, the JVM will not use CPU Shares in the computation of os::active_processor_count().

      As described above, the JVM cannot make any reasonable decision by looking at the value of CPU Shares alone. We should leave CPU scheduling decisions to the OS.

      Add a new flag, UseContainerCpuShares, to restore the old behaviour.


      Specification

      • Add a new flag UseContainerCpuShares
      • Update the meaning of the existing flag PreferContainerQuotaForCPUCount

      Changes in os/linux/globals_linux.hpp:

      +  product(bool, UseContainerCpuShares, false,                           \
      +          "Include CPU shares in the CPU availability calculation.")    \

      Compatibility Risks


      Kubernetes

      • Kubernetes requires that if either "CPU requests" or "CPU limits" is set, then both must be set. As a result, because of JDK-8197589, by default the JVM will ignore CPU Shares. The JVM already does what this CSR specifies.
      • If neither "CPU requests" nor "CPU limits" is set, Kubernetes runs the container with no (upper) CPU limit and minimal CPU Shares. Before this CSR, the JVM would limit itself to a single CPU. After this CSR, the JVM may use as much CPU as the OS gives it (subject to competition with other active processes within containers). If this new behavior is not what the user wants, they should explicitly set "CPU requests"/"CPU limits" in their Kubernetes deployments instead of relying on the previous JVM behavior.

      Other Linux-based container orchestration environments

      • In general, after this CSR, out of the box, a JVM process will be able to use as many CPU resources as the OS scheduler gives it. If the user wants to limit the active processor count of a JVM process within a container, they should use the appropriate mechanisms of their container orchestration environment to set the desired limits, for example a limit based on CPU quotas or CPU sets. Another option is to override the default container detection mechanism by explicitly specifying -XX:ActiveProcessorCount=<n> on the command line.

      As a stop-gap measure, if the user cannot immediately modify their configuration settings per the above suggestions, they can use the flag -XX:+UseContainerCpuShares to bring back the behavior before this CSR. Note that this flag is intended only for short-term transition purposes and will be obsoleted in JDK 20.


      Assignee: Ioi Lam (iklam)
      Reporter: Ioi Lam (iklam)
      Reviewed by: Harold Seigel (Inactive)