JDK-8281571

Do not use CPU Shares to compute active processor count

    • Type: CSR
    • Resolution: Approved
    • Priority: P3
    • Fix Version: 19
    • Component: hotspot
    • Labels: None
    • Compatibility Kind: behavioral
    • Compatibility Risk: minimal
    • Compatibility Risk Description: Please see discussion in "Description" section.
    • Interface Kind: add/remove/modify command line option, Other
    • Scope: Implementation

      Summary

      Modify HotSpot's Linux-only container detection code to not use CPU Shares (the "cpu.shares" file with cgroupv1 or "cpu.weight" file with cgroupv2, exposed through the CgroupSubsystem::cpu_shares() API) to limit the number of active processors that can be used by the JVM. Add a new flag (immediately deprecated), UseContainerCpuShares, to restore the old behaviour; and deprecate the existing PreferContainerQuotaForCPUCount flag.

      Problem

      Since JDK-8146115, if the JVM is executed inside a container, it tries to respect the container's CPU resource limits. For example, even if /proc/cpuinfo states the machine has 32 CPUs, os::active_processor_count() may return a smaller value because CgroupSubsystem::active_processor_count() returns a CPU limit as configured by the cgroup pseudo file-system. As a result, thread pools in the JVM, such as the garbage collector worker threads or the ForkJoin common pool, are given a smaller size.
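
      For example (an illustrative case, not taken from the original report), on a 32-CPU host a CPU quota of two CPUs caps the value the JVM observes:

      • docker run ... --cpus=2 java ... ==> os::active_processor_count() = 2, and Runtime.getRuntime().availableProcessors() returns 2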

      However, the current implementation of CgroupSubsystem::active_processor_count() also uses CgroupSubsystem::cpu_shares() to compute an upper limit for os::active_processor_count(). This is incorrect because:

      • In general, the amount of CPU resources given to a process running within a container is based on the ratio between (a) the CPU Shares of the current container, and (b) the total CPU Shares of all active containers running via the container engine on a host. Thus, the JVM process cannot know how much CPU it will be given by only looking at its own process's CPU Shares value.

      • The ratio between (a) and (b) varies over time, depending on how many other processes within containers are active during each scheduling period. A one-shot static value computed at JVM start-up cannot capture this dynamic behavior (see the worked example after this list).

      • JDK-8216366 documents why the 1024 hard-coded constant is being used within the JVM. The referenced review thread uses Kubernetes as (one) justification for using CPU Shares as an upper bound for CPU resources. Yet, Kubernetes uses CPU Shares to implement its "CPU request" mechanism. It refuses to schedule a container on a node which would exceed the node's total CPU Shares capacity (number_of_cores * 1024). Hence, Kubernetes' notion of "CPU request" is a lower bound -- the container running the JVM process would be given at least the amount of CPU requested, potentially more. The JVM using CPU Shares as an upper bound is in conflict with how Kubernetes actually behaves.
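
      As a concrete (hypothetical) illustration of the first two points: suppose a host runs two CPU-bound containers, A started with --cpu-shares=512 and B with --cpu-shares=2048. Under full contention, A is entitled to 512 / (512 + 2048) = 20% of the CPU time and B to 80%; if B goes idle, A may consume up to 100%. The same cpu.shares value thus maps to very different effective CPU allocations depending on what else happens to be running -- precisely what a one-shot value read at JVM start-up cannot capture.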

      The JVM's use of CPU Shares has led to CPU underutilization (e.g., JDK-8279484).

      Also, special-casing of the 1024 constant results in unintuitive behavior. For example, when running on a cgroupv1 system:

      • docker run ... --cpu-shares=512 java ... ==> os::active_processor_count() = 1
      • docker run ... --cpu-shares=1024 java ... ==> os::active_processor_count() = 32 (total CPUs on this system)
      • docker run ... --cpu-shares=2048 java ... ==> os::active_processor_count() = 2

      When the --cpu-shares option is set to 1024, the JVM cannot decide whether 1024 means "at least one CPU" (Kubernetes' interpretation) or "--cpu-shares is unset" (Docker's interpretation -- Docker sets CPU Shares to 1024 if the --cpu-shares flag is not specified on the command line).
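
      The following is a simplified, self-contained sketch of the pre-change computation (paraphrased from CgroupSubsystem::active_processor_count() and the cgroupv1 cpu_shares() handling; names and structure are condensed for illustration, and the quota and cpuset paths are omitted). Compiled and run, it reproduces the three results listed above:

      #include <algorithm>
      #include <cmath>
      #include <cstdio>

      static const int PER_CPU_SHARES = 1024;  // hard-coded constant discussed in JDK-8216366

      // cgroupv1 reports 1024 both when --cpu-shares=1024 is given and when the
      // flag is unset, so the old code treated 1024 as "no shares configured".
      static int cpu_shares(int raw_shares) {
        if (raw_shares == PER_CPU_SHARES) return -1;
        return raw_shares;
      }

      // Pre-change logic: one "CPU" per 1024 shares, rounded up, capped by the
      // number of processors actually present on the host.
      static int active_processor_count(int host_cpus, int raw_shares) {
        int limit = host_cpus;
        int share = cpu_shares(raw_shares);
        if (share > -1) {
          limit = (int)ceilf((float)share / (float)PER_CPU_SHARES);
        }
        return std::min(host_cpus, limit);
      }

      int main() {
        const int host_cpus = 32;
        for (int shares : {512, 1024, 2048}) {
          std::printf("--cpu-shares=%d ==> active_processor_count() = %d\n",
                      shares, active_processor_count(host_cpus, shares));
        }
        return 0;  // prints 1, 32, and 2 respectively
      }

      Because 512 and 2048 shares are rounded up to one and two processors while exactly 1024 removes the cap entirely, moving --cpu-shares by a single step across 1024 swings the computed count between 1, 32, and 2.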

      Solution

      Out of the box, the JVM will not use CPU Shares in the computation of os::active_processor_count().

      As described above, the JVM cannot make any reasonable decision by looking at the value of CPU Shares alone. We should leave the CPU scheduling decisions to the OS.

      Add a new flag (immediately deprecated), UseContainerCpuShares, to restore the old behaviour; and deprecate the existing PreferContainerQuotaForCPUCount flag.

      Specification

      • Add a new flag UseContainerCpuShares
      • Update the meaning of the existing flag PreferContainerQuotaForCPUCount
      • Both flags are marked as deprecated in JDK 19, to be obsoleted in JDK 20 and expired in JDK 21. Uses of these flags are discouraged.

      Changes in os/linux/globals_linux.hpp:

      +  product(bool, UseContainerCpuShares, false,                           \
      +          "(Deprecated) Include CPU shares in the CPU availability"     \
      +          " calculation.")                                              \
      +                                                                        \
      +  product(bool, PreferContainerQuotaForCPUCount, true,                  \
      +          "(Deprecated) Calculate the container CPU availability based" \
      +          " on the value of quotas (if set), when true. Otherwise, if"  \
      +          " UseContainerCpuShares is true, use the CPU"                 \
      +          " shares value, provided it is less than quota.")             \
      +
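
      For example (illustrative command line), a deployment that temporarily needs the old behaviour can run:

      java -XX:+UseContainerCpuShares ...

      optionally combined with -XX:-PreferContainerQuotaForCPUCount to take the minimum of the share and quota counts, per the flag description above. Because both flags are deprecated in JDK 19, the VM can be expected to print a deprecation warning for each at start-up, along the lines of "Option UseContainerCpuShares was deprecated in version 19.0 and will likely be removed in a future release".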

      Compatibility Risks

      Kubernetes (see "Kubernetes notes #1" in comments section):

      • Kubernetes has configuration options for "CPU request" (which corresponds to cgroups CPU Shares) and "CPU limit" (which corresponds to cgroups CPU Quota). If "CPU limit" is not set, Kubernetes runs the container with no (upper) CPU limit. Before this CSR, the JVM may limit itself to a subset of CPUs (depending on the settings of "CPU request"). After this CSR, the JVM may use as much CPU as given by the OS (subject to competition with other active processes on the same host). In some cases, this may cause overconsumption of CPU or memory resources. To fix such problems, the user should explicitly set the "CPU request"/"CPU limit" of their Kubernetes deployments instead of relying on the previous JVM behavior.
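
      For example (illustrative), setting resources.limits.cpu: "2" on a container in a pod spec makes the runtime configure a CPU quota equivalent to two CPUs, which the JVM continues to detect and honor after this change; setting only resources.requests.cpu no longer influences the JVM's processor count.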

      Other Linux-based container orchestration environments

      • In general, after this CSR, out of the box, a JVM process may use more CPU resources than before; how much it actually receives is decided by the OS scheduler. If the user wants to limit the active processor count of the JVM process within a container, they should use the appropriate mechanisms of the container orchestration environment to set the desired limits, for example a limit based on CPU quotas or CPU sets. Another option is to override the default container detection mechanism by explicitly specifying -XX:ActiveProcessorCount=<n> on the command line.
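
      For example (illustrative command lines):

      • docker run ... --cpus=2 java ... (CPU quota equivalent to two CPUs)
      • docker run ... --cpuset-cpus=0-3 java ... (restrict the container to CPUs 0-3)
      • docker run ... java -XX:ActiveProcessorCount=2 ... (bypass container detection for the processor count)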

      As a stop-gap measure, if the user cannot immediately modify their configuration settings per the above suggestions, they can use the flag -XX:+UseContainerCpuShares to restore the behavior before this CSR. Note that this flag is intended only for short-term transition purposes and will be obsoleted in JDK 20.

      Assignee: Ioi Lam
      Reporter: Ioi Lam
      Reviewed by: David Holmes, Severin Gehwolf