JDK-8281181

Do not use CPU Shares to compute active processor count





        This change has compatibility impacts. See the discussion in the CSR JDK-8281571 for details.

        Container runtimes support the concept of "CPU shares" to divide available CPU resources among competing containers. This bug description uses Docker as an example, but the bug affects other runtimes as well.

        Docker has a "--cpu-shares" option [3], which controls the pseudo file cpu.shares [1] under cgroup v1 and cpu.weight [2] under cgroup v2.

        Excerpt from [1] "cpu.shares: The weight of each group living in the same hierarchy, that translates into the amount of CPU it is expected to get. Upon cgroup creation, each group gets assigned a default of 1024. The percentage of CPU assigned to the cgroup is the value of shares divided by the sum of all shares in all cgroups in the same level."

        From the above excerpt, it's clear that cpu.shares should be interpreted as relative values. For example, if we have processes A and B that are both actively executing and are assigned these cpu.shares:

            A = 100, B = 100, or
            A = 1000, B = 1000

        Then A and B will both get half of the available CPU resources, because they have the same cpu.shares value. The exact numerical value of cpu.shares doesn't matter.

        Also, if process B is idle, then process A will get all available CPUs, regardless of the cpu.shares value.
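        To make the relative interpretation concrete, here is a minimal sketch of the rule quoted from [1]. It only models the arithmetic; the class and method names are hypothetical and it is not code from the kernel or the JDK.

```java
// Illustrative only: models the cgroup rule quoted above, not any JDK or kernel code.
class RelativeShares {
    /**
     * Fraction of CPU time a group is expected to get under contention:
     * its own shares divided by the sum of the shares of all busy groups
     * at the same level. Idle siblings are simply left out of the sum.
     */
    static double expectedFraction(long ownShares, long... busySiblingShares) {
        long total = ownShares;
        for (long s : busySiblingShares) {
            total += s;
        }
        return (double) ownShares / total;
    }

    public static void main(String[] args) {
        System.out.println(expectedFraction(100, 100));   // A = 100, B = 100   -> 0.5
        System.out.println(expectedFraction(1000, 1000)); // A = 1000, B = 1000 -> 0.5
        System.out.println(expectedFraction(100));        // B idle, A alone    -> 1.0
    }
}
```

        Both share assignments give A the same fraction, and A gets everything once B is idle, which is why the absolute value carries no information about a CPU count.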

        However, since JDK-8146115, the JDK interprets cpu.shares as an absolute number that limits how many CPUs the current process can use [4, 5, 6]:

        0 ... 1023 = 1 CPU
        1024 = (no limit)
        2048 = 2 CPUs
        4096 = 4 CPUs
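
        The table above can be sketched roughly as follows. This is a simplified model, not the HotSpot source in [5, 6]: the names are hypothetical, rounding up for values between the table rows is an assumption, and CPU quota and cpuset limits are ignored.

```java
// Simplified model of the mapping in the table above; not the HotSpot implementation.
class LegacyCpuSharesMapping {
    // Shares value that the legacy interpretation treats as "one CPU" (and as "no limit" when matched exactly).
    static final int PER_CPU_SHARES = 1024;

    /**
     * CPU count derived from cpu.shares under the legacy interpretation.
     * hostCpus is the number of CPUs on the machine and acts as the upper bound.
     */
    static int activeProcessorCount(int cpuShares, int hostCpus) {
        if (cpuShares == PER_CPU_SHARES) {
            return hostCpus; // 1024 = (no limit)
        }
        // Assumption: values between table rows round up; anything below 1024 yields 1 CPU.
        int fromShares = Math.max(1, (cpuShares + PER_CPU_SHARES - 1) / PER_CPU_SHARES);
        return Math.min(fromShares, hostCpus);
    }

    public static void main(String[] args) {
        int hostCpus = 16;
        for (int shares : new int[] {2, 512, 1024, 2048, 4096}) {
            System.out.println(shares + " -> " + activeProcessorCount(shares, hostCpus));
        }
    }
}
```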

        This incorrect interpretation can cause CPU underutilization:

        (a) on machines with lots of physical CPUs -- see attachments cpu-shares-bug.sh, cpu-shares-bug.log.txt, and the first comment below.

        (b) if small values are chosen for the CPU shares (see JDK-8279484, where cpu.weight is set to 1 by Kubernetes).

        (c) if all other containers are idle but the actively executing container is artificially constrained.

        Also, the somewhat arbitrary interpretation of "1024 means no limit" can lead to unexpected behavior. E.g., if A is set to 1024 and B is set to 2048, and the programs are running on a 16-core machine, the JDK treats A as unlimited (16 CPUs) but limits B to 2 CPUs. A will end up using most of the CPUs, contrary to the user's expectation that B should get twice as much as A.
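
        The mismatch in this example can be checked with a short sketch. It compares the CPU count the legacy interpretation yields with the share-proportional split a user would expect when both containers are busy. All names are hypothetical, and the legacy mapping is the same simplified model of the table above (rounding up is an assumption), not the HotSpot code.

```java
// Illustrative comparison for the A = 1024, B = 2048 example; not JDK code.
class SharesExpectationMismatch {
    /** Simplified legacy mapping: 1024 means no limit, otherwise round shares/1024 up (assumption). */
    static int legacyCount(int shares, int hostCpus) {
        if (shares == 1024) {
            return hostCpus;
        }
        return Math.min(Math.max(1, (shares + 1023) / 1024), hostCpus);
    }

    /** CPUs a container would get if the host were split in proportion to shares. */
    static double proportionalCpus(int ownShares, int otherShares, int hostCpus) {
        return hostCpus * (double) ownShares / (ownShares + otherShares);
    }

    public static void main(String[] args) {
        int hostCpus = 16;
        int a = 1024;
        int b = 2048;
        // A: the JDK sees 16 CPUs, while the proportional split is about 5.3.
        System.out.println("A: " + legacyCount(a, hostCpus) + " vs " + proportionalCpus(a, b, hostCpus));
        // B: the JDK sees 2 CPUs, while the proportional split is about 10.7.
        System.out.println("B: " + legacyCount(b, hostCpus) + " vs " + proportionalCpus(b, a, hostCpus));
    }
}
```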

        P.S. A good write-up of the same problem in OpenJ9 can be found at [7].


        [1] https://kernel.googlesource.com/pub/scm/linux/kernel/git/glommer/memcg/+/cpu_stat/Documentation/cgroups/cpu.txt

        [2] https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html

        [3] https://docs.docker.com/config/containers/resource_constraints/

        [4] https://github.com/iklam/jdk/blame/ec63957f9d103e86d3b8e235e79cabb8992cb3ca/test/hotspot/jtreg/containers/docker/TestCPUAwareness.java#L62

        [5] https://github.com/iklam/jdk/blame/d4546b6b36f9dc9ff3d626f8cfe62b62daa0de01/src/hotspot/os/linux/cgroupV1Subsystem_linux.cpp#L236

        [6] https://github.com/iklam/jdk/blame/f54ce84474c2ced340c92564814fa5c221415944/src/hotspot/os/linux/cgroupSubsystem_linux.cpp#L505

        [7] https://github.com/eclipse-openj9/openj9/issues/2251


                iklam Ioi Lam