Bug
Resolution: Unresolved
P3
17.0.13, 21.0.5, 23.0.1, 24
In Review
generic
linux
The following issue was found in the Linux cgroup subsystem implementation. The cgroup v1 subsystem fails to initialize mounted controllers properly in certain cases, which may leave controllers undetected or inactive. We observed the behavior in CloudFoundry deployments; it also affects host systems.
When the JVM isn't PID 1 (for example, when it is started from a shell) and the shell process has been moved from one cgroup path to another, the JVM might set the subsystem path to null on cgroup v1:
[0.001s][trace][os,container] OSContainer::init: Initializing Container Support
[0.001s][debug][os,container] Detected optional pids controller entry in /proc/cgroups
[0.002s][debug][os,container] Detected cgroups hybrid or legacy hierarchy, using cgroups v1 controllers
[0.002s][trace][os,container] Adjusting controller path for memory: (null)
[0.002s][debug][os,container] read_string: subsystem path is null
[0.002s][trace][os,container] Memory Limit failed: -2
[0.002s][debug][os,container] read_string: subsystem path is null
[0.002s][trace][os,container] Memory Limit failed: -2
[0.002s][trace][os,container] No lower limit found for memory in hierarchy /sys/fs/cgroup/memory, adjusting to original path /test
[0.002s][debug][os,container] OSContainer::init: is_containerized() = true because all controllers are mounted read-only (container case)
[0.003s][trace][os,container] Path to /cpu.cfs_quota_us is /sys/fs/cgroup/cpu,cpuacct/cpu.cfs_quota_us
[0.003s][trace][os,container] CPU Quota is: -1
[0.003s][trace][os,container] Path to /cpu.cfs_period_us is /sys/fs/cgroup/cpu,cpuacct/cpu.cfs_period_us
[0.003s][trace][os,container] CPU Period is: 100000
[0.003s][trace][os,container] OSContainer::active_processor_count: 12
[0.003s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 12
[0.003s][trace][os,container] total physical memory: 67163226112
[0.003s][debug][os,container] read_string: subsystem path is null
[0.003s][trace][os,container] Memory Limit failed: -2
[0.005s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 12
[0.021s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 12
openjdk 24-internal 2025-03-18
OpenJDK Runtime Environment (build 24-internal-adhoc.sgehwolf.jdk-jdk)
OpenJDK 64-Bit Server VM (build 24-internal-adhoc.sgehwolf.jdk-jdk, mixed mode, sharing)
On the Java Metrics side, this is observable as an NPE, for example when application code uses MXBeans.
This test code:
public class Test {
    public static void main(String[] args) {
        java.lang.management.ManagementFactory.getPlatformMBeanServer();
        System.out.println("PASSED.");
    }
}
would result in the following NPE on affected systems:
Exception in thread "main" java.lang.NullPointerException
at java.base/java.util.Objects.requireNonNull(Objects.java:220)
at java.base/sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:296)
at java.base/java.nio.file.Path.of(Path.java:148)
at java.base/java.nio.file.Paths.get(Paths.java:69)
at java.base/jdk.internal.platform.CgroupUtil.lambda$readStringValue$0(CgroupUtil.java:67)
at java.base/java.security.AccessController.doPrivileged(AccessController.java:571)
at java.base/jdk.internal.platform.CgroupUtil.readStringValue(CgroupUtil.java:69)
at java.base/jdk.internal.platform.CgroupSubsystemController.getStringValue(CgroupSubsystemController.java:65)
at java.base/jdk.internal.platform.CgroupSubsystemController.getLongValue(CgroupSubsystemController.java:124)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.getLongValue(CgroupV1Subsystem.java:190)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.getHierarchical(CgroupV1Subsystem.java:160)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.initSubSystem(CgroupV1Subsystem.java:85)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.getInstance(CgroupV1Subsystem.java:61)
at java.base/jdk.internal.platform.CgroupSubsystemFactory.create(CgroupSubsystemFactory.java:119)
at java.base/jdk.internal.platform.CgroupSubsystemFactory.create(CgroupSubsystemFactory.java:89)
at java.base/jdk.internal.platform.CgroupMetrics.getInstance(CgroupMetrics.java:198)
at java.base/jdk.internal.platform.SystemMetrics.instance(SystemMetrics.java:29)
at java.base/jdk.internal.platform.Metrics.systemMetrics(Metrics.java:58)
at java.base/jdk.internal.platform.Container.metrics(Container.java:43)
at jdk.management/com.sun.management.internal.OperatingSystemImpl.<init>(OperatingSystemImpl.java:175)
at jdk.management/com.sun.management.internal.PlatformMBeanProviderImpl.getOperatingSystemMXBean(PlatformMBeanProviderImpl.java:316)
at jdk.management/com.sun.management.internal.PlatformMBeanProviderImpl$4.nameToMBeanMap(PlatformMBeanProviderImpl.java:235)
at java.management/java.lang.management.ManagementFactory.lambda$getPlatformMBeanServer$0(ManagementFactory.java:489)
at java.base/java.util.stream.ReferencePipeline$7$1FlatMap.accept(ReferencePipeline.java:289)
at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:197)
at java.base/java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1788)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:570)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:560)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:153)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:176)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:265)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:636)
at java.management/java.lang.management.ManagementFactory.getPlatformMBeanServer(ManagementFactory.java:490)
at Test.main(Test.java:3)
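The top frames of the trace show Paths.get rejecting a null first argument, which is what a null controller path turns into on the Java side. As a minimal sketch of a defensive read, assuming a hypothetical helper (NullSafeCgroupRead and readFirstLine are illustrative names, not JDK classes, and this is not the change under review), a caller could treat an unknown controller path as "no value":

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration only; not the JDK's CgroupUtil.
final class NullSafeCgroupRead {

    // Returns the first line of <controllerPath>/<file>, or null when the
    // controller path is unknown or the file cannot be read.
    static String readFirstLine(String controllerPath, String file) {
        if (controllerPath == null) {
            // Paths.get(null, file) would throw NullPointerException here.
            return null;
        }
        Path p = Paths.get(controllerPath, file);
        try (BufferedReader reader = Files.newBufferedReader(p)) {
            return reader.readLine();
        } catch (IOException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // A null controller path yields null instead of an exception.
        System.out.println(readFirstLine(null, "memory.limit_in_bytes"));
    }
}
```

A guard like this only hides the symptom. The controller is still reported as having no limit, so the path computation itself remains the thing to fix.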
The relevant /proc/self/mountinfo line is
---
2207 2196 0:43 /system.slice/garden.service/garden/good/2f57368b-0eda-4e52-64d8-af5c /sys/fs/cgroup/cpu,cpuacct ro,nosuid,nodev,noexec,relatime master:25 - cgroup cgroup rw,cpu,cpuacct
---
/proc/self/cgroup:
---
11:cpu,cpuacct:/system.slice/garden.service/garden/bad/2f57368b-0eda-4e52-64d8-af5c
---
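The two lines above disagree: the mount root ends in .../good/... while the process's cgroup path ends in .../bad/..., so neither is a prefix of the other. The following is a simplified model of prefix-based path resolution, written under the assumption that the controller path is derived from the mount root, mount point and cgroup path by the three rules in the comments. V1PathModel and resolve are hypothetical names, and the code is a sketch for reasoning about the failure, not the JDK source:

```java
// Simplified, hypothetical model of cgroup v1 controller path resolution.
final class V1PathModel {

    // Returns the directory to read controller files from, or null when
    // none of the rules below applies.
    static String resolve(String root, String mountPoint, String cgroupPath) {
        if (root == null || mountPoint == null || cgroupPath == null) {
            return null;
        }
        // Rule 1: the whole hierarchy is mounted (root is "/").
        if (root.equals("/")) {
            return cgroupPath.equals("/") ? mountPoint : mountPoint + cgroupPath;
        }
        // Rule 2: the mount root is exactly the process's cgroup.
        if (root.equals(cgroupPath)) {
            return mountPoint;
        }
        // Rule 3: the process's cgroup lies below the mount root.
        if (cgroupPath.startsWith(root + "/")) {
            return mountPoint + cgroupPath.substring(root.length());
        }
        // No rule matched: the path stays unset.
        return null;
    }

    public static void main(String[] args) {
        String root = "/system.slice/garden.service/garden/good/2f57368b-0eda-4e52-64d8-af5c";
        String path = "/system.slice/garden.service/garden/bad/2f57368b-0eda-4e52-64d8-af5c";
        // Prints "null": root and cgroup path share no prefix relationship.
        System.out.println(resolve(root, "/sys/fs/cgroup/cpu,cpuacct", path));
    }
}
```

Under this model, any process whose cgroup was changed to a path outside the mounted root falls through all three rules, which matches the "subsystem path is null" messages in the log.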
Note that containers run with cgroupns=host by default on cg v1 systems and with cgroupns=private by default on cg v2 systems. The issue has been observed with the default configuration in unprivileged containers running the JVM.
Steps to reproduce on a cgroup v1 system are (using --cgroupns=host for clarity):
$ sudo podman run -ti --cgroupns=host --rm --volume=$(pwd)/build/linux-x86_64-server-release/images/jdk:/jdk:z --memory 400m fedora:39 bash -c 'bash'
[root@5aee0ffdd70b /]# /jdk/bin/java -Xlog:os+container=trace --version
[0.000s][trace][os,container] OSContainer::init: Initializing Container Support
[0.000s][debug][os,container] Detected optional pids controller entry in /proc/cgroups
[0.001s][debug][os,container] Detected cgroups hybrid or legacy hierarchy, using cgroups v1 controllers
[0.001s][debug][os,container] OSContainer::init: is_containerized() = true because all controllers are mounted read-only (container case)
[0.001s][trace][os,container] Path to /cpu.cfs_quota_us is /sys/fs/cgroup/cpu,cpuacct/cpu.cfs_quota_us
[0.001s][trace][os,container] CPU Quota is: -1
[0.001s][trace][os,container] Path to /cpu.cfs_period_us is /sys/fs/cgroup/cpu,cpuacct/cpu.cfs_period_us
[0.001s][trace][os,container] CPU Period is: 100000
[0.001s][trace][os,container] OSContainer::active_processor_count: 12
[0.001s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 12
[0.001s][trace][os,container] total physical memory: 67163238400
[0.001s][trace][os,container] Path to /memory.limit_in_bytes is /sys/fs/cgroup/memory/memory.limit_in_bytes
[0.001s][trace][os,container] Memory Limit is: 419430400
[0.002s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 12
[0.014s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 12
openjdk 24-internal 2025-03-18
OpenJDK Runtime Environment (build 24-internal-adhoc.sgehwolf.jdk-jdk)
OpenJDK 64-Bit Server VM (build 24-internal-adhoc.sgehwolf.jdk-jdk, mixed mode, sharing)
In a separate terminal, find the PID of the shell in the container (10391 in this case) and move it to a different cgroup path, for example /sys/fs/cgroup/memory/test:
$ sudo mkdir /sys/fs/cgroup/memory/test
# echo 10391 > /sys/fs/cgroup/memory/test/cgroup.procs
In the shell where the container runs, run 'java --version' again and observe the null subsystem paths:
[root@5aee0ffdd70b /]# /jdk/bin/java -Xlog:os+container=trace --version
[0.000s][trace][os,container] OSContainer::init: Initializing Container Support
[0.000s][debug][os,container] Detected optional pids controller entry in /proc/cgroups
[0.001s][debug][os,container] Detected cgroups hybrid or legacy hierarchy, using cgroups v1 controllers
[0.001s][trace][os,container] Adjusting controller path for memory: (null)
[0.001s][debug][os,container] read_string: subsystem path is null
[0.001s][trace][os,container] Memory Limit failed: -2
[0.001s][debug][os,container] read_string: subsystem path is null
[0.001s][trace][os,container] Memory Limit failed: -2
[0.001s][trace][os,container] No lower limit found for memory in hierarchy /sys/fs/cgroup/memory, adjusting to original path /test
[0.001s][debug][os,container] OSContainer::init: is_containerized() = true because all controllers are mounted read-only (container case)
[0.001s][trace][os,container] Path to /cpu.cfs_quota_us is /sys/fs/cgroup/cpu,cpuacct/cpu.cfs_quota_us
[0.001s][trace][os,container] CPU Quota is: -1
[0.001s][trace][os,container] Path to /cpu.cfs_period_us is /sys/fs/cgroup/cpu,cpuacct/cpu.cfs_period_us
[0.001s][trace][os,container] CPU Period is: 100000
[0.001s][trace][os,container] OSContainer::active_processor_count: 12
[0.001s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 12
[0.001s][trace][os,container] total physical memory: 67163238400
[0.001s][debug][os,container] read_string: subsystem path is null
[0.001s][trace][os,container] Memory Limit failed: -2
[0.003s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 12
[0.020s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 12
openjdk 24-internal 2025-03-18
OpenJDK Runtime Environment (build 24-internal-adhoc.sgehwolf.jdk-jdk)
OpenJDK 64-Bit Server VM (build 24-internal-adhoc.sgehwolf.jdk-jdk, mixed mode, sharing)
[root@5aee0ffdd70b /]# grep memory /proc/self/mountinfo
1476 1473 0:43 /machine.slice/libpod-5aee0ffdd70b215ba4115f31e5438fa4708be8fd3a11ad75cbc93b0869788dfd.scope/container /sys/fs/cgroup/memory ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,memory
[root@5aee0ffdd70b /]# grep memory /proc/self/cgroup
11:memory:/test
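The two grep outputs above can be compared mechanically. As a rough diagnostic sketch (CgroupMismatchCheck and its methods are hypothetical names introduced here), the code below extracts the root field from a mountinfo line and the path from a cgroup line, then reports whether the process's cgroup lies outside the mounted root. It assumes the usual field layout: root is the fourth whitespace-separated field of a mountinfo line, and the path is the third colon-separated field of a cgroup line.

```java
// Hypothetical diagnostic for illustration; operates on single lines taken
// from /proc/self/mountinfo and /proc/self/cgroup.
final class CgroupMismatchCheck {

    // Root of the mount within the cgroup hierarchy (4th mountinfo field).
    static String mountRoot(String mountInfoLine) {
        return mountInfoLine.trim().split("\\s+")[3];
    }

    // Cgroup path of the process (3rd colon-separated field).
    static String cgroupPath(String cgroupLine) {
        return cgroupLine.trim().split(":", 3)[2];
    }

    // True when the process's cgroup is neither the mount root nor below it.
    static boolean isMismatch(String mountInfoLine, String cgroupLine) {
        String root = mountRoot(mountInfoLine);
        String path = cgroupPath(cgroupLine);
        if (root.equals("/")) {
            return false; // whole hierarchy visible, every path is below it
        }
        return !(path.equals(root) || path.startsWith(root + "/"));
    }

    public static void main(String[] args) {
        String mountInfo = "1476 1473 0:43 /machine.slice/libpod-5aee0ffdd70b215ba4115f31e5438fa4708be8fd3a11ad75cbc93b0869788dfd.scope/container "
                + "/sys/fs/cgroup/memory ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,memory";
        String cgroup = "11:memory:/test";
        // Prints "mismatch: true" for the lines shown in this report.
        System.out.println("mismatch: " + isMismatch(mountInfo, cgroup));
    }
}
```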
For the NPE issue, the reproducer steps are similar.
relates to
JDK-8286212 Cgroup v1 initialization causes NPE on some systems (Open)
JDK-8288019 [cgroups v1] cgroup path logic using substring is dead code in hotspot (Open)
links to
Review(master) openjdk/jdk/21808