Bug
Resolution: Fixed
P3
17.0.13, 21.0.5, 23.0.1, 24
b13
generic
linux
Issue | Fix Version | Assignee | Priority | Status | Resolution | Resolved In Build |
---|---|---|---|---|---|---|
JDK-8351890 | 24.0.2 | Sergey Chernyshev | P3 | Resolved | Fixed | b01 |
When the JVM is not PID 1 (for example, when it is started from a shell) and the shell process has been moved from one cgroup path to another, the JVM might set the subsystem path to null on cgroup v1 (cg v1):
[0.001s][trace][os,container] OSContainer::init: Initializing Container Support
[0.001s][debug][os,container] Detected optional pids controller entry in /proc/cgroups
[0.002s][debug][os,container] Detected cgroups hybrid or legacy hierarchy, using cgroups v1 controllers
[0.002s][trace][os,container] Adjusting controller path for memory: (null)
[0.002s][debug][os,container] read_string: subsystem path is null
[0.002s][trace][os,container] Memory Limit failed: -2
[0.002s][debug][os,container] read_string: subsystem path is null
[0.002s][trace][os,container] Memory Limit failed: -2
[0.002s][trace][os,container] No lower limit found for memory in hierarchy /sys/fs/cgroup/memory, adjusting to original path /test
[0.002s][debug][os,container] OSContainer::init: is_containerized() = true because all controllers are mounted read-only (container case)
[0.003s][trace][os,container] Path to /cpu.cfs_quota_us is /sys/fs/cgroup/cpu,cpuacct/cpu.cfs_quota_us
[0.003s][trace][os,container] CPU Quota is: -1
[0.003s][trace][os,container] Path to /cpu.cfs_period_us is /sys/fs/cgroup/cpu,cpuacct/cpu.cfs_period_us
[0.003s][trace][os,container] CPU Period is: 100000
[0.003s][trace][os,container] OSContainer::active_processor_count: 12
[0.003s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 12
[0.003s][trace][os,container] total physical memory: 67163226112
[0.003s][debug][os,container] read_string: subsystem path is null
[0.003s][trace][os,container] Memory Limit failed: -2
[0.005s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 12
[0.021s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 12
openjdk 24-internal 2025-03-18
OpenJDK Runtime Environment (build 24-internal-adhoc.sgehwolf.jdk-jdk)
OpenJDK 64-Bit Server VM (build 24-internal-adhoc.sgehwolf.jdk-jdk, mixed mode, sharing)
On the Java Metrics side, this is observable as an NPE, for example when application code uses MXBean code.
This test code:
public class Test {
    public static void main(String[] args) {
        java.lang.management.ManagementFactory.getPlatformMBeanServer();
        System.out.println("PASSED.");
    }
}
would result in the following NPE on affected systems:
Exception in thread "main" java.lang.NullPointerException
at java.base/java.util.Objects.requireNonNull(Objects.java:220)
at java.base/sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:296)
at java.base/java.nio.file.Path.of(Path.java:148)
at java.base/java.nio.file.Paths.get(Paths.java:69)
at java.base/jdk.internal.platform.CgroupUtil.lambda$readStringValue$0(CgroupUtil.java:67)
at java.base/java.security.AccessController.doPrivileged(AccessController.java:571)
at java.base/jdk.internal.platform.CgroupUtil.readStringValue(CgroupUtil.java:69)
at java.base/jdk.internal.platform.CgroupSubsystemController.getStringValue(CgroupSubsystemController.java:65)
at java.base/jdk.internal.platform.CgroupSubsystemController.getLongValue(CgroupSubsystemController.java:124)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.getLongValue(CgroupV1Subsystem.java:190)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.getHierarchical(CgroupV1Subsystem.java:160)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.initSubSystem(CgroupV1Subsystem.java:85)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.getInstance(CgroupV1Subsystem.java:61)
at java.base/jdk.internal.platform.CgroupSubsystemFactory.create(CgroupSubsystemFactory.java:119)
at java.base/jdk.internal.platform.CgroupSubsystemFactory.create(CgroupSubsystemFactory.java:89)
at java.base/jdk.internal.platform.CgroupMetrics.getInstance(CgroupMetrics.java:198)
at java.base/jdk.internal.platform.SystemMetrics.instance(SystemMetrics.java:29)
at java.base/jdk.internal.platform.Metrics.systemMetrics(Metrics.java:58)
at java.base/jdk.internal.platform.Container.metrics(Container.java:43)
at jdk.management/com.sun.management.internal.OperatingSystemImpl.<init>(OperatingSystemImpl.java:175)
at jdk.management/com.sun.management.internal.PlatformMBeanProviderImpl.getOperatingSystemMXBean(PlatformMBeanProviderImpl.java:316)
at jdk.management/com.sun.management.internal.PlatformMBeanProviderImpl$4.nameToMBeanMap(PlatformMBeanProviderImpl.java:235)
at java.management/java.lang.management.ManagementFactory.lambda$getPlatformMBeanServer$0(ManagementFactory.java:489)
at java.base/java.util.stream.ReferencePipeline$7$1FlatMap.accept(ReferencePipeline.java:289)
at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:197)
at java.base/java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1788)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:570)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:560)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:153)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:176)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:265)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:636)
at java.management/java.lang.management.ManagementFactory.getPlatformMBeanServer(ManagementFactory.java:490)
at Test.main(Test.java:3)
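The top frames of the trace show Paths.get receiving a null first component. The following is a minimal sketch of that failure mode and of one possible null guard. The class and method names (NullSubsystemPathSketch, throwsOnNullPath, readLimit) are hypothetical and are not part of the JDK; this is not the fix that was applied for this bug.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.OptionalLong;

// Hypothetical illustration only; not JDK code.
final class NullSubsystemPathSketch {

    // Mirrors the failing frames: building a Path from a null
    // subsystem path raises a NullPointerException.
    static boolean throwsOnNullPath(String subsystemPath) {
        try {
            Paths.get(subsystemPath, "memory.limit_in_bytes");
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // Assumed null-safe variant: reports "no value" instead of throwing
    // when the subsystem path could not be determined or read.
    static OptionalLong readLimit(String subsystemPath, String file) {
        if (subsystemPath == null) {
            return OptionalLong.empty();
        }
        Path p = Paths.get(subsystemPath, file);
        try {
            return OptionalLong.of(Long.parseLong(Files.readString(p).trim()));
        } catch (IOException | NumberFormatException e) {
            return OptionalLong.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println("null path throws: " + throwsOnNullPath(null));
        System.out.println("guarded read present: "
                + readLimit(null, "memory.limit_in_bytes").isPresent());
    }
}
```

A guard like this would only hide the symptom. The underlying problem is that the subsystem path is null in the first place.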
The relevant /proc/self/mountinfo line is:
---
2207 2196 0:43 /system.slice/garden.service/garden/good/2f57368b-0eda-4e52-64d8-af5c /sys/fs/cgroup/cpu,cpuacct ro,nosuid,nodev,noexec,relatime master:25 - cgroup cgroup rw,cpu,cpuacct
---
/proc/self/cgroup:
---
11:cpu,cpuacct:/system.slice/garden.service/garden/bad/2f57368b-0eda-4e52-64d8-af5c
---
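The two lines above disagree: the mount root ends in .../garden/good/... while the process's cgroup path ends in .../garden/bad/.... The sketch below shows how a purely prefix-based resolution ends up with no path in that situation. It is a simplified illustration under that assumption, not the JDK's actual code; CgroupV1PathSketch and resolve are hypothetical names.

```java
// Hypothetical illustration only; not JDK code.
final class CgroupV1PathSketch {

    // Resolves the directory holding the controller files, assuming the
    // cgroup path must equal the mount root or lie beneath it.
    // Returns null when the two share no such relationship.
    static String resolve(String mountPoint, String mountRoot, String cgroupPath) {
        if (mountRoot == null || cgroupPath == null) {
            return null;
        }
        if (mountRoot.equals("/")) {
            return cgroupPath.equals("/") ? mountPoint : mountPoint + cgroupPath;
        }
        if (mountRoot.equals(cgroupPath)) {
            return mountPoint;
        }
        if (cgroupPath.startsWith(mountRoot + "/")) {
            return mountPoint + cgroupPath.substring(mountRoot.length());
        }
        return null; // cgroup path is outside the mount root
    }

    public static void main(String[] args) {
        String mountPoint = "/sys/fs/cgroup/cpu,cpuacct";
        String root = "/system.slice/garden.service/garden/good/2f57368b-0eda-4e52-64d8-af5c";
        String cgroup = "/system.slice/garden.service/garden/bad/2f57368b-0eda-4e52-64d8-af5c";
        // Values taken from the mountinfo and cgroup lines in this report.
        System.out.println("resolved path: " + resolve(mountPoint, root, cgroup));
    }
}
```

With the values from this report, neither the equality check nor the prefix check matches, so the result is null, which corresponds to the "subsystem path is null" log lines.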
Note that containers run with cgroupns=host by default on cg v1 systems and with cgroupns=private by default on cg v2 systems. The issue has been observed with the default configuration in unprivileged containers running the JVM.
Steps to reproduce on a cgroup v1 system are (using --cgroupns=host for clarity):
$ sudo podman run -ti --cgroupns=host --rm --volume=$(pwd)/build/linux-x86_64-server-release/images/jdk:/jdk:z --memory 400m fedora:39 bash -c 'bash'
[root@5aee0ffdd70b /]# /jdk/bin/java -Xlog:os+container=trace --version
[0.000s][trace][os,container] OSContainer::init: Initializing Container Support
[0.000s][debug][os,container] Detected optional pids controller entry in /proc/cgroups
[0.001s][debug][os,container] Detected cgroups hybrid or legacy hierarchy, using cgroups v1 controllers
[0.001s][debug][os,container] OSContainer::init: is_containerized() = true because all controllers are mounted read-only (container case)
[0.001s][trace][os,container] Path to /cpu.cfs_quota_us is /sys/fs/cgroup/cpu,cpuacct/cpu.cfs_quota_us
[0.001s][trace][os,container] CPU Quota is: -1
[0.001s][trace][os,container] Path to /cpu.cfs_period_us is /sys/fs/cgroup/cpu,cpuacct/cpu.cfs_period_us
[0.001s][trace][os,container] CPU Period is: 100000
[0.001s][trace][os,container] OSContainer::active_processor_count: 12
[0.001s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 12
[0.001s][trace][os,container] total physical memory: 67163238400
[0.001s][trace][os,container] Path to /memory.limit_in_bytes is /sys/fs/cgroup/memory/memory.limit_in_bytes
[0.001s][trace][os,container] Memory Limit is: 419430400
[0.002s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 12
[0.014s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 12
openjdk 24-internal 2025-03-18
OpenJDK Runtime Environment (build 24-internal-adhoc.sgehwolf.jdk-jdk)
OpenJDK 64-Bit Server VM (build 24-internal-adhoc.sgehwolf.jdk-jdk, mixed mode, sharing)
In a separate terminal, find the PID of the shell in the container (10391 in this case) and move it to a different path, /sys/fs/cgroup/memory/test, for example like so:
$ sudo mkdir /sys/fs/cgroup/memory/test
# echo 10391 > /sys/fs/cgroup/memory/test/cgroup.procs
In the shell where the container runs, run 'java --version' again and observe the null subsystem paths:
[root@5aee0ffdd70b /]# /jdk/bin/java -Xlog:os+container=trace --version
[0.000s][trace][os,container] OSContainer::init: Initializing Container Support
[0.000s][debug][os,container] Detected optional pids controller entry in /proc/cgroups
[0.001s][debug][os,container] Detected cgroups hybrid or legacy hierarchy, using cgroups v1 controllers
[0.001s][trace][os,container] Adjusting controller path for memory: (null)
[0.001s][debug][os,container] read_string: subsystem path is null
[0.001s][trace][os,container] Memory Limit failed: -2
[0.001s][debug][os,container] read_string: subsystem path is null
[0.001s][trace][os,container] Memory Limit failed: -2
[0.001s][trace][os,container] No lower limit found for memory in hierarchy /sys/fs/cgroup/memory, adjusting to original path /test
[0.001s][debug][os,container] OSContainer::init: is_containerized() = true because all controllers are mounted read-only (container case)
[0.001s][trace][os,container] Path to /cpu.cfs_quota_us is /sys/fs/cgroup/cpu,cpuacct/cpu.cfs_quota_us
[0.001s][trace][os,container] CPU Quota is: -1
[0.001s][trace][os,container] Path to /cpu.cfs_period_us is /sys/fs/cgroup/cpu,cpuacct/cpu.cfs_period_us
[0.001s][trace][os,container] CPU Period is: 100000
[0.001s][trace][os,container] OSContainer::active_processor_count: 12
[0.001s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 12
[0.001s][trace][os,container] total physical memory: 67163238400
[0.001s][debug][os,container] read_string: subsystem path is null
[0.001s][trace][os,container] Memory Limit failed: -2
[0.003s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 12
[0.020s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 12
openjdk 24-internal 2025-03-18
OpenJDK Runtime Environment (build 24-internal-adhoc.sgehwolf.jdk-jdk)
OpenJDK 64-Bit Server VM (build 24-internal-adhoc.sgehwolf.jdk-jdk, mixed mode, sharing)
[root@5aee0ffdd70b /]# grep memory /proc/self/mountinfo
1476 1473 0:43 /machine.slice/libpod-5aee0ffdd70b215ba4115f31e5438fa4708be8fd3a11ad75cbc93b0869788dfd.scope/container /sys/fs/cgroup/memory ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,memory
[root@5aee0ffdd70b /]# grep memory /proc/self/cgroup
11:memory:/test
For the NPE issue, reproducer steps are similar.
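To check whether a process is in the state described here, the mount root from /proc/self/mountinfo can be compared with the controller's path from /proc/self/cgroup. The following is a rough diagnostic sketch under that assumption; the class and its methods are hypothetical helpers, and the parsing relies only on the documented field layout of the two files.

```java
import java.util.Arrays;

// Hypothetical diagnostic helper; not JDK code.
final class CgroupProcLinesSketch {

    // Holds the fields of a mountinfo line that matter here.
    static final class Mount {
        final String root;
        final String mountPoint;
        final String superOptions;

        Mount(String root, String mountPoint, String superOptions) {
            this.root = root;
            this.mountPoint = mountPoint;
            this.superOptions = superOptions;
        }
    }

    // mountinfo layout: id parent major:minor root mount-point options
    // [optional fields] - fstype source super-options
    static Mount parseMountInfo(String line) {
        int sep = line.indexOf(" - ");
        String[] pre = line.substring(0, sep).split(" ");
        String[] post = line.substring(sep + 3).split(" ");
        return new Mount(pre[3], pre[4], post[2]);
    }

    // cgroup layout: hierarchy-id:controller-list:cgroup-path
    static String cgroupPathFor(String cgroupLine, String controller) {
        String[] parts = cgroupLine.split(":", 3);
        boolean match = Arrays.asList(parts[1].split(",")).contains(controller);
        return match ? parts[2] : null;
    }

    // True when the process's cgroup path is neither the mount root
    // nor located beneath it, as in the reproducer above.
    static boolean movedOutside(Mount m, String cgroupPath) {
        if (m.root.equals("/")) {
            return false;
        }
        return !(cgroupPath.equals(m.root) || cgroupPath.startsWith(m.root + "/"));
    }

    public static void main(String[] args) {
        // Lines copied from the reproducer output in this report.
        Mount m = parseMountInfo("1476 1473 0:43 /machine.slice/libpod-5aee0ffdd70b215ba4115f31e5438fa4708be8fd3a11ad75cbc93b0869788dfd.scope/container /sys/fs/cgroup/memory ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,memory");
        String path = cgroupPathFor("11:memory:/test", "memory");
        System.out.println("moved outside mount root: " + movedOutside(m, path));
    }
}
```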
- backported by
  JDK-8351890 Cgroup v1 subsystem fails to set subsystem path (Resolved)
- causes
  JDK-8351382 New test containers/docker/TestMemoryWithSubgroups.java is failing (Resolved)
- duplicates
  JDK-8286212 Cgroup v1 initialization causes NPE on some systems (Closed)
  JDK-8288019 [cgroups v1] cgroup path logic using substring is dead code in hotspot (Closed)
- relates to
  JDK-8286991 Hotspot container subsystem unaware of VM moving cgroups (Open)
  JDK-8352926 New test TestDockerMemoryMetricsSubgroup.java fails (New)
- links to
  Commit(master) openjdk/jdk24u/8a4f4768
  Commit(master) openjdk/jdk/de29ef3b
  Review(master) openjdk/jdk24u/113
  Review(master) openjdk/jdk/21808