- Enhancement
- Resolution: Fixed
- P4
- None
- b05
- generic
- linux
Currently, the HotSpot code that determines whether the JVM thinks it runs in a container may return false positives on a plain Linux host.
This can be observed, for example, by running jshell with container trace logging (it shows many traces since -XX:+UseDynamicNumberOfCompilerThreads is on by default, which queries the available memory through the container detection code):
$ jshell -J-Xlog:os+container=trace
Bob mentioned that there was no reliable way to detect whether a JVM runs in a container:
https://bugs.openjdk.java.net/browse/JDK-8227006?focusedCommentId=14275609&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14275609
I believe this has changed. We should be able to determine whether we run in a container by looking at the cgroup controller mounts: container engines typically mount them read-only, while on a host system they are read-write. This is useful for detecting the "inside a container" case. Note that the mount options are field 6 of /proc/pid/mountinfo, as per 'man procfs'.
Host system case (note the 'rw' mount options on the controller mounts):
$ grep cgroup /proc/self/mountinfo
53 51 0:27 / /sys/fs/cgroup ro,nosuid,nodev,noexec shared:7 - tmpfs tmpfs ro,seclabel,size=4096k,nr_inodes=1024,mode=755,inode64
54 53 0:28 / /sys/fs/cgroup/unified rw,nosuid,nodev,noexec,relatime shared:8 - cgroup2 cgroup2 rw,seclabel,nsdelegate
55 53 0:29 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:9 - cgroup cgroup rw,seclabel,xattr,name=systemd
56 53 0:33 / /sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime shared:10 - cgroup cgroup rw,seclabel,blkio
57 53 0:34 / /sys/fs/cgroup/net_cls,net_prio rw,nosuid,nodev,noexec,relatime shared:11 - cgroup cgroup rw,seclabel,net_cls,net_prio
58 53 0:35 / /sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime shared:12 - cgroup cgroup rw,seclabel,cpu,cpuacct
59 53 0:36 / /sys/fs/cgroup/pids rw,nosuid,nodev,noexec,relatime shared:13 - cgroup cgroup rw,seclabel,pids
60 53 0:37 / /sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime shared:14 - cgroup cgroup rw,seclabel,memory
61 53 0:38 / /sys/fs/cgroup/rdma rw,nosuid,nodev,noexec,relatime shared:15 - cgroup cgroup rw,seclabel,rdma
62 53 0:39 / /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime shared:16 - cgroup cgroup rw,seclabel,freezer
63 53 0:40 / /sys/fs/cgroup/misc rw,nosuid,nodev,noexec,relatime shared:17 - cgroup cgroup rw,seclabel,misc
64 53 0:41 / /sys/fs/cgroup/perf_event rw,nosuid,nodev,noexec,relatime shared:18 - cgroup cgroup rw,seclabel,perf_event
65 53 0:42 / /sys/fs/cgroup/hugetlb rw,nosuid,nodev,noexec,relatime shared:19 - cgroup cgroup rw,seclabel,hugetlb
66 53 0:43 / /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime shared:20 - cgroup cgroup rw,seclabel,cpuset
67 53 0:44 / /sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime shared:21 - cgroup cgroup rw,seclabel,devices
Container case (note the 'ro' mount options on the controller mounts):
# grep cgroup /proc/self/mountinfo
1531 1508 0:119 / /sys/fs/cgroup rw,nosuid,nodev,noexec,relatime - tmpfs cgroup rw,context="system_u:object_r:container_file_t:s0:c405,c449",size=1024k,uid=15263,gid=15263,inode64
1532 1531 0:44 /user.slice /sys/fs/cgroup/devices ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,devices
1533 1531 0:43 / /sys/fs/cgroup/cpuset ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,cpuset
1534 1531 0:42 / /sys/fs/cgroup/hugetlb ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,hugetlb
1535 1531 0:41 / /sys/fs/cgroup/perf_event ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,perf_event
1536 1531 0:40 / /sys/fs/cgroup/misc ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,misc
1537 1531 0:39 / /sys/fs/cgroup/freezer ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,freezer
1538 1531 0:38 / /sys/fs/cgroup/rdma ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,rdma
1539 1531 0:37 /user.slice/user-15263.slice/user@15263.service /sys/fs/cgroup/memory ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,memory
1540 1531 0:36 /user.slice/user-15263.slice/user@15263.service /sys/fs/cgroup/pids ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,pids
1541 1531 0:35 /user.slice/user-15263.slice/user@15263.service /sys/fs/cgroup/cpu,cpuacct ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,cpu,cpuacct
1542 1531 0:34 / /sys/fs/cgroup/net_cls,net_prio ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,net_cls,net_prio
1543 1531 0:33 /user.slice/user-15263.slice/user@15263.service /sys/fs/cgroup/blkio ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,blkio
1544 1531 0:29 /user.slice/user-15263.slice/user@15263.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-0f301a31-cd1d-4b62-b798-9810bc79990b.scope /sys/fs/cgroup/systemd ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,xattr,name=systemd
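For illustration, here is a minimal sketch of how the read-only heuristic could be implemented by parsing field 6 of /proc/self/mountinfo. This is not the actual HotSpot code; the helper name any_cgroup_controller_read_only is hypothetical:

#include <cstdio>
#include <cstring>

// Hypothetical helper (sketch only): returns true if any cgroup
// controller mount is read-only, as container engines typically set up
// (see the 'ro' options in the container case above).
static bool any_cgroup_controller_read_only() {
  FILE* f = fopen("/proc/self/mountinfo", "r");
  if (f == nullptr) {
    return false;
  }
  bool found_ro = false;
  char line[1024];
  while (!found_ro && fgets(line, sizeof(line), f) != nullptr) {
    // Field 6 holds the per-mount options; the filesystem type follows
    // the " - " separator (a variable number of optional fields sits in
    // between, hence the separate scan below).
    char mount_opts[256];
    if (sscanf(line, "%*s %*s %*s %*s %*s %255s", mount_opts) != 1) {
      continue;
    }
    const char* sep = strstr(line, " - ");  // simplified separator search
    if (sep == nullptr) {
      continue;
    }
    char fstype[64];
    if (sscanf(sep + 3, "%63s", fstype) != 1) {
      continue;
    }
    // Only look at actual controller mounts, not e.g. the tmpfs on
    // /sys/fs/cgroup (which is 'ro' even in the host case above).
    if (strcmp(fstype, "cgroup") != 0 && strcmp(fstype, "cgroup2") != 0) {
      continue;
    }
    // The options are comma-separated; 'ro' or 'rw' comes first.
    found_ro = strncmp(mount_opts, "ro", 2) == 0 &&
               (mount_opts[2] == ',' || mount_opts[2] == '\0');
  }
  fclose(f);
  return found_ro;
}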
Yet, looking at the rw/ro mount options isn't enough. Features like JDK-8217338 have been added to use the container detection code to figure out memory/cpu limits enforced by other means. We would introduce a regression if we only looked at the read/write property of the controller mounts. Therefore, we need a fall-back that looks at the container limits at OSContainer::init time. If there are any, we can set OSContainer::is_containerized() to true for that reason. A sketch of the combined decision follows below.
Using only the fall-back approach is insufficient, though, since it is expected (and asserted in container tests) that is_containerized() returns true when OpenJDK runs inside a container without a limit.
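Combining the two signals, the decision at OSContainer::init time could look roughly like the following sketch. The names detect_containerized and any_limit_present are hypothetical stand-ins for illustration (only OSContainer::init() and OSContainer::is_containerized() exist in HotSpot); any_limit_present would consult the existing limit queries:

// Hypothetical stub (not an actual HotSpot function): in HotSpot this
// would consult the existing cgroup limit interfaces (memory limit,
// cpu quota/shares, cpuset, ...).
static bool any_limit_present() {
  return false;  // stub for illustration
}

// Sketch of the combined decision; OSContainer::init() could set
// _is_containerized from this result.
static bool detect_containerized() {
  if (any_cgroup_controller_read_only()) {
    // Controller mounts are 'ro': very likely inside a container.
    return true;
  }
  // Fall-back: no read-only controller mounts, but limits may still be
  // enforced by other means (e.g. systemd slices, see JDK-8217338).
  return any_limit_present();
}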
- relates to
  - JDK-8335882: platform/cgroup/TestSystemSettings.java fails on Alpine Linux (Resolved)
  - JDK-8333967: containers/cgroup/PlainRead.java fails after 8302744 (Closed)
  - JDK-8254091: Need a mechanism (and API) to reliably determine if a JVM is executing in a container context (Closed)
  - JDK-8334222: exclude containers/cgroup/PlainRead.java (Resolved)
  - JDK-8264482: container info misleads on non-container environment (Closed)