Recording this as a potential problem for some combinations of Linux kernels and JDK versions.
As part of JDK-8272807 (from JDK 19), the implementation of os::pretouch_memory was changed to an atomic add of 0. This change allows the memory to be used concurrently with the pre-touch operation.
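The atomic-add-of-0 pretouch described above can be sketched roughly as follows. This is an illustration only, not the HotSpot source: the function name pretouch_atomic_add0 is hypothetical, the GCC/Clang builtin __atomic_fetch_add stands in for HotSpot's own atomic primitives, and start is assumed to be page-aligned.

```cpp
#include <cstddef>

// Hypothetical helper (not the HotSpot implementation): touch one int in
// every page of [start, end) with an atomic add of 0. The add is a write
// access, so the kernel must back the page, but the stored value is
// unchanged, so other threads may use the memory at the same time.
void pretouch_atomic_add0(void* start, void* end, std::size_t page_size) {
    char* cur = static_cast<char*>(start);
    char* last = static_cast<char*>(end);
    for (; cur < last; cur += page_size) {
        // Relaxed ordering suffices here: no value is being published.
        __atomic_fetch_add(reinterpret_cast<int*>(cur), 0, __ATOMIC_RELAXED);
    }
}
```

Because adding 0 never changes the contents, a concurrent reader or writer of the same range sees consistent data, which is the property JDK-8272807 was after.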
The atomic add operation does not require exclusive access to the memory, as the kernel could use the zero page for initial access and then allocate physical pages on write ("copy-on-write"). However, Linux versions 5.8 and newer have different CoW semantics for huge pages (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3917c80280c9): they allocate small pages on CoW, whereas older Linux versions allocated large pages. This results in unexpected fragmentation of transparent huge pages.
This was fixed in JDK 23 as part of JDK-8315923 using MADV_POPULATE_WRITE, and recently backported to JDK 21. However, MADV_POPULATE_WRITE is only available in Linux 5.14 and above (https://kernelnewbies.org/Linux_5.14).
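For illustration, a minimal sketch of calling madvise with MADV_POPULATE_WRITE and detecting kernels that reject it. This is not the JDK-8315923 change itself: the PopulateResult enum and populate_write function are hypothetical names, and the fallback value 23 for the constant is the one in the generic Linux uapi headers, assumed here for build environments whose libc headers predate 5.14.

```cpp
#include <sys/mman.h>
#include <cerrno>
#include <cstddef>

#ifndef MADV_POPULATE_WRITE
// Assumption: value from the generic Linux 5.14 uapi headers.
#define MADV_POPULATE_WRITE 23
#endif

// Hypothetical result type for this sketch.
enum class PopulateResult { Populated, Unsupported, Failed };

// Ask the kernel to fault in [addr, addr + len) as if written to.
// EINVAL is treated as "advice not supported", which is what kernels
// older than 5.14 return; note EINVAL can also indicate a misaligned addr.
PopulateResult populate_write(void* addr, std::size_t len) {
    if (madvise(addr, len, MADV_POPULATE_WRITE) == 0) {
        return PopulateResult::Populated;
    }
    return errno == EINVAL ? PopulateResult::Unsupported
                           : PopulateResult::Failed;
}
```

A caller that gets Unsupported still has to pretouch some other way, which is where kernels older than 5.14 run into the problem recorded in this issue.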
Hence, customers using JDK 19 and above with both -XX:+UseTransparentHugePages and -XX:+AlwaysPreTouch set on Linux versions 5.8 to 5.13 would see performance degradation. The clear path forward for affected users is to upgrade the kernel. We are planning to tell our customers to follow that path.
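A quick way to check whether a host falls in the affected kernel range is sketched below. The helper names are hypothetical, and the check looks only at the upstream major.minor version from uname; distribution kernels may carry backports that this cannot detect.

```cpp
#include <cstdio>
#include <sys/utsname.h>

// True for the kernel range named above: 5.8 through 5.13 inclusive.
bool is_affected_kernel(int major, int minor) {
    return major == 5 && minor >= 8 && minor <= 13;
}

// Extracts "major.minor" from a release string such as "5.10.0-28-amd64".
bool parse_kernel_release(const char* release, int* major, int* minor) {
    return std::sscanf(release, "%d.%d", major, minor) == 2;
}

// Checks the running kernel; returns false if the release string
// cannot be read or parsed.
bool running_kernel_is_affected() {
    struct utsname u;
    int major = 0, minor = 0;
    if (uname(&u) != 0 || !parse_kernel_release(u.release, &major, &minor)) {
        return false;
    }
    return is_affected_kernel(major, minor);
}
```

The same information is available from `uname -r` on the command line; the JVM flags still have to be checked separately, since only the combination of -XX:+UseTransparentHugePages and -XX:+AlwaysPreTouch is affected.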
If that path cannot be followed, we can mitigate the issue in OpenJDK by using the old pretouch sequence on the affected Linux versions, as in the attached 8338305.patch. Unfortunately, we cannot just revert JDK-8272807, because there is already JDK code that depends on being able to access pretouched memory while pretouch is running; one example is CDS.
relates to
- JDK-8315923 pretouch_memory by atomic-add-0 fragments huge pages unexpectedly (Resolved)
- JDK-8272807 Permit use of memory concurrent with pretouch (Resolved)