Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-8307425

Socket input stream read burns CPU cycles with back-to-back poll(0) calls

XMLWordPrintable

    • b22

        This was originally found by Oli Gillespie [not an Author yet], submitting this on his behalf. Oli would do the patch work.

        One of our migrations to JDK 17 revealed an issue in JDK-8221481 that resulted in >10% CPU spent in nio waiting code. The minimized case shows how the waiting code ends up issuing back-to-back polls with zero timeout:

        ```
        import java.io.*;
        import java.net.*;

        public class ReadSingle {
            public static void main(String[] args) throws Exception {
                Socket s = new Socket("shipilev.net", 443);
                s.setSoTimeout(1);
                s.getOutputStream().write(1);
                InputStream is = s.getInputStream();
                System.out.println("Starting single byte read");
                is.read();
            }
        }
        ```

        With current JDK:

        ```
         % strace -f build/linux-x86_64-server-release/images/jdk/bin/java ReadSingle.java 2>&1 | grep poll
        [pid 35427] poll([{fd=4, events=POLLIN|POLLERR|POLLHUP}], 1, 5000) = 1 ([{fd=4, revents=POLLIN}])
        [pid 35427] poll([{fd=4, events=POLLIN|POLLERR|POLLHUP}], 1, 5000) = 1 ([{fd=4, revents=POLLIN|POLLHUP}])
        [pid 35427] poll([{fd=5, events=POLLIN}], 1, 0) = 0 (Timeout)
        [pid 35427] poll([{fd=5, events=POLLIN}], 1, 0) = 0 (Timeout)
        [pid 35427] poll([{fd=5, events=POLLIN}], 1, 0) = 0 (Timeout)
        [pid 35427] poll([{fd=5, events=POLLIN}], 1, 0) = 0 (Timeout)
        [pid 35427] poll([{fd=5, events=POLLIN}], 1, 0) = 0 (Timeout)
        [pid 35427] poll([{fd=5, events=POLLIN}], 1, 0) = 0 (Timeout)
        [pid 35427] poll([{fd=5, events=POLLIN}], 1, 0) = 0 (Timeout)
        [pid 35427] poll([{fd=5, events=POLLIN}], 1, 0) = 0 (Timeout)
        [pid 35427] poll([{fd=5, events=POLLIN}], 1, 0) = 0 (Timeout)
        [pid 35427] poll([{fd=5, events=POLLIN}], 1, 0) = 0 (Timeout)
        [pid 35427] poll([{fd=5, events=POLLIN}], 1, 0) = 0 (Timeout)
        [pid 35427] poll([{fd=5, events=POLLIN}], 1, 0) = 0 (Timeout)
        [pid 35427] poll([{fd=5, events=POLLIN}], 1, 0) = 0 (Timeout)
        [pid 35427] poll([{fd=5, events=POLLIN}], 1, 0) = 0 (Timeout)
        [pid 35427] poll([{fd=5, events=POLLIN}], 1, 0) = 0 (Timeout)
        [pid 35427] poll([{fd=5, events=POLLIN}], 1, 0) = 0 (Timeout)
        ```

        With JDK 17:

        ```
         % strace -f ../shipilev-jdk17u-dev/build/linux-x86_64-server-release/images/jdk/bin/java ReadSingle.java 2>&1 | grep poll
        [pid 34344] poll([{fd=4, events=POLLIN|POLLERR|POLLHUP}], 1, 5000) = 1 ([{fd=4, revents=POLLIN|POLLHUP}])
        [pid 34344] poll([{fd=4, events=POLLIN|POLLERR|POLLHUP}], 1, 5000) = 1 ([{fd=4, revents=POLLIN}])
        [pid 34344] poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
        [pid 34344] poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
        [pid 34344] poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
        [pid 34344] poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
        [pid 34344] poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
        [pid 34344] poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
        [pid 34344] poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
        [pid 34344] poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
        [pid 34344] poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
        [pid 34344] poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
        [pid 34344] poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
        [pid 34344] poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
        [pid 34344] poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
        [pid 34344] poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
        [pid 34344] poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
        [pid 34344] poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
        ```

        With JDK 17 and `-Djdk.net.usePlainSocketImpl`:

        ```
        % strace -f ../shipilev-jdk17u-dev/build/linux-x86_64-server-release/images/jdk/bin/java -Djdk.net.usePlainSocketImpl ReadSingle.java 2>&1 | grep poll
        [pid 34653] poll([{fd=4, events=POLLIN|POLLERR|POLLHUP}], 1, 5000) = 1 ([{fd=4, revents=POLLIN}])
        [pid 34653] poll([{fd=4, events=POLLIN|POLLERR|POLLHUP}], 1, 5000) = 1 ([{fd=4, revents=POLLIN|POLLHUP}])
        [pid 34653] poll([{fd=6, events=POLLIN|POLLERR}], 1, 1) = 0 (Timeout)
        ```

        This looks like a regression since JDK-8221481, and the cause seems to be this truncation of nanos to millis, which truncates the small nano-wait to 0, which is then passed to `poll(0)`, which returns immediately: https://github.com/openjdk/jdk/blob/e206d57bfc09032e17d09714fc54ab2f5e961792/src/java.base/share/classes/sun/nio/ch/NioSocketImpl.java#L184

        ```
            private void park(FileDescriptor fd, int event, long nanos) throws IOException {
                Thread t = Thread.currentThread();
                if (t.isVirtual()) {
                   ...
                } else {
                    long millis;
                    if (nanos == 0) {
                        millis = -1;
                    } else {
                        millis = NANOSECONDS.toMillis(nanos); // <--- here
                    }
                    Net.poll(fd, event, millis);
                }
            }
        ```

        Note it would happen in nearly all cases where we timeout the read, as there will be a hot loop in the final sub-millisecond section which will truncate to zero millis. The read timeout would be more likely with smaller timeouts, but larger timeouts are also affected by this.

        JDK-8253119 removed `-Djdk.net.usePlainSocketImpl`, so there is no recourse in JDK 18+ at all.

        This looks similar to JDK-8210004, JDK-8065372 and others, and should probably be fixed the same way: rounding *up* to the nearest millisecond.

              ogillespie Oli Gillespie
              shade Aleksey Shipilev
              Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

                Created:
                Updated:
                Resolved: