JDK-8238279

EPollArrayWrapper.epollWait() may return events for removed file descriptors

Details

    • x86_64
    • linux_ubuntu

    Description

      ADDITIONAL SYSTEM INFORMATION :
      Dell Precision Tower with :
      - Intel(R) Xeon(R) CPU E5-1630 v4 @ 3.70GHz
      - 32 GB RAM

      Java: reproduced on all of these:
      - 1.8.0_201-b09 HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)
      - 1.8.0_172-b11
      - openjdk-8-jdk 8u232-b09-0ubuntu1.1

      OS:
        Distribution: Ubuntu 64 19.10
        C Library: GNU C Library / (Ubuntu GLIBC 2.30-0ubuntu2) 2.30
        Kernel : Linux 5.3.0-26-generic (x86_64)

      A DESCRIPTION OF THE PROBLEM :
      On 64-bit Ubuntu systems, EPollArrayWrapper.epollWait() may return events for file descriptors that have already been removed.

      As a consequence, Jetty spins in an infinite loop and keeps a CPU core at 100% indefinitely while doing no useful work. The only way to recover is to restart the application.

      We develop an Eclipse RCP application whose help system is served by a local embedded Jetty server. The help is displayed in an SWT Browser based on the WebKitGTK widget.
      Since SWT made WebKit 2 mandatory for the SWT Browser implementation, we very often see several CPU cores stuck at 100% indefinitely. The only way to stop this is to restart the application. This is a major inconvenience for our Linux users.

      After one week of debugging, I concluded that when the Jetty server closes an HTTP connection for being idle, it:
      1) closes the connection,
      2) removes the connection from the Selector,
      3) listens on the Selector again,
      4) receives an event for the removed connection!
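
      The sequence above can be sketched with plain java.nio. This is only an illustration of the expected behaviour, not a reproducer: it assumes a Pipe as a stand-in for the idle HTTP connection, and the class and method names are hypothetical. On a JVM without the defect, the select call should time out and report no ready keys.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical sketch: a Pipe stands in for the idle HTTP connection.
final class SelectAfterCloseSketch {

    /** Returns how many keys the selector reports after its only channel was closed. */
    static int selectAfterClose(long timeoutMillis) {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try (Pipe.SinkChannel sink = pipe.sink()) {
                Pipe.SourceChannel source = pipe.source();
                source.configureBlocking(false);
                SelectionKey key = source.register(selector, SelectionKey.OP_READ);

                // Steps 1 and 2: closing the channel also cancels its key.
                source.close();
                if (key.isValid()) {
                    throw new IllegalStateException("key should be cancelled after close");
                }

                // Steps 3 and 4: select again; nothing is registered any more,
                // so no event should be delivered before the timeout.
                return selector.select(timeoutMillis);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("ready keys after close: " + selectAfterClose(200));
    }
}
```

      In the failing case described in this report, the equivalent select call returns immediately instead of blocking, which is what makes the Jetty selector thread spin.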

      Here is a stack trace taken while the debugger was paused, with some variables dumped:

      Thread [qtp2141257835-54] (Suspended)
      owns: EPollSelectorImpl (id=129)
      owns: Collections$UnmodifiableSet<E> (id=130)
      owns: Util$3 (id=131)
      EPollArrayWrapper.poll(long) line: 270
      EPollSelectorImpl.doSelect(long) line: 93
      EPollSelectorImpl(SelectorImpl).lockAndDoSelect(long) line: 86
      EPollSelectorImpl(SelectorImpl).select(long) line: 97
      EPollSelectorImpl(SelectorImpl).select() line: 101
      ManagedSelector$SelectorProducer.select() line: 466
      ManagedSelector$SelectorProducer.produce() line: 403
      EatWhatYouKill.produceTask() line: 357
      EatWhatYouKill.doProduce(boolean) line: 181
      EatWhatYouKill.tryProduce(boolean) line: 168
      EatWhatYouKill.run() line: 126
      ReservedThreadExecutor$ReservedThread.run() line: 366
      QueuedThreadPool.runJob(Runnable) line: 765
      QueuedThreadPool$2.run() line: 683
      Thread.run() line: 748

      this EPollArrayWrapper (id=429)
      - epfd 128
      - incomingInterruptFD 77
      - outgoingInterruptFD 127
      - registered BitSet (id=435) {184}
      - updated 1
      - updateDescriptors (id=436) [184, 0, 0, 0, ...
      - eventsLow (id=433)
      [150] -1
      [184] 1

      getDescriptor(0) returned 150
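
      The inconsistency in this dump is that the poll returned descriptor 150 while the registered set only contains 184. The -1 stored for 150 in eventsLow appears to correspond to the marker JDK 8 uses for a removed descriptor, though that reading is an assumption. The check below is a minimal sketch of that comparison using a simplified model; the class and method names are hypothetical and this is not the JDK's code nor the patch from the workaround.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical model of the dumped state, not the real EPollArrayWrapper.
final class StaleEventFilterSketch {

    /** Returns the polled descriptors that are no longer in the registered set. */
    static List<Integer> staleDescriptors(BitSet registered, int[] polledFds) {
        List<Integer> stale = new ArrayList<>();
        for (int fd : polledFds) {
            if (!registered.get(fd)) {
                stale.add(fd); // event for a descriptor that was already removed
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        BitSet registered = new BitSet();
        registered.set(184);                 // registered = {184}, as in the dump
        int[] polled = {150};                // getDescriptor(0) returned 150
        System.out.println(staleDescriptors(registered, polled)); // prints [150]
    }
}
```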

      This bug might be the cause of the following forum posts:
      - https://forum.flashphoner.com/threads/100-cpu-usage-from-netty-epoll.11390/
      - https://www.eclipse.org/lists//jetty-users/msg07075.html
      - https://stackoverflow.com/questions/25494752/high-cpu-utilization-for-threads-that-seem-to-be-waiting
      - https://stackoverflow.com/questions/20475290/why-does-select-consume-so-much-cpu-time-in-my-program
      - https://download.oracle.com/javaee-archive/grizzly.java.net/users/2015/09/6747.html
      - https://github.com/eclipse/jetty.project/issues/1323
      - http://jetty.4.x6.nabble.com/jetty-users-high-cpu-load-NIO-Jetty7-6-0-td4548266.html

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      I have a runnable jar file to reproduce the problem. It embeds a Jetty server, Eclipse SWT and code to reproduce the problem.

      It is stored here: https://github.com/cedric780/EPollArrayWrapper-bug-demonstrator/blob/master/org.modelio.jre.epollarray.test-0.0.1-sources.jar

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      The jar should consume little CPU time.
      ACTUAL -
      At least one CPU core stays at 100% indefinitely.

      ---------- BEGIN SOURCE ----------
      See here:
      https://github.com/cedric780/EPollArrayWrapper-bug-demonstrator/blob/master/org.modelio.jre.epollarray.test-0.0.1-sources.jar

      https://github.com/cedric780/EPollArrayWrapper-bug-demonstrator
      ---------- END SOURCE ----------

      CUSTOMER SUBMITTED WORKAROUND :
      I have written a Java agent that patches EPollArrayWrapper.
      It is in the same repository:
      https://github.com/cedric780/EPollArrayWrapper-bug-demonstrator/blob/master/org.modelio.jre.epollarray.patch-0.0.1-sources.jar

      FREQUENCY : often


          People

            jjose Johny Jose
            webbuggrp Webbug Group