Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-4932478

Server HotSpot Compiler spinning on PrintWriter.flush

    XMLWordPrintable

Details

    • 1.5
    • sparc
    • solaris_8
    • Not verified

    Description

      Customer is running their server under heavy load, 1000 concurrent HTTP
      clients. The server is configured with 1700 client threads and is running
      on a 4-way 700mHz Xeon box with 2GB RAM using Windows Win2K Advanced
      Server and they have tested on Solaris 8 Sparc and see the same problem.

      We have recreated the customer environment in our lab on a 4-way SunBlade
      using Solaris 8. We install the Tomcat JSP examples into their web container
      and use a load generator to create 2000 threads to ping the date.jsp. We ran
      their web container with the 1.4.1 and 1.4.2 runtime and it seems the Server
      compiler is hitting some type of deadlock under load.

      Many of their worker threads on the server are stuck in a
      java/io/PrintWriter.flush call.

      When running the same test with the client compiler in 1.4.1 the problem
      goes away. We have also worked around the problem by excluding the
      PrinterWriter.flush method in a .hotspot_compiler file.

      The pstacks are in the attachments. The 1.4.2 pstack displays most of the
      symbols since we were able to run with a debug build that a custom pstack
      is able to translate tthe compiled Java methods.

      Here's an example thread showing the thread waiting in flush():

      "client1500" daemon prio=5 tid=0x283D0EE0 nid=0x12a8 runnable
      [400ef000..400efdc0]
      at java.io.PrintWriter.flush(PrintWriter.java:121)
      at com.sssw.srv.jsp.AgoJspWriter.flush(AgoJspWriter.java:198)
      at com.sssw.srv.jsp.AgoJspPageContext.release(AgoJspPageContext.java:87)
      at
      com.sssw.srv.jsp.AgoJspFactory.releasePageContext(AgoJspFactory.java:59)
      at
      com.sssw.gen.jsp.jsp_include_include_jsp_1016613202._jspService(jsp_include_include_jsp_1016613202.java:104)
      at com.sssw.srv.jsp.AgoHttpJspPage.service(AgoHttpJspPage.java:94)
      at com.sssw.srv.busobj.AgoFilterChain.doServlet(AgoFilterChain.java:394)
      at com.sssw.srv.busobj.AgoFilterChain.doFilter(AgoFilterChain.java:180)
      at com.sssw.srv.resources.AgWarResource.service(AgWarResource.java:629)
      at
      com.sssw.srv.resources.AgWarURLResource.perform(AgWarURLResource.java:113)
      at com.sssw.srv.http.httpd.perform(httpd.java:5568)
      at com.sssw.srv.http.Client.processRequest(Client.java:905)
      at com.sssw.srv.http.Client.loop(Client.java:1271)
      at com.sssw.srv.http.Client.runConnection(Client.java:1487)
      at com.sssw.srv.http.Client.run(Client.java:1435)
      at java.lang.Thread.run(Thread.java:536)

      If we let the test continue to run we will get hundreds (831 in my last
      test) of threads sitting in this state. About half of these threads sitting
      in the above state and about half of them doing a flush() from a different
      path:

      "client1499" daemon prio=5 tid=0x283D0288 nid=0xf34 runnable
      [400af000..400afdc0]
      at java.io.PrintWriter.flush(PrintWriter.java:121)
      at
      com.sssw.srv.busobj.AgoHttpReplyEvent.completeContent(AgoHttpReplyEvent.java:87)
      at com.sssw.srv.resources.AgWarResource.service(AgWarResource.java:631)
      at
      com.sssw.srv.resources.AgWarURLResource.perform(AgWarURLResource.java:113)
      at com.sssw.srv.http.httpd.perform(httpd.java:5568)
      at com.sssw.srv.http.Client.processRequest(Client.java:905)
      at com.sssw.srv.http.Client.loop(Client.java:1271)
      at com.sssw.srv.http.Client.runConnection(Client.java:1487)
      at com.sssw.srv.http.Client.run(Client.java:1435)
      at java.lang.Thread.run(Thread.java:536)

      Again, none of these PrintWriter objects are sharing the same 'lock'.
      Looking through all the threads, I don't see any thread that holds the
      lock.

      Thousands and thousands of requests complete successfully, so not all
      threads are getting hung up here, but more and more do as the test runs on.
      If we do successive <ctrl> Break's, some of the threads that were waiting
      in flush() in previous traces are not waiting in current ones. If I had to
      guess what was happening, it looks like there is some scheduling bottleneck
      coming out of the synchronized block on an SMP system.

      This is really hurting our performance / scalability. We realize that
      having this many threads is not a good design (we are fixing this in the
      release we are working on), but it is surprising that this particular
      synchronized block is causing problems while thousands of others aren't.
      ----------------------------------------------------------------------------

      Attachments

        Activity

          People

            rknippelsunw Ross Knippel (Inactive)
            atongschsunw Albert Tong-schmidt (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:
              Imported:
              Indexed: