JDK-8000303 : (bf) Allow temporary buffer cache to grow to IOV_MAX


    • Resolved In Build: b12
    • Verification: Not verified

      A JMS non-persistent micro-benchmark with a 16K message size and 8 queues runs at a rate of 40K messages per second. Response time is poor because of a particular lock inflating. The cause of the lock inflation is the inefficient use of DirectByteBuffers by sun.nio.ch.IOUtil.

      java version "1.7.0-ea"
      Java(TM) SE Runtime Environment (build 1.7.0-ea-b141)
      Java HotSpot(TM) 64-Bit Server VM (build 21.0-b11, mixed mode)
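
      For context, the following is a minimal sketch (not the actual JDK source; the class and method names here are illustrative) of the copy that sun.nio.ch.IOUtil performs for each non-direct buffer in a gathering write: the heap buffer's contents must be moved into a temporary direct buffer before the native writev call, and when the per-thread cache misses, the fallback is ByteBuffer.allocateDirect, which is the DirectByteBuffer.<init> cost visible in the profile below.

      import java.nio.ByteBuffer;

      // Simplified sketch of the non-direct-buffer substitution done for native I/O.
      class TemporaryDirectCopySketch {
          static ByteBuffer toDirect(ByteBuffer src) {
              if (src.isDirect()) {
                  return src;                              // native I/O can use it as-is
              }
              // In the JDK this goes through Util.getTemporaryDirectBuffer; on a cache
              // miss it ends up allocating a new direct buffer, as seen in the profile below.
              ByteBuffer tmp = ByteBuffer.allocateDirect(src.remaining());
              tmp.put(src.duplicate());                    // copy the payload
              tmp.flip();                                  // ready to be written
              return tmp;
          }
      }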

      Sun Analyzer shows the following:

      java.nio.DirectByteBuffer.<init>(int)
              Exclusive User CPU Time: 3.351 ( 0.1%)
              Inclusive User CPU Time: 286.137 ( 7.5%)
                                 Size: 146
                           PC Address: 2:0x000015aa
                          Source File: /tmp/analyzer.DirectByteBuffer.java.16752
                          Object File: java.nio.DirectByteBuffer
                          Load Object: /home/mwm_jms/JMSB/log/214/server.er/jclasses
                         Mangled Name: java.nio.DirectByteBuffer.<init>
                              Aliases:

      (as a result, the JMS-specific path around the following method takes 19%)
      weblogic.socket.NIOSocketMuxer$NIOOutputStream.write(java.nio.ByteBuffer[], int, int)
              Exclusive User CPU Time: 1.070 ( 0.0%)
              Inclusive User CPU Time: 737.277 ( 19.3%)
                                 Size: 88
                           PC Address: 2:0x000012e9
                          Source File: weblogic/socket/NIOSocketMuxer.java
                          Object File: weblogic.socket.NIOSocketMuxer$NIOOutputStream
                          Load Object: /home/mwm_jms/JMSB/log/214/server.er/jclasses
                         Mangled Name: weblogic.socket.NIOSocketMuxer$NIOOutputStream.write
                              Aliases:

      call tree looks like this:
       567.193 100.00 | +-weblogic.socket.NIOSocketMuxer$NIOOutputStream.write(java.nio.ByteBuffer[], int, int)
       566.003  99.79 | +-sun.nio.ch.SocketChannelImpl.write(java.nio.ByteBuffer[], int, int)
       555.391  98.13 | | +-sun.nio.ch.IOUtil.write(java.io.FileDescriptor, java.nio.ByteBuffer[], int, int, sun.nio.ch.NativeDispatcher)
       235.957  42.48 | | | +-sun.nio.ch.SocketDispatcher.writev(java.io.FileDescriptor, long, int)
       235.917  99.98 | | | | +-sun.nio.ch.FileDispatcherImpl.writev0(java.io.FileDescriptor, long, int)
       234.587  99.44 | | | | +-Java_sun_nio_ch_FileDispatcherImpl_writev0
       233.587  99.57 | | | | +-writev
         0.370   0.16 | | | | | +-__libc_enable_asynccancel
         0.360   0.15 | | | | | +-__libc_disable_asynccancel
         0.470   0.20 | | | | +-jni_fast_GetIntField
         0.020   0.01 | | | | +-_init
       208.232  37.49 | | | +-sun.nio.ch.Util.getTemporaryDirectBuffer(int)
       203.651  97.80 | | | | +-java.nio.ByteBuffer.allocateDirect(int)
       202.681  99.52 | | | | | +-java.nio.DirectByteBuffer.<init>(int)
        87.287  43.07 | | | | | +-java.nio.Bits.reserveMemory(long, int)
        64.053  31.60 | | | | | +-sun.misc.Unsafe.setMemory(long, long, byte)
        63.993  99.91 | | | | | | +-sun.misc.Unsafe.setMemory(java.lang.Object, long, long, byte)



      We notice that sun.nio.ch.Util.getTemporaryDirectBuffer relies on a ThreadLocal cache of DirectByteBuffers, and the profile suggests this cache is too small. When we modify the class to use a much larger cache, we gain a 1.2x throughput improvement (from 40805.93 to 48926.69 messages per second), primarily because the lock gets deflated. The CPU reduction should have a significant impact on other cases as well. The profile below shows the savings:
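
      To make the proposal concrete, here is a minimal, hypothetical sketch of a per-thread cache of temporary direct buffers along the lines of sun.nio.ch.Util's ThreadLocal cache. The names, the array-based layout, and the IOV_MAX value of 1024 are assumptions for illustration, not the JDK implementation; the point is simply to let the cache hold up to IOV_MAX buffers rather than a small fixed number, so a gathering write over many heap buffers can reuse cached direct buffers instead of allocating new ones.

      import java.nio.ByteBuffer;

      // Hypothetical per-thread cache of temporary direct buffers, for illustration only.
      class TempBufferCacheSketch {
          // IOV_MAX is platform dependent; 1024 is a common Linux value, assumed here.
          static final int IOV_MAX = 1024;

          private static final ThreadLocal<ByteBuffer[]> CACHE =
                  ThreadLocal.withInitial(() -> new ByteBuffer[IOV_MAX]);

          static ByteBuffer get(int size) {
              ByteBuffer[] slots = CACHE.get();
              for (int i = 0; i < slots.length; i++) {
                  ByteBuffer buf = slots[i];
                  if (buf != null && buf.capacity() >= size) {
                      slots[i] = null;                         // take it out of the cache
                      buf.clear();
                      buf.limit(size);
                      return buf;                              // reuse, no allocation
                  }
              }
              return ByteBuffer.allocateDirect(size);          // cache miss: the slow path
          }

          static void release(ByteBuffer buf) {
              ByteBuffer[] slots = CACHE.get();
              for (int i = 0; i < slots.length; i++) {
                  if (slots[i] == null) {                      // hand it back for reuse
                      slots[i] = buf;
                      return;
                  }
              }
              // all IOV_MAX slots occupied: drop the buffer and let GC reclaim it
          }
      }

      Because the cache is per-thread, there is no locking on the fast path, and growth is bounded by IOV_MAX since a single writev never needs more temporary buffers than that.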

      java.nio.DirectByteBuffer.<init>(int)
              Exclusive User CPU Time: 0.100 ( 0.0%)
              Inclusive User CPU Time: 4.761 ( 0.1%)
                                 Size: 146
                           PC Address: 2:0x000015aa
                          Source File: /tmp/analyzer.DirectByteBuffer.java.16755
                          Object File: java.nio.DirectByteBuffer
                          Load Object: /home/mwm_jms/JMSB/log/213/server.er/jclasses
                         Mangled Name: java.nio.DirectByteBuffer.<init>
                              Aliases:
      weblogic.socket.NIOSocketMuxer$NIOOutputStream.write(java.nio.ByteBuffer[], int, int)
              Exclusive User CPU Time: 1.360 ( 0.0%)
              Inclusive User CPU Time: 434.637 ( 11.5%)
                                 Size: 88
                           PC Address: 2:0x000012e9
                          Source File: weblogic/socket/NIOSocketMuxer.java
                          Object File: weblogic.socket.NIOSocketMuxer$NIOOutputStream
                          Load Object: /home/mwm_jms/JMSB/log/213/server.er/jclasses
                         Mangled Name: weblogic.socket.NIOSocketMuxer$NIOOutputStream.write
                              Aliases:

      As you can see, the DirectByteBuffer allocation cost dropped to almost nothing, and the JMS-specific code path dropped by about 8 percentage points of CPU time (from 19.3% to 11.5%), while its throughput increased 1.2x.
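
      For completeness, here is an illustrative caller-side pattern (not WebLogic's actual code; the endpoint, buffer count, and sizes are made up) showing why the cache limit matters: a gathering write over an array of heap ByteBuffers needs one temporary direct buffer per non-direct buffer, and a single writev can involve up to IOV_MAX of them, so a per-thread cache capped at a handful of entries keeps missing under this kind of load.

      import java.net.InetSocketAddress;
      import java.nio.ByteBuffer;
      import java.nio.channels.SocketChannel;

      // Illustrative gathering write with heap buffers (endpoint and sizes are examples).
      public class GatheringWriteExample {
          public static void main(String[] args) throws Exception {
              try (SocketChannel ch = SocketChannel.open(new InetSocketAddress("localhost", 9999))) {
                  ByteBuffer[] messages = new ByteBuffer[64];
                  for (int i = 0; i < messages.length; i++) {
                      messages[i] = ByteBuffer.allocate(16 * 1024);    // 16K heap buffers, as in the benchmark
                  }
                  // Gathering write: each non-direct buffer is copied into a temporary
                  // direct buffer inside sun.nio.ch.IOUtil before the native writev.
                  long written = ch.write(messages, 0, messages.length);
                  System.out.println("wrote " + written + " bytes");
              }
          }
      }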

      Assignee: Sean Coffey (coffeys)
      Reporter: Oleksandr Otenko (ootenko) (Inactive)
      Votes: 0
      Watchers: 3