JDK-7176485

(bf) Allow temporary buffer cache to grow to IOV_MAX


    • Resolved In Build: b45
    • CPU: x86
    • OS: linux
    • Verification: Not verified

        A JMS non-persistent micro-benchmark with a 16K message size and 8 queues runs at a rate of 40K messages per second. The response time is poor because a particular lock inflates. The cause of the lock inflation is inefficient use of DirectByteBuffers by sun.nio.ch.IOUtil (illustrated just below).
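
        The pattern behind that inefficiency, in rough outline: for a gathering write over heap buffers, the JDK copies each buffer into a temporary direct buffer before issuing the native writev(), and a miss in the small per-thread buffer cache falls through to ByteBuffer.allocateDirect, i.e. DirectByteBuffer.<init> and Bits.reserveMemory. The sketch below only illustrates that shape; the class name, helper names, and the cache bound are assumptions, not the actual sun.nio.ch.IOUtil/Util source.

        import java.nio.ByteBuffer;
        import java.util.ArrayDeque;

        // Rough sketch of the substitution step in a gathering write; names and
        // sizes here are illustrative, not the real JDK internals.
        public class GatheringWriteSketch {

            // Tiny per-thread cache standing in for the temporary-buffer cache
            // behind sun.nio.ch.Util.getTemporaryDirectBuffer(int).
            private static final int CACHE_BOUND = 3;   // small, fixed bound (assumed)
            private static final ThreadLocal<ArrayDeque<ByteBuffer>> CACHE =
                    ThreadLocal.withInitial(ArrayDeque::new);

            static ByteBuffer getTemporaryDirectBuffer(int size) {
                for (ByteBuffer b : CACHE.get()) {
                    if (b.capacity() >= size) {
                        CACHE.get().remove(b);
                        b.clear().limit(size);
                        return b;
                    }
                }
                // Cache miss: the expensive path seen in the profile.
                return ByteBuffer.allocateDirect(size);
            }

            static void releaseTemporaryDirectBuffer(ByteBuffer b) {
                // A bounded cache drops buffers it cannot keep; a dropped buffer is
                // re-allocated on the next write, which is the waste described above.
                if (CACHE.get().size() < CACHE_BOUND) {
                    CACHE.get().offerFirst(b);
                }
            }

            // Simplified shape of the gathering-write path: every non-direct source
            // buffer is replaced by a temporary direct copy before the native writev().
            static void write(ByteBuffer[] srcs, int offset, int length) {
                ByteBuffer[] shadow = new ByteBuffer[length];
                for (int i = 0; i < length; i++) {
                    ByteBuffer src = srcs[offset + i];
                    if (src.isDirect()) {
                        shadow[i] = src;
                    } else {
                        ByteBuffer tmp = getTemporaryDirectBuffer(src.remaining());
                        tmp.put(src.duplicate());   // duplicate() keeps the caller's position
                        tmp.flip();
                        shadow[i] = tmp;
                    }
                }
                // ... native writev(shadow, 0, length) would be issued here ...
                for (int i = 0; i < length; i++) {
                    if (shadow[i] != srcs[offset + i]) {
                        releaseTemporaryDirectBuffer(shadow[i]);
                    }
                }
            }
        }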

        java version "1.7.0-ea"
        Java(TM) SE Runtime Environment (build 1.7.0-ea-b141)
        Java HotSpot(TM) 64-Bit Server VM (build 21.0-b11, mixed mode)

        Sun Analyzer shows the following:

        java.nio.DirectByteBuffer.<init>(int)
                Exclusive User CPU Time: 3.351 ( 0.1%)
                Inclusive User CPU Time: 286.137 ( 7.5%)
                                   Size: 146
                             PC Address: 2:0x000015aa
                            Source File: /tmp/analyzer.DirectByteBuffer.java.16752
                            Object File: java.nio.DirectByteBuffer
                            Load Object: /home/mwm_jms/JMSB/log/214/server.er/jclasses
                           Mangled Name: java.nio.DirectByteBuffer.<init>
                                Aliases:

        (As a result, the JMS-specific path through the following method accounts for 19% of CPU time.)
        weblogic.socket.NIOSocketMuxer$NIOOutputStream.write(java.nio.ByteBuffer[], int, int)
                Exclusive User CPU Time: 1.070 ( 0.0%)
                Inclusive User CPU Time: 737.277 ( 19.3%)
                                   Size: 88
                             PC Address: 2:0x000012e9
                            Source File: weblogic/socket/NIOSocketMuxer.java
                            Object File: weblogic.socket.NIOSocketMuxer$NIOOutputStream
                            Load Object: /home/mwm_jms/JMSB/log/214/server.er/jclasses
                           Mangled Name: weblogic.socket.NIOSocketMuxer$NIOOutputStream.write
                                Aliases:

        The call tree looks like this:
         567.193 100.00 | +-weblogic.socket.NIOSocketMuxer$NIOOutputStream.write(java.nio.ByteBuffer[], int, int)
         566.003 99.79 | +-sun.nio.ch.SocketChannelImpl.write(java.nio.ByteBuffer[], int, int)
         555.391 98.13 | | +-sun.nio.ch.IOUtil.write(java.io.FileDescriptor, java.nio.ByteBuffer[], int, int, sun.nio.ch.NativeDispatcher)
         235.957 42.48 | | | +-sun.nio.ch.SocketDispatcher.writev(java.io.FileDescriptor, long, int)
         235.917 99.98 | | | | +-sun.nio.ch.FileDispatcherImpl.writev0(java.io.FileDescriptor, long, int)
         234.587 99.44 | | | | +-Java_sun_nio_ch_FileDispatcherImpl_writev0
         233.587 99.57 | | | | +-writev
           0.370 0.16 | | | | | +-__libc_enable_asynccancel
           0.360 0.15 | | | | | +-__libc_disable_asynccancel
           0.470 0.20 | | | | +-jni_fast_GetIntField
           0.020 0.01 | | | | +-_init
         208.232 37.49 | | | +-sun.nio.ch.Util.getTemporaryDirectBuffer(int)
         203.651 97.80 | | | | +-java.nio.ByteBuffer.allocateDirect(int)
         202.681 99.52 | | | | | +-java.nio.DirectByteBuffer.<init>(int)
          87.287 43.07 | | | | | +-java.nio.Bits.reserveMemory(long, int)
          64.053 31.60 | | | | | +-sun.misc.Unsafe.setMemory(long, long, byte)
          63.993 99.91 | | | | | | +-sun.misc.Unsafe.setMemory(java.lang.Object, long, long, byte)



        We notice that sun.nio.ch.Util.getTemporaryDirectBuffer relies on a ThreadLocal cache of DirectByteBuffers. The profile suggests this cache is too small. When we modify the class to use a much larger cache (see the sketch after the profile data below), throughput improves by 1.2x (from 40805.93 to 48926.69 messages per second), primarily because the lock is deflated. The CPU reduction should also have a significant impact on other workloads. The profile below shows the savings:

        java.nio.DirectByteBuffer.<init>(int)
                Exclusive User CPU Time: 0.100 ( 0.0%)
                Inclusive User CPU Time: 4.761 ( 0.1%)
                                   Size: 146
                             PC Address: 2:0x000015aa
                            Source File: /tmp/analyzer.DirectByteBuffer.java.16755
                            Object File: java.nio.DirectByteBuffer
                            Load Object: /home/mwm_jms/JMSB/log/213/server.er/jclasses
                           Mangled Name: java.nio.DirectByteBuffer.<init>
                                Aliases:
        weblogic.socket.NIOSocketMuxer$NIOOutputStream.write(java.nio.ByteBuffer[], int, int)
                Exclusive User CPU Time: 1.360 ( 0.0%)
                Inclusive User CPU Time: 434.637 ( 11.5%)
                                   Size: 88
                             PC Address: 2:0x000012e9
                            Source File: weblogic/socket/NIOSocketMuxer.java
                            Object File: weblogic.socket.NIOSocketMuxer$NIOOutputStream
                            Load Object: /home/mwm_jms/JMSB/log/213/server.er/jclasses
                           Mangled Name: weblogic.socket.NIOSocketMuxer$NIOOutputStream.write
                                Aliases:

        As you can see, the DirectByteBuffer allocation cost drops to almost nothing, and the JMS-specific code path drops from 19.3% to 11.5% of CPU time (while its throughput increases 1.2x).
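
        The change suggested by this issue's title is, roughly, to let that per-thread temporary-buffer cache grow to IOV_MAX entries (the largest number of iovec entries a single writev() accepts) instead of a small fixed bound, so a large gathering write can be served entirely from cached buffers. Below is a minimal sketch of the idea, with assumed names and a hard-coded IOV_MAX; it is not the actual sun.nio.ch.Util code.

        import java.nio.ByteBuffer;
        import java.util.ArrayDeque;

        // Sketch only: a per-thread cache whose capacity matches the largest
        // gathering write the OS accepts, so a writev() over many heap buffers
        // can reuse cached direct buffers instead of allocating new ones.
        public class TemporaryBufferCacheSketch {

            // IOV_MAX is typically 1024 on Linux; hard-coded here for illustration.
            private static final int IOV_MAX = 1024;

            private static final ThreadLocal<ArrayDeque<ByteBuffer>> CACHE =
                    ThreadLocal.withInitial(ArrayDeque::new);

            static ByteBuffer get(int size) {
                ArrayDeque<ByteBuffer> cache = CACHE.get();
                for (ByteBuffer b : cache) {
                    if (b.capacity() >= size) {
                        cache.remove(b);
                        b.clear().limit(size);
                        return b;
                    }
                }
                return ByteBuffer.allocateDirect(size);   // only on a genuine miss
            }

            static void release(ByteBuffer b) {
                ArrayDeque<ByteBuffer> cache = CACHE.get();
                // With the bound raised to IOV_MAX, the buffers used by one large
                // gathering write are all kept, so the next write of the same shape
                // is served from cache and never reaches DirectByteBuffer.<init>.
                if (cache.size() < IOV_MAX) {
                    cache.offerFirst(b);
                }
            }
        }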

              Assignee: Alan Bateman (alanb)
              Reporter: Oleksandr Otenko (ootenko)
