Type: Bug
Resolution: Fixed
Priority: P3
Affects Version/s: 6u10, 7u4
Resolved In Build: b45
CPU: x86
OS: linux
Verification: Not verified
Issue | Fix Version | Assignee | Priority | Status | Resolution | Resolved In Build |
---|---|---|---|---|---|---|
JDK-8018883 | 7u45 | Alan Bateman | P3 | Closed | Fixed | b01 |
JDK-2226718 | 7u40 | Sean Coffey | P3 | Closed | Fixed | b08 |
JDK-8000303 | 7u10 | Sean Coffey | P3 | Closed | Fixed | b12 |
A JMS non-persistent micro-benchmark with a 16K message size and 8 queues runs at a rate of 40K messages per second. The response time (RT) is poor because of a particular lock inflating. The cause of the lock inflation is the inefficient use of DirectByteBuffers by sun.nio.ch.IOUtil.
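For context, the hot path is a gathering write of heap (non-direct) ByteBuffers over a SocketChannel. The sketch below is illustrative only (the endpoint, buffer count, and sizes are assumptions, not the benchmark's actual code); it simply exercises the path where sun.nio.ch.IOUtil copies each heap buffer into a temporary DirectByteBuffer before the native writev call.

```java
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

public class GatheringWriteSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint; any listening TCP server will do.
        try (SocketChannel ch = SocketChannel.open(new InetSocketAddress("localhost", 9000))) {
            // Heap buffers: for each one, IOUtil.write obtains a temporary direct
            // buffer (sun.nio.ch.Util.getTemporaryDirectBuffer) and copies into it.
            ByteBuffer[] srcs = new ByteBuffer[8];
            for (int i = 0; i < srcs.length; i++) {
                srcs[i] = ByteBuffer.allocate(16 * 1024);   // 16K payload, as in the benchmark
                srcs[i].put(new byte[16 * 1024]).flip();
            }
            // Gathering write: when the per-thread buffer cache cannot satisfy the
            // request, the call falls through to ByteBuffer.allocateDirect, which
            // is the cost visible in the profile below.
            ch.write(srcs, 0, srcs.length);
        }
    }
}
```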
java version "1.7.0-ea"
Java(TM) SE Runtime Environment (build 1.7.0-ea-b141)
Java HotSpot(TM) 64-Bit Server VM (build 21.0-b11, mixed mode)
Sun Analyzer shows the following:
java.nio.DirectByteBuffer.<init>(int)
Exclusive User CPU Time: 3.351 ( 0.1%)
Inclusive User CPU Time: 286.137 ( 7.5%)
Size: 146
PC Address: 2:0x000015aa
Source File: /tmp/analyzer.DirectByteBuffer.java.16752
Object File: java.nio.DirectByteBuffer
Load Object: /home/mwm_jms/JMSB/log/214/server.er/jclasses
Mangled Name: java.nio.DirectByteBuffer.<init>
Aliases:
(As a result, the JMS-specific path around the following method takes 19% of CPU time.)
weblogic.socket.NIOSocketMuxer$NIOOutputStream.write(java.nio.ByteBuffer[], int, int)
Exclusive User CPU Time: 1.070 ( 0.0%)
Inclusive User CPU Time: 737.277 ( 19.3%)
Size: 88
PC Address: 2:0x000012e9
Source File: weblogic/socket/NIOSocketMuxer.java
Object File: weblogic.socket.NIOSocketMuxer$NIOOutputStream
Load Object: /home/mwm_jms/JMSB/log/214/server.er/jclasses
Mangled Name: weblogic.socket.NIOSocketMuxer$NIOOutputStream.write
Aliases:
The call tree looks like this:
567.193 100.00 | +-weblogic.socket.NIOSocketMuxer$NIOOutputStream.write(java.nio.ByteBuffer[], int, int)
566.003 99.79 | +-sun.nio.ch.SocketChannelImpl.write(java.nio.ByteBuffer[], int, int)
555.391 98.13 | | +-sun.nio.ch.IOUtil.write(java.io.FileDescriptor, java.nio.ByteBuffer[], int, int, sun.nio.ch.NativeDispatcher)
235.957 42.48 | | | +-sun.nio.ch.SocketDispatcher.writev(java.io.FileDescriptor, long, int)
235.917 99.98 | | | | +-sun.nio.ch.FileDispatcherImpl.writev0(java.io.FileDescriptor, long, int)
234.587 99.44 | | | | +-Java_sun_nio_ch_FileDispatcherImpl_writev0
233.587 99.57 | | | | +-writev
0.370 0.16 | | | | | +-__libc_enable_asynccancel
0.360 0.15 | | | | | +-__libc_disable_asynccancel
0.470 0.20 | | | | +-jni_fast_GetIntField
0.020 0.01 | | | | +-_init
208.232 37.49 | | | +-sun.nio.ch.Util.getTemporaryDirectBuffer(int)
203.651 97.80 | | | | +-java.nio.ByteBuffer.allocateDirect(int)
202.681 99.52 | | | | | +-java.nio.DirectByteBuffer.<init>(int)
87.287 43.07 | | | | | +-java.nio.Bits.reserveMemory(long, int)
64.053 31.60 | | | | | +-sun.misc.Unsafe.setMemory(long, long, byte)
63.993 99.91 | | | | | | +-sun.misc.Unsafe.setMemory(java.lang.Object, long, long, byte)
We notice that sun.nio.ch.Util.getTemporaryDirectBuffer relies on a thread-local cache of DirectByteBuffers. The profile suggests this cache is too small. When we modify this class to use a much larger cache, we gain a 1.2x throughput improvement (from 40805.93 to 48926.69 messages per second), primarily because the lock is no longer inflated. The CPU reduction should also have a significant impact in other cases. The profile below shows the savings:
java.nio.DirectByteBuffer.<init>(int)
Exclusive User CPU Time: 0.100 ( 0.0%)
Inclusive User CPU Time: 4.761 ( 0.1%)
Size: 146
PC Address: 2:0x000015aa
Source File: /tmp/analyzer.DirectByteBuffer.java.16755
Object File: java.nio.DirectByteBuffer
Load Object: /home/mwm_jms/JMSB/log/213/server.er/jclasses
Mangled Name: java.nio.DirectByteBuffer.<init>
Aliases:
weblogic.socket.NIOSocketMuxer$NIOOutputStream.write(java.nio.ByteBuffer[], int, int)
Exclusive User CPU Time: 1.360 ( 0.0%)
Inclusive User CPU Time: 434.637 ( 11.5%)
Size: 88
PC Address: 2:0x000012e9
Source File: weblogic/socket/NIOSocketMuxer.java
Object File: weblogic.socket.NIOSocketMuxer$NIOOutputStream
Load Object: /home/mwm_jms/JMSB/log/213/server.er/jclasses
Mangled Name: weblogic.socket.NIOSocketMuxer$NIOOutputStream.write
Aliases:
As you can see, the DirectByteBuffer allocation cost dropped to almost nothing, and the JMS-specific code path shed about 8% of CPU time (from 19.3% to 11.5%), while the throughput of this code path increased 1.2x.
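For illustration, here is a minimal sketch of the kind of per-thread direct-buffer cache being described. It is not the actual sun.nio.ch.Util implementation; the class name, the simple linear scan, and the MAX_CACHED_BUFFERS bound (standing in for IOV_MAX, as in the issue title) are all assumptions.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Iterator;

// Illustrative per-thread cache of temporary direct buffers; NOT the JDK's
// sun.nio.ch.Util code. The bound below stands in for IOV_MAX.
final class TempDirectBufferCache {

    private static final int MAX_CACHED_BUFFERS = 1024;   // assumed bound

    private static final ThreadLocal<ArrayDeque<ByteBuffer>> CACHE =
        new ThreadLocal<ArrayDeque<ByteBuffer>>() {
            @Override protected ArrayDeque<ByteBuffer> initialValue() {
                return new ArrayDeque<ByteBuffer>();
            }
        };

    // Reuse a cached direct buffer with enough capacity, or allocate a new one.
    static ByteBuffer getTemporaryDirectBuffer(int size) {
        ArrayDeque<ByteBuffer> cache = CACHE.get();
        for (Iterator<ByteBuffer> it = cache.iterator(); it.hasNext(); ) {
            ByteBuffer buf = it.next();
            if (buf.capacity() >= size) {
                it.remove();
                buf.clear();
                buf.limit(size);
                return buf;
            }
        }
        // Cache miss: this is the expensive path the profile attributes to
        // java.nio.DirectByteBuffer.<init> and java.nio.Bits.reserveMemory.
        return ByteBuffer.allocateDirect(size);
    }

    // Return the buffer to the calling thread's cache. A larger cache means
    // fewer allocateDirect calls, which is the effect measured above.
    static void releaseTemporaryDirectBuffer(ByteBuffer buf) {
        ArrayDeque<ByteBuffer> cache = CACHE.get();
        if (buf != null && buf.isDirect() && cache.size() < MAX_CACHED_BUFFERS) {
            cache.addFirst(buf);
        }
    }
}
```

The issue title and the duplicate JDK-6957106 suggest the relevant bound is IOV_MAX: a gathering write can need one temporary buffer per element of the ByteBuffer[] array, so a per-thread cache smaller than that forces fresh allocations on this path.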
- backported by
  - JDK-2226718 (bf) Allow temporary buffer cache to grow to IOV_MAX (Closed)
  - JDK-8000303 (bf) Allow temporary buffer cache to grow to IOV_MAX (Closed)
  - JDK-8018883 (bf) Allow temporary buffer cache to grow to IOV_MAX (Closed)
- duplicates
  - JDK-6957106 (dc) DatagramChannel.write(ByteBuffer[]) limits array size to 16 rather than IOV_MAX (Closed)