- Bug
- Resolution: Not an Issue
- P2
- None
- 6-pool
- x86
- solaris
The SciMark Fast Fourier Transform (FFT) metric running on Solaris with JVM 1.6.0
performs about 30% worse than on Red Hat Enterprise Linux 5 with JVM 1.6.0.
Benchmark   snv_74@x3220  RHEL5@x3220  %DIFF    snv_74@x5355  RHEL5@x5355  %DIFF
MonteCarlo  259.89        246.16       5.58%    285.9         273.5        4.53%
FFT         51.44         72.63        -29.18%  32.6          48.62        -32.95%
LU          806.86        520.83       54.92%   871.08        529.52       64.50%
SOR         904.92        882.01       2.60%    988.03        956.29       3.32%
Sparse      510.32        500.73       1.92%    466.45        464.61       0.40%
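For reference, the %DIFF column appears to be the Solaris score relative to the RHEL score, minus one, as a percentage. The sketch below recomputes it for the x3220 columns. The function name `pct_diff` and the dictionary layout are illustrative only, and the formula is inferred from the numbers in the table rather than stated by the benchmark.

```python
def pct_diff(solaris_score, rhel_score):
    """Percent difference of the Solaris score relative to the RHEL score.

    Assumed formula (inferred from the table): (solaris / rhel - 1) * 100.
    A negative value means Solaris scored lower than RHEL.
    """
    return (solaris_score / rhel_score - 1.0) * 100.0


# (snv_74, RHEL5) score pairs copied from the x3220 columns of the table.
x3220 = {
    "MonteCarlo": (259.89, 246.16),
    "FFT": (51.44, 72.63),
    "LU": (806.86, 520.83),
    "SOR": (904.92, 882.01),
    "Sparse": (510.32, 500.73),
}

for name, (snv, rhel) in x3220.items():
    print(f"{name:<11}{pct_diff(snv, rhel):8.2f}%")
```

Under this reading, FFT is the only benchmark where Solaris falls well behind, while LU is markedly ahead on Solaris.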
These runs were performed on both single-socket (x3220) and dual-socket (x5355) Xeons.
These results have also been reproduced on s10u4.
Studio Analyzer experiment source, bytecode, and disassembly outputs for SciMark FFT are attached for both RHEL5-U1 and snv_89.
Both were run on 64-bit JVMs.
The Analyzer output shows the same hot spots in both, with two particular instructions taking significantly longer on snv.
Bytecode CPU time diffs:
diff -cw bytecode.snv_89.out bytecode.RHEL5-U1.out
Source file: /export/bench/java/scimark2a/jnt/scimark2/FFT.java
Object file: /export/bench/java/scimark2a/jnt/scimark2/FFT.class
Load Object: /export/bench/java/scimark2a/jnt/scimark2/FFT.class
Excl. Incl.
User CPU User CPU
sec. sec.
<snip>
*** 666,672 ****
0. 0. [135] 00000116: dload 8
0. 0. [135] 00000118: dload 29
0. 0. [135] 0000011a: dmul
! ## 3.232 3.232 [135] 0000011b: dsub
0. 0. [135] 0000011c: dstore 31
0. 0. [136] 0000011e: dload 6
0. 0. [136] 00000120: dload 29
--- 666,672 ----
0. 0. [135] 00000116: dload 8
0. 0. [135] 00000118: dload 29
0. 0. [135] 0000011a: dmul
! ## 1.518 1.518 [135] 0000011b: dsub
0. 0. [135] 0000011c: dstore 31
0. 0. [136] 0000011e: dload 6
0. 0. [136] 00000120: dload 29
<snip>
*** 692,698 ****
0. 0. [139] 0000013c: iload 25
0. 0. [139] 0000013e: iconst_1
0. 0. [139] 0000013f: iadd
! ## 2.662 2.662 [139] 00000140: daload
0. 0. [139] 00000141: dload 33
0. 0. [139] 00000143: dsub
0. 0. [139] 00000144: dastore
--- 692,698 ----
0. 0. [139] 0000013c: iload 25
0. 0. [139] 0000013e: iconst_1
0. 0. [139] 0000013f: iadd
! ## 1.947 1.947 [139] 00000140: daload
0. 0. [139] 00000141: dload 33
0. 0. [139] 00000143: dsub
0. 0. [139] 00000144: dastore
***************
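To put the two flagged bytecodes in proportion, the sketch below divides the exclusive User CPU time from the first file of the diff (bytecode.snv_89.out, the `***` hunks) by the time from the second file (bytecode.RHEL5-U1.out, the `---` hunks). The helper name `slowdown` is illustrative, and the comparison assumes both experiments covered a comparable amount of benchmark work, which the report does not state.

```python
def slowdown(snv_seconds, rhel_seconds):
    """Ratio of exclusive CPU time on snv to exclusive CPU time on RHEL."""
    return snv_seconds / rhel_seconds


# Exclusive User CPU seconds for the two '##' lines in the bytecode diff:
# (value from bytecode.snv_89.out, value from bytecode.RHEL5-U1.out)
hot_bytecodes = {
    "dsub   @ 0000011b": (3.232, 1.518),
    "daload @ 00000140": (2.662, 1.947),
}

for label, (snv, rhel) in hot_bytecodes.items():
    print(f"{label}: {slowdown(snv, rhel):.2f}x")
```

On these numbers the `dsub` accounts for roughly twice as much time on snv, and the `daload` for somewhat more than a third extra.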
Disassembly CPU time diffs:
bash-3.2$ more dissassem.RHEL5-U1.out
Source file: /export/bench/java/scimark2a/jnt/scimark2/FFT.java
Object file: jnt.scimark2.FFT
Load Object: JAVA_COMPILED_METHODS
Excl. Incl.
User CPU User CPU
sec. sec.
*** 180,207 ****
0. 0. [ ?] 2ea: testb %al,(%rax)
0. 0. [ ?] 2ec: addb %al,(%rax)
0. 0. [ ?] 2ee: addb %al,(%rax)
! 0. 0. [ ?] 2f0: shll %r12d
0. 0. [ ?] 2f3: movl %r12d,%r8d
0. 0. [ ?] 2f6: addl %r14d,%r8d
! 0.010 0.010 [ ?] 2f9: cmpl 0x28(%rsp),%r8d
! 0.010 0.010 [ ?] 2fe: jae .+0x433 [ 0x731 ]
0. 0. [ ?] 304: movl %r8d,%edx
0. 0. [ ?] 307: incl %edx
0. 0. [ ?] 309: movslq %r8d,%rax
! 0. 0. [ ?] 30c: movsd 0x18(%rcx,%rax,8),%xmm4
! ## 1.511 1.511 [ ?] 312: cmpl 0x28(%rsp),%edx
0. 0. [ ?] 316: jae .+0x3b3 [ 0x6c9 ]
0. 0. [ ?] 31c: movsd 0x20(%rcx,%rax,8),%xmm5
! 0. 0. [ ?] 322: movapd %xmm4,%xmm11
0. 0. [ ?] 327: mulsd %xmm9,%xmm11
! 0. 0. [ ?] 32c: movapd %xmm5,%xmm12
0. 0. [ ?] 331: mulsd %xmm9,%xmm12
! 0. 0. [ ?] 336: movapd %xmm5,%xmm1
0. 0. [ ?] 33a: mulsd %xmm10,%xmm1
0. 0. [ ?] 33f: movapd %xmm4,%xmm2
0. 0. [ ?] 343: mulsd %xmm10,%xmm2
0. 0. [ ?] 348: subsd %xmm1,%xmm11
! 0.010 0.010 [ ?] 34d: addsd %xmm2,%xmm12
0. 0. [ ?] 352: cmpl 0x28(%rsp),%r12d
0. 0. [ ?] 357: jae .+0x2f6 [ 0x64d ]
0. 0. [ ?] 35d: movl %ebp,0xc(%rsp)
--- 180,207 ----
0. 0. [ ?] 2ea: testb %al,(%rax)
0. 0. [ ?] 2ec: addb %al,(%rax)
0. 0. [ ?] 2ee: addb %al,(%rax)
! 0.022 0.022 [ ?] 2f0: shll %r12d
0. 0. [ ?] 2f3: movl %r12d,%r8d
0. 0. [ ?] 2f6: addl %r14d,%r8d
! 0. 0. [ ?] 2f9: cmpl 0x28(%rsp),%r8d
! 0.011 0.011 [ ?] 2fe: jae .+0x433 [ 0x731 ]
0. 0. [ ?] 304: movl %r8d,%edx
0. 0. [ ?] 307: incl %edx
0. 0. [ ?] 309: movslq %r8d,%rax
! 0.022 0.022 [ ?] 30c: movsd 0x18(%rcx,%rax,8),%xmm4
! ## 0.781 0.781 [ ?] 312: cmpl 0x28(%rsp),%edx
0. 0. [ ?] 316: jae .+0x3b3 [ 0x6c9 ]
0. 0. [ ?] 31c: movsd 0x20(%rcx,%rax,8),%xmm5
! 0.011 0.011 [ ?] 322: movapd %xmm4,%xmm11
0. 0. [ ?] 327: mulsd %xmm9,%xmm11
! 0.044 0.044 [ ?] 32c: movapd %xmm5,%xmm12
0. 0. [ ?] 331: mulsd %xmm9,%xmm12
! 0.055 0.055 [ ?] 336: movapd %xmm5,%xmm1
0. 0. [ ?] 33a: mulsd %xmm10,%xmm1
0. 0. [ ?] 33f: movapd %xmm4,%xmm2
0. 0. [ ?] 343: mulsd %xmm10,%xmm2
0. 0. [ ?] 348: subsd %xmm1,%xmm11
! 0. 0. [ ?] 34d: addsd %xmm2,%xmm12
0. 0. [ ?] 352: cmpl 0x28(%rsp),%r12d
0. 0. [ ?] 357: jae .+0x2f6 [ 0x64d ]
0. 0. [ ?] 35d: movl %ebp,0xc(%rsp)
***************
<snip>
*** 209,236 ****
0. 0. [ ?] 366: movq %rcx,%rbp
0. 0. [ ?] 369: movl %r12d,%ecx
0. 0. [ ?] 36c: incl %ecx
! 0.010 0.010 [ ?] 36e: movslq %r12d,%r9
0. 0. [ ?] 371: movsd 0x18(%rbp,%r9,8),%xmm1
! ## 1.431 1.431 [ ?] 378: subsd %xmm11,%xmm1
! 0. 0. [ ?] 37d: movsd %xmm1,0x18(%rbp,%rax,8)
! 0. 0. [ ?] 383: cmpl 0x28(%rsp),%ecx
! 0.020 0.020 [ ?] 387: jae .+0x214 [ 0x59b ]
0. 0. [ ?] 38d: movq %rbp,%rcx
0. 0. [ ?] 390: movsd 0x20(%rcx,%r9,8),%xmm1
0. 0. [ ?] 397: addl %r10d,%ebx
0. 0. [ ?] 39a: subsd %xmm12,%xmm1
! 0.010 0.010 [ ?] 39f: movsd %xmm1,0x20(%rcx,%rax,8)
! 0.010 0.010 [ ?] 3a5: movapd %xmm11,%xmm1
0. 0. [ ?] 3aa: addsd 0x18(%rcx,%r9,8),%xmm1
! 0.040 0.040 [ ?] 3b1: movsd %xmm1,0x18(%rcx,%r9,8)
! 0. 0. [ ?] 3b8: movapd %xmm12,%xmm1
0. 0. [ ?] 3bd: addsd 0x20(%rcx,%r9,8),%xmm1
! 0.010 0.010 [ ?] 3c4: movsd %xmm1,0x20(%rcx,%r9,8)
! 0. 0. [ ?] 3cb: testl %eax,0x350302f(%rip)
0. 0. [ ?] 3d1: cmpl 4(%rsp),%ebx
0. 0. [ ?] 3d5: jge .+0x1b4 [ 0x589 ]
0. 0. [ ?] 3db: movl %ebx,%r12d
! 0.010 0.010 [ ?] 3de: addl %r11d,%r12d
0. 0. [ ?] 3e1: movl 4(%rsp),%r9d
0. 0. [ ?] 3e6: movl 0xc(%rsp),%ebp
0. 0. [ ?] 3ea: jmp .-0xfa [ 0x2f0 ]
--- 209,236 ----
0. 0. [ ?] 366: movq %rcx,%rbp
0. 0. [ ?] 369: movl %r12d,%ecx
0. 0. [ ?] 36c: incl %ecx
! 0. 0. [ ?] 36e: movslq %r12d,%r9
0. 0. [ ?] 371: movsd 0x18(%rbp,%r9,8),%xmm1
! ## 0.924 0.924 [ ?] 378: subsd %xmm11,%xmm1
! 0.033 0.033 [ ?] 37d: movsd %xmm1,0x18(%rbp,%rax,8)
! 0.011 0.011 [ ?] 383: cmpl 0x28(%rsp),%ecx
! 0. 0. [ ?] 387: jae .+0x214 [ 0x59b ]
0. 0. [ ?] 38d: movq %rbp,%rcx
0. 0. [ ?] 390: movsd 0x20(%rcx,%r9,8),%xmm1
0. 0. [ ?] 397: addl %r10d,%ebx
0. 0. [ ?] 39a: subsd %xmm12,%xmm1
! 0.011 0.011 [ ?] 39f: movsd %xmm1,0x20(%rcx,%rax,8)
! 0. 0. [ ?] 3a5: movapd %xmm11,%xmm1
0. 0. [ ?] 3aa: addsd 0x18(%rcx,%r9,8),%xmm1
! 0.044 0.044 [ ?] 3b1: movsd %xmm1,0x18(%rcx,%r9,8)
! 0.011 0.011 [ ?] 3b8: movapd %xmm12,%xmm1
0. 0. [ ?] 3bd: addsd 0x20(%rcx,%r9,8),%xmm1
! 0.011 0.011 [ ?] 3c4: movsd %xmm1,0x20(%rcx,%r9,8)
! 0.044 0.044 [ ?] 3cb: testl %eax,0xffffffffff91ae6f(%rip)
0. 0. [ ?] 3d1: cmpl 4(%rsp),%ebx
0. 0. [ ?] 3d5: jge .+0x1b4 [ 0x589 ]
0. 0. [ ?] 3db: movl %ebx,%r12d
! 0. 0. [ ?] 3de: addl %r11d,%r12d
0. 0. [ ?] 3e1: movl 4(%rsp),%r9d
0. 0. [ ?] 3e6: movl 0xc(%rsp),%ebp
0. 0. [ ?] 3ea: jmp .-0xfa [ 0x2f0 ]
***************
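The flagged disassembly lines can be pulled out mechanically for comparison. The sketch below is a minimal parser for annotated lines in the layout shown in the hunks above. The regular expression and the function name `hot_instructions` are assumptions based only on that pasted layout, not on any documented Analyzer output format. The sample lines are copied from the first (`***`) and second (`---`) halves of the two disassembly hunks.

```python
import re

# One annotated line, e.g. "! ## 1.511 1.511 [ ?] 312: cmpl 0x28(%rsp),%edx".
# The leading "!" (diff change marker) and "##" (hot-line marker) are optional.
LINE_RE = re.compile(
    r"^(?:!\s+)?(?P<hot>##\s+)?"
    r"(?P<excl>\d+\.\d*)\s+(?P<incl>\d+\.\d*)\s+"
    r"\[\s*(?P<src>\?|\d+)\]\s+"
    r"(?P<addr>[0-9a-fA-F]+):\s+(?P<insn>.+)$"
)


def hot_instructions(block):
    """Return {address: (instruction, exclusive_seconds)} for '##' lines."""
    result = {}
    for raw in block.splitlines():
        m = LINE_RE.match(raw.strip())
        if m and m.group("hot"):
            result[m.group("addr")] = (m.group("insn"), float(m.group("excl")))
    return result


first_half = """\
! 0. 0. [ ?] 30c: movsd 0x18(%rcx,%rax,8),%xmm4
! ## 1.511 1.511 [ ?] 312: cmpl 0x28(%rsp),%edx
! ## 1.431 1.431 [ ?] 378: subsd %xmm11,%xmm1
"""
second_half = """\
! 0.022 0.022 [ ?] 30c: movsd 0x18(%rcx,%rax,8),%xmm4
! ## 0.781 0.781 [ ?] 312: cmpl 0x28(%rsp),%edx
! ## 0.924 0.924 [ ?] 378: subsd %xmm11,%xmm1
"""

first = hot_instructions(first_half)
second = hot_instructions(second_half)
for addr, (insn, t_first) in first.items():
    t_second = second[addr][1]
    print(f"{addr}: {insn}  {t_first:.3f}s vs {t_second:.3f}s "
          f"({t_first / t_second:.2f}x)")
```

Matching by instruction address works here because the compiled code is identical in both halves; only the attributed times differ.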
Disassembly for both was generated on snv.
Rerunning the Analyzer on RHEL for the RHEL experiment now to confirm that the results are the same.