Bug
Resolution: Won't Fix
P3
None
7
The following simple code demonstrates that on the SPARC platform, with the
default 32-bit JVM, compiled code that uses the long data type is not optimized.
The filer uses it to compare SPARC performance against Intel:
class PerfTest1 {
    public static void main(String[] args) {
        long t1 = System.currentTimeMillis();
        // Empty loop with a long counter; the body does no work.
        for (long i = 0; i < 100000000000L; i++);
        // Elapsed time in milliseconds.
        System.out.println(System.currentTimeMillis() - t1);
    }
}
Test on T4-1:
598044 ms (about 10 min)
Test on T5-2:
java PerfTest1
473015 ms (7 min 53 s)
Test on S11, Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz:
java PerfTest1
96498 ms (1 min 36 s)
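To make the comparison easier to read, the reported millisecond values can be converted and compared with a small helper. This is only a sketch: the class and method names (ElapsedFormat, minSec, slowdown) are hypothetical and not part of the original test.

```java
import java.util.Locale;

// Hypothetical helper, not part of the original report.
class ElapsedFormat {
    // Converts elapsed milliseconds to "M min S s", dropping the sub-second remainder.
    static String minSec(long millis) {
        long totalSeconds = millis / 1000;
        return (totalSeconds / 60) + " min " + (totalSeconds % 60) + " s";
    }

    // Ratio of two elapsed times, for example SPARC time over Intel time.
    static double slowdown(long slowerMillis, long fasterMillis) {
        return (double) slowerMillis / fasterMillis;
    }

    public static void main(String[] args) {
        // Elapsed times in milliseconds, copied from the results above.
        long t4 = 598044L;
        long t5 = 473015L;
        long i5 = 96498L;
        System.out.println("T4-1: " + minSec(t4));
        System.out.println("T5-2: " + minSec(t5));
        System.out.println("i5:   " + minSec(i5));
        System.out.println(String.format(Locale.ROOT, "T4-1 / i5: %.1fx", slowdown(t4, i5)));
        System.out.println(String.format(Locale.ROOT, "T5-2 / i5: %.1fx", slowdown(t5, i5)));
    }
}
```

On these numbers the T4-1 run is roughly six times slower than the Intel run, and the T5-2 run roughly five times slower.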
The cause is read-after-write (RAW) hazards in the generated code:
3.733 [4] 5c: ldx [%sp + 104], %g1 // Load from memory
3.562 [4] 60: inc %g1
0.821 [4] 64: stx %g1, [%sp + 104] // Store back to memory
3.462 [4] 68: ld [%l2], %g0
1.051 [4] 6c: ldx [%l0 - 116], %g1
3.733 [4] 70: ldx [%sp + 104], %g3 // Load from memory
38.947 [4] 74: cxbl %g3, %g1, 0x5c
If the loop does any work, the RAW hazard is likely to be amortised.
But coming from a compiler perspective, this is pretty poor code.
Obviously the loop could be eliminated, since it doesn't do any real work. But
even if the loop cannot be removed, the loop counter could be held in a
register, which would avoid all the stack activity.
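For comparison, a variant of the test whose loop does some work might look like the sketch below. The class name LoopWithWork, the method sumUpTo, and the iteration count are assumptions made for illustration; they are not from the original report. Because the sum is returned and printed, the loop result is observable, so the loop is less likely to be treated as dead code, and the per-iteration add gives the counter update something to be amortised against.

```java
// Hypothetical variant of PerfTest1, not part of the original report.
class LoopWithWork {
    // Sums the values 0 .. limit-1 using a long loop counter.
    static long sumUpTo(long limit) {
        long sum = 0;
        for (long i = 0; i < limit; i++) {
            sum += i; // real work per iteration
        }
        return sum;
    }

    public static void main(String[] args) {
        long limit = 1000000000L; // assumed iteration count, smaller than the original test
        long t1 = System.currentTimeMillis();
        long sum = sumUpTo(limit);
        long elapsed = System.currentTimeMillis() - t1;
        // Printing the sum keeps the loop result live.
        System.out.println("sum=" + sum + " elapsed=" + elapsed + " ms");
    }
}
```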
We tested it with the 64-bit JVM and it is much better. Using the int data type
instead of long improves the time almost 80x (604 ms for long vs. 8 ms for
int).
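A side-by-side harness for the int versus long comparison could be sketched as follows. The class name LoopCounterDemo, its method names, and the iteration count are assumptions; the report does not say which count was used for the 604 ms and 8 ms measurements. Each method returns its final counter value so that the result is used after the loop.

```java
// Hypothetical harness, not part of the original report.
class LoopCounterDemo {
    // Counts up to limit with a long counter and returns the final count.
    static long countWithLong(long limit) {
        long i = 0;
        for (; i < limit; i++) { }
        return i;
    }

    // Counts up to limit with an int counter and returns the final count.
    static int countWithInt(int limit) {
        int i = 0;
        for (; i < limit; i++) { }
        return i;
    }

    public static void main(String[] args) {
        int limit = 1000000000; // assumed iteration count; must fit in an int

        long start = System.currentTimeMillis();
        long longResult = countWithLong(limit);
        long longMs = System.currentTimeMillis() - start;

        start = System.currentTimeMillis();
        int intResult = countWithInt(limit);
        long intMs = System.currentTimeMillis() - start;

        System.out.println("long counter: " + longResult + " iterations, " + longMs + " ms");
        System.out.println("int counter:  " + intResult + " iterations, " + intMs + " ms");
    }
}
```

Running it with both the 32-bit and the 64-bit JVM on the same machine would show whether the gap is specific to long counters on the 32-bit JVM.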