-
Bug
-
Resolution: Fixed
-
P3
-
5.0, 5.0u1
-
b14
-
x86
-
generic, windows_2000
Issue | Fix Version | Assignee | Priority | Status | Resolution | Resolved In Build |
---|---|---|---|---|---|---|
JDK-2121476 | 5.0u3 | Bob Vandette | P3 | Closed | Fixed | b04 |
Name: gm110360 Date: 09/22/2004
FULL PRODUCT VERSION :
java version "1.5.0-rc"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0-rc-b63)
Java HotSpot(TM) Server VM (build 1.5.0-rc-b63, mixed mode)
ADDITIONAL OS VERSION INFORMATION :
Microsoft Windows 2000 [Version 5.00.2195]
EXTRA RELEVANT SYSTEM CONFIGURATION :
1.7GHz Xeon processor (dual processor but test is single threaded)
A DESCRIPTION OF THE PROBLEM :
This is refinement of the bug I filed earlier with an internal review ID of: 311233
Using double math just after a JNI call results in a dramatic performance slowdown (factor of 5 in the test case below) under 1.5.0rc-server. This
slowdown does not occur under 1.4.2-server or any client version. I've included
a simple program below that demonstrates the problem.
VTune reports an extreme level of Streaming SIMD Input Assists in this case,
though I don't know why this should occur
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Create a simple JNI function that simply always returns the value 1.0f
and compile it into a dll. Then run the program below to time various
operations. Notice that mixing double math operations and JNI calls results
in a dramatic and unexplained performance loss by roughly a factor of 5!
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
These are the results I get when running under 1.4.2-server
avg 2.4 ns total 4.70E-2 s for assign (~ 4.0 cycles)
avg 2.3 ns total 4.60E-2 s for mult (~ 3.9 cycles)
avg 162.5 ns total 3.25E0 s for JNI (~ 276.2 cycles)
avg 175.0 ns total 3.50E0 s for JNI and mult (~ 297.5 cycles)
avg 2.4 ns total 4.70E-2 s for assign (~ 4.0 cycles)
avg 2.4 ns total 4.70E-2 s for mult (~ 4.0 cycles)
avg 162.5 ns total 3.25E0 s for JNI (~ 276.2 cycles)
avg 168.0 ns total 3.36E0 s for JNI and mult (~ 285.6 cycles)
ACTUAL -
These are the results I get under 1.5.0rc-server
java version "1.5.0-rc"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0-rc-b63)
Java HotSpot(TM) Server VM (build 1.5.0-rc-b63, mixed mode)
avg 2.4 ns total 4.70E-2 s for assign (~ 4.0 cycles)
avg 2.4 ns total 4.70E-2 s for mult (~ 4.0 cycles)
avg 194.5 ns total 3.89E0 s for JNI (~ 330.6 cycles)
avg 892.2 ns total 1.78E1 s for JNI and mult (~ 1516.7 cycles)
avg 2.4 ns total 4.70E-2 s for assign (~ 4.0 cycles)
avg 2.4 ns total 4.70E-2 s for mult (~ 4.0 cycles)
avg 193.0 ns total 3.86E0 s for JNI (~ 328.0 cycles)
avg 890.6 ns total 1.78E1 s for JNI and mult (~ 1514.0 cycles)
Note that JNI calls have gotten 1.2x more expensive and mixing JNI calls and double multiplies has gotten 5.3x more expensive under 1.5.0rc-server as compared to 1.4.2-server
REPRODUCIBILITY :
This bug can be reproduced always.
---------- BEGIN SOURCE ----------
import java.text.DecimalFormat;
import java.util.Random;
/** Test that show a dramatic slowdown for programs that attempt to use
* floating point math after a JNI call.
*
* @author bjw Bruce Walter, Cornell Program of Computer Graphics 2004
*/
public class JNIOpsTest {
//target total number of repetitions of the operation
public static final int opTargetRepetitions = 20000000;
//size of arrays that are operated on
public static final int arraySize = 10000;
//number of times we need to process each array to reach total target count
public static final int reps = opTargetRepetitions/arraySize;
//pretty print the output numbers to make them easier to read
public static final DecimalFormat decForm = new DecimalFormat("###0.0");
public static final DecimalFormat sciForm = new DecimalFormat("0.00E0");
//my processor is a 1.7GHz Xeon (actually it is a dual processor, but this test is single threaded)
public static final double ghzProcSpeed = 1.7; //my processor is 1.7GHz
//native function that simply always returns the value 1.0
public static native float funcStaticVoid();
static {
System.load("C:/IntelCode/JNITest/JNITest/Release/JNITest.dll");
}
public static void runTimingTest(TestOp op, double result[], double src[], boolean print) {
long time = System.currentTimeMillis();
for(int i=0;i<reps;i++) {
op.performOp(result,src);
}
time = System.currentTimeMillis() - time;
double denom = 1000000.0/(reps*src.length);
if (print) {
String ps = decForm.format(time*denom);
while (ps.length()<6) ps = " "+ps;
ps = "avg "+ps+" ns total "+sciForm.format(time/1000.0)+" s";
while (ps.length()<32) ps += " ";
ps = ps+" for "+op.toString();
while (ps.length()<50) ps += " ";
System.out.println(ps+"\t(~ "+decForm.format(time*denom*ghzProcSpeed)+" cycles)");
}
}
public static void main(String[] args) throws InterruptedException {
double src[] = new double[arraySize];
double result[] = new double[arraySize];
Random ran = new Random(5232482349538L);
//set the src array to be random values between zero and one exclusive
for(int i=0;i<src.length;i++) {
do {
src[i] = ran.nextDouble();
} while (src[i] == 0);
}
TestOp tests[] = { new AssignOp(), new MultOp(), new JNIOp(), new JNIMulOp()};
//warm up hotspot
for(int i=0;i<tests.length;i++) {
runTimingTest(tests[i],result,src,false);
}
//now run the real tests and print the timings
for(int i=0;i<tests.length;i++) {
runTimingTest(tests[i],result,src,true);
}
//do it again to show the timings are reasonably consistent
for(int i=0;i<tests.length;i++) {
runTimingTest(tests[i],result,src,true);
}
}
public abstract static class TestOp {
public abstract void performOp(double result[], double src[]);
}
public static class AssignOp extends TestOp {
public String toString() { return "assign"; }
public void performOp(double result[], double src[]) {
for(int i=0;i<src.length;i++) {
result[i] = src[i];
}
}
}
public static class MultOp extends TestOp {
public String toString() { return "mult"; }
public void performOp(double result[], double src[]) {
for(int i=0;i<src.length;i++) {
result[i] = 0.143*src[i];
}
}
}
public static class JNIOp extends TestOp {
public String toString() { return "JNI"; }
public void performOp(double result[], double src[]) {
for(int i=0;i<src.length;i++) {
JNIOpsTest.funcStaticVoid();
}
}
}
public static class JNIMulOp extends TestOp {
public String toString() { return "JNI and mult"; }
public void performOp(double result[], double src[]) {
for(int i=0;i<src.length;i++) {
JNIOpsTest.funcStaticVoid();
result[i] = 0.143*src[i];
}
}
}
}
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
None that I know of other than not upgrading to 1.5.0rc and staying with 1.4.2
Release Regression From : 1.4.2
The above release value was the last known release where this
bug was known to work. Since then there has been a regression.
(Incident Review ID: 311434)
======================================================================
###@###.### 10/12/04 19:40 GMT
FULL PRODUCT VERSION :
java version "1.5.0-rc"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0-rc-b63)
Java HotSpot(TM) Server VM (build 1.5.0-rc-b63, mixed mode)
ADDITIONAL OS VERSION INFORMATION :
Microsoft Windows 2000 [Version 5.00.2195]
EXTRA RELEVANT SYSTEM CONFIGURATION :
1.7GHz Xeon processor (dual processor but test is single threaded)
A DESCRIPTION OF THE PROBLEM :
This is refinement of the bug I filed earlier with an internal review ID of: 311233
Using double math just after a JNI call results in a dramatic performance slowdown (factor of 5 in the test case below) under 1.5.0rc-server. This
slowdown does not occur under 1.4.2-server or any client version. I've included
a simple program below that demonstrates the problem.
VTune reports an extreme level of Streaming SIMD Input Assists in this case,
though I don't know why this should occur
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Create a simple JNI function that simply always returns the value 1.0f
and compile it into a dll. Then run the program below to time various
operations. Notice that mixing double math operations and JNI calls results
in a dramatic and unexplained performance loss by roughly a factor of 5!
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
These are the results I get when running under 1.4.2-server
avg 2.4 ns total 4.70E-2 s for assign (~ 4.0 cycles)
avg 2.3 ns total 4.60E-2 s for mult (~ 3.9 cycles)
avg 162.5 ns total 3.25E0 s for JNI (~ 276.2 cycles)
avg 175.0 ns total 3.50E0 s for JNI and mult (~ 297.5 cycles)
avg 2.4 ns total 4.70E-2 s for assign (~ 4.0 cycles)
avg 2.4 ns total 4.70E-2 s for mult (~ 4.0 cycles)
avg 162.5 ns total 3.25E0 s for JNI (~ 276.2 cycles)
avg 168.0 ns total 3.36E0 s for JNI and mult (~ 285.6 cycles)
ACTUAL -
These are the results I get under 1.5.0rc-server
java version "1.5.0-rc"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0-rc-b63)
Java HotSpot(TM) Server VM (build 1.5.0-rc-b63, mixed mode)
avg 2.4 ns total 4.70E-2 s for assign (~ 4.0 cycles)
avg 2.4 ns total 4.70E-2 s for mult (~ 4.0 cycles)
avg 194.5 ns total 3.89E0 s for JNI (~ 330.6 cycles)
avg 892.2 ns total 1.78E1 s for JNI and mult (~ 1516.7 cycles)
avg 2.4 ns total 4.70E-2 s for assign (~ 4.0 cycles)
avg 2.4 ns total 4.70E-2 s for mult (~ 4.0 cycles)
avg 193.0 ns total 3.86E0 s for JNI (~ 328.0 cycles)
avg 890.6 ns total 1.78E1 s for JNI and mult (~ 1514.0 cycles)
Note that JNI calls have gotten 1.2x more expensive and mixing JNI calls and double multiplies has gotten 5.3x more expensive under 1.5.0rc-server as compared to 1.4.2-server
REPRODUCIBILITY :
This bug can be reproduced always.
---------- BEGIN SOURCE ----------
import java.text.DecimalFormat;
import java.util.Random;
/** Test that show a dramatic slowdown for programs that attempt to use
* floating point math after a JNI call.
*
* @author bjw Bruce Walter, Cornell Program of Computer Graphics 2004
*/
public class JNIOpsTest {
//target total number of repetitions of the operation
public static final int opTargetRepetitions = 20000000;
//size of arrays that are operated on
public static final int arraySize = 10000;
//number of times we need to process each array to reach total target count
public static final int reps = opTargetRepetitions/arraySize;
//pretty print the output numbers to make them easier to read
public static final DecimalFormat decForm = new DecimalFormat("###0.0");
public static final DecimalFormat sciForm = new DecimalFormat("0.00E0");
//my processor is a 1.7GHz Xeon (actually it is a dual processor, but this test is single threaded)
public static final double ghzProcSpeed = 1.7; //my processor is 1.7GHz
//native function that simply always returns the value 1.0
public static native float funcStaticVoid();
static {
System.load("C:/IntelCode/JNITest/JNITest/Release/JNITest.dll");
}
public static void runTimingTest(TestOp op, double result[], double src[], boolean print) {
long time = System.currentTimeMillis();
for(int i=0;i<reps;i++) {
op.performOp(result,src);
}
time = System.currentTimeMillis() - time;
double denom = 1000000.0/(reps*src.length);
if (print) {
String ps = decForm.format(time*denom);
while (ps.length()<6) ps = " "+ps;
ps = "avg "+ps+" ns total "+sciForm.format(time/1000.0)+" s";
while (ps.length()<32) ps += " ";
ps = ps+" for "+op.toString();
while (ps.length()<50) ps += " ";
System.out.println(ps+"\t(~ "+decForm.format(time*denom*ghzProcSpeed)+" cycles)");
}
}
public static void main(String[] args) throws InterruptedException {
double src[] = new double[arraySize];
double result[] = new double[arraySize];
Random ran = new Random(5232482349538L);
//set the src array to be random values between zero and one exclusive
for(int i=0;i<src.length;i++) {
do {
src[i] = ran.nextDouble();
} while (src[i] == 0);
}
TestOp tests[] = { new AssignOp(), new MultOp(), new JNIOp(), new JNIMulOp()};
//warm up hotspot
for(int i=0;i<tests.length;i++) {
runTimingTest(tests[i],result,src,false);
}
//now run the real tests and print the timings
for(int i=0;i<tests.length;i++) {
runTimingTest(tests[i],result,src,true);
}
//do it again to show the timings are reasonably consistent
for(int i=0;i<tests.length;i++) {
runTimingTest(tests[i],result,src,true);
}
}
public abstract static class TestOp {
public abstract void performOp(double result[], double src[]);
}
public static class AssignOp extends TestOp {
public String toString() { return "assign"; }
public void performOp(double result[], double src[]) {
for(int i=0;i<src.length;i++) {
result[i] = src[i];
}
}
}
public static class MultOp extends TestOp {
public String toString() { return "mult"; }
public void performOp(double result[], double src[]) {
for(int i=0;i<src.length;i++) {
result[i] = 0.143*src[i];
}
}
}
public static class JNIOp extends TestOp {
public String toString() { return "JNI"; }
public void performOp(double result[], double src[]) {
for(int i=0;i<src.length;i++) {
JNIOpsTest.funcStaticVoid();
}
}
}
public static class JNIMulOp extends TestOp {
public String toString() { return "JNI and mult"; }
public void performOp(double result[], double src[]) {
for(int i=0;i<src.length;i++) {
JNIOpsTest.funcStaticVoid();
result[i] = 0.143*src[i];
}
}
}
}
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
None that I know of other than not upgrading to 1.5.0rc and staying with 1.4.2
Release Regression From : 1.4.2
The above release value was the last known release where this
bug was known to work. Since then there has been a regression.
(Incident Review ID: 311434)
======================================================================
###@###.### 10/12/04 19:40 GMT
- backported by
-
JDK-2121476 REGRESSION: JNI Performance Regression in 1.5.0 -server
- Closed
- duplicates
-
JDK-6190938 JNI calls become more expensive by a factor of 5x and causes application at least a 10% slowdown
- Closed
-
JDK-6192333 AMD64: linux/solaris: vm should not restore mxcsr across native calls
- Closed
- relates to
-
JDK-6601004 Application crash when JTable cell edited
- Closed
-
JDK-5003738 On Intel VM does not handle the %mxcsr register correctly
- Closed