Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-5105765

REGRESSION: JNI Performance Regression in 1.5.0 -server

XMLWordPrintable

    • b14
    • x86
    • generic, windows_2000

        Name: gm110360 Date: 09/22/2004


        FULL PRODUCT VERSION :
        java version "1.5.0-rc"
        Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0-rc-b63)
        Java HotSpot(TM) Server VM (build 1.5.0-rc-b63, mixed mode)

        ADDITIONAL OS VERSION INFORMATION :
        Microsoft Windows 2000 [Version 5.00.2195]

        EXTRA RELEVANT SYSTEM CONFIGURATION :
        1.7GHz Xeon processor (dual processor but test is single threaded)

        A DESCRIPTION OF THE PROBLEM :
        This is refinement of the bug I filed earlier with an internal review ID of: 311233

        Using double math just after a JNI call results in a dramatic performance slowdown (factor of 5 in the test case below) under 1.5.0rc-server. This
        slowdown does not occur under 1.4.2-server or any client version. I've included
        a simple program below that demonstrates the problem.
        VTune reports an extreme level of Streaming SIMD Input Assists in this case,
        though I don't know why this should occur

        STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
        Create a simple JNI function that simply always returns the value 1.0f
        and compile it into a dll. Then run the program below to time various
        operations. Notice that mixing double math operations and JNI calls results
        in a dramatic and unexplained performance loss by roughly a factor of 5!

        EXPECTED VERSUS ACTUAL BEHAVIOR :
        EXPECTED -
        These are the results I get when running under 1.4.2-server

        avg 2.4 ns total 4.70E-2 s for assign (~ 4.0 cycles)
        avg 2.3 ns total 4.60E-2 s for mult (~ 3.9 cycles)
        avg 162.5 ns total 3.25E0 s for JNI (~ 276.2 cycles)
        avg 175.0 ns total 3.50E0 s for JNI and mult (~ 297.5 cycles)
        avg 2.4 ns total 4.70E-2 s for assign (~ 4.0 cycles)
        avg 2.4 ns total 4.70E-2 s for mult (~ 4.0 cycles)
        avg 162.5 ns total 3.25E0 s for JNI (~ 276.2 cycles)
        avg 168.0 ns total 3.36E0 s for JNI and mult (~ 285.6 cycles)

        ACTUAL -
        These are the results I get under 1.5.0rc-server

        java version "1.5.0-rc"
        Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0-rc-b63)
        Java HotSpot(TM) Server VM (build 1.5.0-rc-b63, mixed mode)

        avg 2.4 ns total 4.70E-2 s for assign (~ 4.0 cycles)
        avg 2.4 ns total 4.70E-2 s for mult (~ 4.0 cycles)
        avg 194.5 ns total 3.89E0 s for JNI (~ 330.6 cycles)
        avg 892.2 ns total 1.78E1 s for JNI and mult (~ 1516.7 cycles)
        avg 2.4 ns total 4.70E-2 s for assign (~ 4.0 cycles)
        avg 2.4 ns total 4.70E-2 s for mult (~ 4.0 cycles)
        avg 193.0 ns total 3.86E0 s for JNI (~ 328.0 cycles)
        avg 890.6 ns total 1.78E1 s for JNI and mult (~ 1514.0 cycles)

        Note that JNI calls have gotten 1.2x more expensive and mixing JNI calls and double multiplies has gotten 5.3x more expensive under 1.5.0rc-server as compared to 1.4.2-server

        REPRODUCIBILITY :
        This bug can be reproduced always.

        ---------- BEGIN SOURCE ----------

        import java.text.DecimalFormat;
        import java.util.Random;

        /** Test that show a dramatic slowdown for programs that attempt to use
         * floating point math after a JNI call.
         *
         * @author bjw Bruce Walter, Cornell Program of Computer Graphics 2004
         */
        public class JNIOpsTest {
          //target total number of repetitions of the operation
          public static final int opTargetRepetitions = 20000000;
          //size of arrays that are operated on
          public static final int arraySize = 10000;
          //number of times we need to process each array to reach total target count
          public static final int reps = opTargetRepetitions/arraySize;
          //pretty print the output numbers to make them easier to read
          public static final DecimalFormat decForm = new DecimalFormat("###0.0");
          public static final DecimalFormat sciForm = new DecimalFormat("0.00E0");
          //my processor is a 1.7GHz Xeon (actually it is a dual processor, but this test is single threaded)
          public static final double ghzProcSpeed = 1.7; //my processor is 1.7GHz
          
          //native function that simply always returns the value 1.0
          public static native float funcStaticVoid();
          
          static {
            System.load("C:/IntelCode/JNITest/JNITest/Release/JNITest.dll");
          }
          
          public static void runTimingTest(TestOp op, double result[], double src[], boolean print) {
            long time = System.currentTimeMillis();
            for(int i=0;i<reps;i++) {
              op.performOp(result,src);
            }
            time = System.currentTimeMillis() - time;
            double denom = 1000000.0/(reps*src.length);
            if (print) {
              String ps = decForm.format(time*denom);
              while (ps.length()<6) ps = " "+ps;
              ps = "avg "+ps+" ns total "+sciForm.format(time/1000.0)+" s";
              while (ps.length()<32) ps += " ";
              ps = ps+" for "+op.toString();
              while (ps.length()<50) ps += " ";
              System.out.println(ps+"\t(~ "+decForm.format(time*denom*ghzProcSpeed)+" cycles)");
            }
          }
            
          public static void main(String[] args) throws InterruptedException {
            double src[] = new double[arraySize];
            double result[] = new double[arraySize];
            Random ran = new Random(5232482349538L);
            //set the src array to be random values between zero and one exclusive
            for(int i=0;i<src.length;i++) {
              do {
                src[i] = ran.nextDouble();
              } while (src[i] == 0);
            }
            TestOp tests[] = { new AssignOp(), new MultOp(), new JNIOp(), new JNIMulOp()};
            //warm up hotspot
            for(int i=0;i<tests.length;i++) {
              runTimingTest(tests[i],result,src,false);
            }
            //now run the real tests and print the timings
            for(int i=0;i<tests.length;i++) {
              runTimingTest(tests[i],result,src,true);
            }
            //do it again to show the timings are reasonably consistent
            for(int i=0;i<tests.length;i++) {
              runTimingTest(tests[i],result,src,true);
            }
          }
          
          public abstract static class TestOp {
            public abstract void performOp(double result[], double src[]);
          }
          
          public static class AssignOp extends TestOp {
            public String toString() { return "assign"; }
            public void performOp(double result[], double src[]) {
              for(int i=0;i<src.length;i++) {
                result[i] = src[i];
              }
            }
          }
          
          public static class MultOp extends TestOp {
            public String toString() { return "mult"; }
            public void performOp(double result[], double src[]) {
              for(int i=0;i<src.length;i++) {
                result[i] = 0.143*src[i];
              }
            }
          }
          
          public static class JNIOp extends TestOp {
            public String toString() { return "JNI"; }
            public void performOp(double result[], double src[]) {
              for(int i=0;i<src.length;i++) {
                JNIOpsTest.funcStaticVoid();
              }
            }
          }
            
          public static class JNIMulOp extends TestOp {
            public String toString() { return "JNI and mult"; }
            public void performOp(double result[], double src[]) {
              for(int i=0;i<src.length;i++) {
                JNIOpsTest.funcStaticVoid();
                result[i] = 0.143*src[i];
              }
            }
          }
         
        }

        ---------- END SOURCE ----------

        CUSTOMER SUBMITTED WORKAROUND :
        None that I know of other than not upgrading to 1.5.0rc and staying with 1.4.2

        Release Regression From : 1.4.2
        The above release value was the last known release where this
        bug was known to work. Since then there has been a regression.

        (Incident Review ID: 311434)
        ======================================================================
        ###@###.### 10/12/04 19:40 GMT

              bobv Bob Vandette (Inactive)
              gmanwanisunw Girish Manwani (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

                Created:
                Updated:
                Resolved:
                Imported:
                Indexed: