JDK-6190938

JNI calls become up to 5x more expensive, causing at least a 10% application slowdown


    • Type: Bug
    • Resolution: Duplicate
    • Priority: P3
    • None
    • 5.0u1
    • hotspot
    • x86
    • windows_2000

      J2SE Version (please include all output from java -version flag):
        java version "1.5.0_01-ea"
        Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_01-ea-b04)
        Java HotSpot(TM) Server VM (build 1.5.0_01-ea-b04, mixed mode)

      Does this problem occur on J2SE 1.3, 1.4.x or 1.5? Yes / No (pick one)
        Occurs on 1.5.0 and all later releases

      Operating System Configuration Information (be specific):
        Microsoft Windows 2000 [Version 5.00.2195]

      Hardware Configuration Information (be specific):
        1.7GHz Xeon processor (dual processor but test is single threaded)

      Bug Description:

      I believe that bug 5105765 was closed prematurely for the following reasons:
         1) Although the ABI is ambiguous on this point, I believe there are strong
            arguments for considering native code that changes the SSE control
            flags (in the mxcsr register) to be buggy or erroneous rather
            than exhibiting acceptable behavior.

         2) Even if changing the SSE control flags is defined as allowed behavior,
            there is a cheaper solution to guard against it than what is currently
            implemented in Java 1.5.0 -server on x86 processors.

      Let me start with the second point. Setting the SSE control
      flags in the mxcsr register is a serializing and hence very expensive
      operation. Moreover, in most cases it is unnecessary, as most native code
      is well behaved and does not change the SSE control flags. Thus in
      most cases it is cheaper to first check whether the SSE control flags have
      been changed and to reset them only if actually necessary.
      The check still has some cost, since reading the mxcsr
      register is somewhat expensive, but it is much less expensive than actually
      setting the mxcsr register (about 25 cycles versus about 130 cycles in the
      measurements below), as demonstrated by the code included at the end
      of this bug report. Thus if you feel that native code should be allowed
      to change the SSE control flags, you can reduce the cost of correcting such
      changes by writing the mxcsr register only if it was actually changed.
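
The check-then-reset idea described above can be sketched in C as follows. This is a minimal illustration, not the JVM's actual code: the function names are hypothetical, the `_mm_getcsr`/`_mm_setcsr` intrinsics from `<xmmintrin.h>` are used to access the register, and the comparison is restricted to bits 6-15 on the assumption that the sticky status flags in bits 0-5 should not trigger a rewrite. On builds without SSE a plain variable stands in for the register so the sketch still compiles.

```c
#include <stdint.h>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>  /* _mm_getcsr / _mm_setcsr */
uint32_t read_mxcsr(void)        { return (uint32_t)_mm_getcsr(); }
void     write_mxcsr(uint32_t v) { _mm_setcsr(v); }
#else
/* Assumption: without SSE, a plain variable stands in for the register. */
static uint32_t simulated_mxcsr = 0x1F80u;
uint32_t read_mxcsr(void)        { return simulated_mxcsr; }
void     write_mxcsr(uint32_t v) { simulated_mxcsr = v; }
#endif

/* Bits 6-15 hold the control settings; bits 0-5 are sticky status flags
   that ordinary floating-point code sets, so they are ignored here. */
#define MXCSR_CONTROL_MASK 0xFFC0u

/* Rewrites mxcsr only if its control bits differ from `expected`.
   Returns 1 when the expensive write was performed, 0 when it was skipped. */
int restore_mxcsr_if_changed(uint32_t expected)
{
    uint32_t current = read_mxcsr();
    if ((current & MXCSR_CONTROL_MASK) == (expected & MXCSR_CONTROL_MASK))
        return 0;           /* common case: only the cheaper read is paid */
    write_mxcsr(expected);  /* rare case: pay for the serializing write */
    return 1;
}
```

A caller would capture `read_mxcsr()` before invoking the native function and pass that value to `restore_mxcsr_if_changed()` afterwards. `expected` should come from an earlier read of the register so that no reserved bits are written.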

      Now back to the first point. The official documents are unfortunately
      somewhat ambiguous about whether a procedure or function should be
      allowed to change the SSE control flags (i.e., whether the mxcsr should be
      treated as volatile, or caller-saved). In the IA-32 Intel Architecture Software
      Developer's Manual Volume 1, section 11.6.10.2 describes how to save SSE
      state across a procedure call, including both the XMM and MXCSR registers,
      using the appropriate instructions if required. However the next section,
      11.6.10.3, titled "Caller-Save Requirement for Procedure and Function
      Calls" requires only saving the XMM registers (and does not mention the
      MXCSR register). It explicitly says that "The primary reason for using the
      caller-save convention [for the XMM registers] is to prevent performance
      degradation". On page 5-21 of the Intel Software Optimization manual, it
      states that "Frequent changes to the MXCSR register should be avoided
      since there is a penalty associated with writing this register" and on
      page 2-59 it makes clear that writing the mxcsr register is an expensive
      serializing instruction that is expected to be used infrequently. From
      this evidence, I think we can reasonably conclude that the caller-save
      requirement was meant to apply to the XMM registers only and not the MXCSR
      register.

      For further empirical evidence, we can see that this is precisely how
      other compilers treat the MXCSR register. Neither the Intel nor the
      Microsoft C++ compiler will automatically insert a save and restore of
      the mxcsr register around a procedure call, even when it is impossible for
      it to prove that the mxcsr register has not changed (for example when calling
      through a function pointer). However, both will automatically save and
      restore the XMM registers before and after a procedure call. Also note that
      by convention the x87 floating point control word is not treated as volatile
      and is not saved and restored around a procedure call by any compiler I know
      of, including Java's. It seems very reasonable that the SSE control register,
      mxcsr, should be treated analogously to the x87 control register, fcw.

      In Agner Fog's survey of the calling conventions used by various C++
      compilers and operating systems for x86 systems, he states that "The
      floating point control word and bit 6-15 of the MXCSR register must be
      saved and restored by any procedure that changes them, except for
      procedures that have the purpose of changing these".
      http://www.agner.org/assem/calling_conventions.pdf
      In other words, the mxcsr should be treated as callee-saved (or
      non-volatile) unless the programmer explicitly states otherwise.
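
To make "bit 6-15" concrete, here is a small sketch that separates the control portion of an mxcsr value from the status flags in bits 0-5. The helper names are hypothetical; the bit positions follow the documented MXCSR layout (rounding control in bits 13-14, flush-to-zero in bit 15). Applied to the value 8064 (0x1F80) printed by the test program, it shows all exception masks set and round-to-nearest selected.

```c
#include <stdint.h>

#define MXCSR_CONTROL_MASK 0xFFC0u  /* bits 6-15: DAZ, exception masks, rounding, FZ */

/* Control portion of an mxcsr value (bits 6-15). */
uint32_t mxcsr_control_bits(uint32_t mxcsr)
{
    return mxcsr & MXCSR_CONTROL_MASK;
}

/* Rounding-control field (bits 13-14):
   0 = nearest, 1 = down, 2 = up, 3 = toward zero. */
uint32_t mxcsr_rounding_mode(uint32_t mxcsr)
{
    return (mxcsr >> 13) & 0x3u;
}

/* Returns 1 if a callee left the control bits different from how it
   found them; changes confined to the status flags are ignored. */
int callee_changed_control(uint32_t before, uint32_t after)
{
    return mxcsr_control_bits(before) != mxcsr_control_bits(after);
}
```

Under this reading, a callee that merely raises the precision flag (bit 5) has not violated the convention, while one that returns with flush-to-zero (bit 15) newly set has.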

      Thus I hope that I have convinced you that if the native code invoked by
      a JNI call does change the SSE control register, mxcsr, this should
      be treated as a bug (like writing data to random memory locations).
      However, it is a bug that can be easily detected, either by checking the
      mxcsr register after each JNI call or, preferably, by checking only
      when a command-line flag such as the -Xcheck:jni flag is set. The latter
      would impose no extra performance penalty on Java programs
      using non-buggy native code, while still allowing the error to be
      detected when desired.


      Steps to Reproduce (be specific):


      REPRODUCIBILITY :
         This bug can be reproduced always.

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :

      The following program demonstrates the problem: JNI calls have
      become more expensive, in some cases by a factor of 5x. I also
      included code to demonstrate that testing whether the mxcsr register
      has actually changed is cheaper than the current behavior of always
      setting it regardless of its current value.


      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -

      java version "1.4.2_03"
      Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_03-b02)
      Java HotSpot(TM) Server VM (build 1.4.2_03-b02, mixed mode)
      mxcsr 8064
      avg 2.4 ns total 4.70E-2 s for assign (~ 4.0 cycles)
      avg 2.4 ns total 4.70E-2 s for mult (~ 4.0 cycles)
      avg 148.4 ns total 2.97E0 s for JNI (~ 252.3 cycles)
      avg 162.5 ns total 3.25E0 s for JNI and mult (~ 276.2 cycles)
      avg 76.6 ns total 1.53E0 s for Save&Restore MXCSR (~ 130.2 cycles)
      avg 14.9 ns total 2.97E-1 s for Save&Test MXCSR (~ 25.2 cycles)
      avg 2.3 ns total 4.60E-2 s for assign (~ 3.9 cycles)
      avg 2.4 ns total 4.70E-2 s for mult (~ 4.0 cycles)
      avg 148.5 ns total 2.97E0 s for JNI (~ 252.4 cycles)
      avg 161.0 ns total 3.22E0 s for JNI and mult (~ 273.6 cycles)
      avg 75.8 ns total 1.52E0 s for Save&Restore MXCSR (~ 128.9 cycles)
      avg 14.9 ns total 2.97E-1 s for Save&Test MXCSR (~ 25.2 cycles)

      ACTUAL -

      java version "1.5.0_01-ea"
      Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_01-ea-b04)
      Java HotSpot(TM) Server VM (build 1.5.0_01-ea-b04, mixed mode)
      mxcsr 8064
      avg 2.3 ns total 4.60E-2 s for assign (~ 3.9 cycles)
      avg 2.4 ns total 4.70E-2 s for mult (~ 4.0 cycles)
      avg 193.8 ns total 3.88E0 s for JNI (~ 329.4 cycles)
      avg 887.5 ns total 1.78E1 s for JNI and mult (~ 1508.8 cycles)
      avg 75.8 ns total 1.52E0 s for Save&Restore MXCSR (~ 128.9 cycles)
      avg 14.9 ns total 2.97E-1 s for Save&Test MXCSR (~ 25.2 cycles)
      avg 2.4 ns total 4.70E-2 s for assign (~ 4.0 cycles)
      avg 2.3 ns total 4.60E-2 s for mult (~ 3.9 cycles)
      avg 194.6 ns total 3.89E0 s for JNI (~ 330.7 cycles)
      avg 889.8 ns total 1.78E1 s for JNI and mult (~ 1512.7 cycles)
      avg 76.6 ns total 1.53E0 s for Save&Restore MXCSR (~ 130.1 cycles)
      avg 14.9 ns total 2.97E-1 s for Save&Test MXCSR (~ 25.2 cycles)
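
The figures above can be related to one another with a little arithmetic. The sketch below uses hypothetical helper names and assumes the cycle estimates were derived from the 1.7 GHz clock given in the hardware description, which the printed numbers appear consistent with.

```c
/* Cycles per call, given the average time in nanoseconds and the clock
   in GHz (1 GHz corresponds to one cycle per nanosecond). */
double ns_to_cycles(double avg_ns, double clock_ghz)
{
    return avg_ns * clock_ghz;
}

/* How many times slower the newer measurement is than the baseline. */
double slowdown_factor(double baseline_ns, double measured_ns)
{
    return measured_ns / baseline_ns;
}
```

For example, `slowdown_factor(162.5, 887.5)` gives roughly 5.5 for "JNI and mult" between 1.4.2_03 and 1.5.0_01-ea, `slowdown_factor(148.4, 193.8)` gives roughly 1.3 for plain JNI calls, and `ns_to_cycles(148.4, 1.7)` gives about 252.3 cycles, matching the first JNI line of the 1.4.2_03 output.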

      CUSTOMER SUBMITTED WORKAROUND :
      No good workaround exists. This bug causes at least a 10% slowdown in our
      real-world large rendering application and will affect any Java program
      that uses the server JVM and makes lots of JNI calls (for example, programs
      using JOGL to access OpenGL frequently).

      One can use the client JVM or disable the use of SSE, but these cause
      even larger slowdowns in our application than this bug does, and thus are
      not attractive alternatives. We will stick with Java 1.4.2 until these
      issues are resolved.


      Test programs included: JNIOpsTestv2.java (Java code) and JNIOpsTestv2.c
      (C code).
      ###@###.### 11/4/04 20:24 GMT

            bobv Bob Vandette (Inactive)
            tyao Ting-Yun Ingrid Yao (Inactive)