In adding a feature to mustang I noticed that the server VM has a performance
issue in all releases prior to my putback in mustang. Basically, once c2 makes
a call to a JNI method and that call goes interpreted, it stays interpreted
even if code is later generated for the JNI wrapper. Here's a further
description from my putback:
> Here are the results of using BobV's jni microbenchmark to see what
> the addition of the native wrapper code did for performance. The
> results were on a slow P3 here. The numbers are for b24, which
> is old enough that the compilers still generated native wrappers.
> The VM was put into a b24 jdk so that the jdk was held constant.
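
(BobV's benchmark source isn't attached here, but the native side of a JNI
round-trip benchmark of this shape would look roughly like the following
sketch. The class name JniBench and the callback method names are placeholders
for illustration, not the actual benchmark.)

    #include <jni.h>

    // "Calling Native From Java": the Java side loops over an empty native
    // method like this one, so the time measured is dominated by the
    // Java -> native transition, i.e. the wrapper code in question.
    extern "C" JNIEXPORT void JNICALL
    Java_JniBench_emptyNative(JNIEnv*, jobject) {
      // intentionally empty
    }

    // "Calling Static/Non Static Java From Native Code": one native call
    // that loops, calling back up into Java each iteration.
    extern "C" JNIEXPORT void JNICALL
    Java_JniBench_callBackIntoJava(JNIEnv* env, jobject self, jint iterations) {
      jclass cls = env->GetObjectClass(self);
      jmethodID staticCb  = env->GetStaticMethodID(cls, "staticCallback", "()V");
      jmethodID virtualCb = env->GetMethodID(cls, "callback", "()V");
      for (jint i = 0; i < iterations; i++) {
        env->CallStaticVoidMethod(cls, staticCb);   // native -> static Java
        env->CallVoidMethod(self, virtualCb);       // native -> non-static Java
      }
    }
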
> b24 plain
>
> client
>
> Starting Calling Native From Java
> Ending Calling Native From Java, elapsed time: 6730
> Starting Calling Static Java From Native Code
> Ending Calling Static Java From Native Code, elapsed time: 14995
> Starting Calling Non Static Java From Native Code
> Ending Calling Non Static Java From Native Code, elapsed time: 14043
>
> server
>
> Starting Calling Native From Java
> Ending Calling Native From Java, elapsed time: 10220
> Starting Calling Static Java From Native Code
> Ending Calling Static Java From Native Code, elapsed time: 13533
> Starting Calling Non Static Java From Native Code
> Ending Calling Non Static Java From Native Code, elapsed time: 13952
>
> b24 with new wrapper code
>
> client
>
> Starting Calling Native From Java
> Ending Calling Native From Java, elapsed time: 6057
> Starting Calling Static Java From Native Code
> Ending Calling Static Java From Native Code, elapsed time: 13099
> Starting Calling Non Static Java From Native Code
> Ending Calling Non Static Java From Native Code, elapsed time: 13698
>
> server
>
> Starting Calling Native From Java
> Ending Calling Native From Java, elapsed time: 5632
> Starting Calling Static Java From Native Code
> Ending Calling Static Java From Native Code, elapsed time: 13060
> Starting Calling Non Static Java From Native Code
> Ending Calling Non Static Java From Native Code, elapsed time: 13616
>
> So the new code is faster for both c1 and c2. The surprising thing is how
> much worse c2 did than c1 in b24. I knew c2 didn't generate great code for
> the native wrapper, but this was ridiculous, so I investigated it. It turns
> out that c2 has had a problem with native wrappers going back through all
> the releases: for this benchmark the server VM is actually running
> interpreted. The reason is similar to why I initially suffered a performance
> hit on jvm98 with the no-adapters code.
>
> When c2 runs this benchmark, the method containing the calls to native is
> OSR'd first. Then the native call site gets resolved, but since a wrapper
> isn't present yet it goes through a c2i adapter to the interpreter. Because
> c2 is built with PreferInterpreterNativeStubs == true, there is no code in
> the interpreter native entry to look for compiled code and bail out to a
> compiled version. So even once the native wrapper is generated, the code
> continues to run the interpreted wrapper.
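
(A rough, self-contained sketch of the call-site behavior being described,
using invented types and names rather than actual HotSpot code:)

    // Hypothetical illustration only -- not HotSpot source.
    struct NativeMethodState {
      void* compiled_wrapper;          // filled in once c2 generates the wrapper
      void* interpreter_native_entry;  // always available
    };

    // What resolving the native call site effectively picks as the target.
    void* resolve_native_call(const NativeMethodState& m) {
      if (m.compiled_wrapper != nullptr) {
        return m.compiled_wrapper;     // compiled caller -> compiled wrapper
      }
      // No wrapper yet: bind the call site (through a c2i adapter) to the
      // interpreter's native entry.  With PreferInterpreterNativeStubs == true
      // that entry has no "is there compiled code now?" check, so the call
      // site keeps this binding even after compiled_wrapper is generated.
      return m.interpreter_native_entry;
    }
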
>
> If c2 in earlier releases had been built with PreferInterpreterNativeStubs
> == false, then code would have been generated to bail out to compiled code
> when it reached the interpreter. Presumably the flag was set to true because
> that avoids a call sequence that looks like
> compiled -> c2i -> i2c -> compiled_wrapper, where the adapter transitions
> would surely eliminate any possible speed improvement from using the
> compiled wrapper. There might be some performance boost to be had in earlier
> releases by instead having the server VM's interpreter native wrapper check
> for compiled code and, if present, force a re-resolve of the call site. This
> is what I had to do in mustang in order to get back the jvm98 performance I
> lost with the no-adapter change.
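
(Continuing the invented names from the sketch above, the suggested
interpreter-side check might look roughly like this; it is only an
illustration of the idea, not the actual mustang change:)

    // Hypothetical prologue for the server VM's interpreter native entry.
    // call_site_target is the caller's currently bound destination.
    void* interpreter_native_entry_check(const NativeMethodState& m,
                                         void** call_site_target) {
      if (m.compiled_wrapper != nullptr) {
        // A compiled wrapper has appeared since the call site was bound:
        // clear the binding so the next call re-resolves straight to the
        // compiled wrapper instead of going compiled -> c2i -> interpreter.
        *call_site_target = nullptr;
        return m.compiled_wrapper;        // finish this call in compiled code
      }
      return m.interpreter_native_entry;  // stay interpreted for now
    }
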
###@###.### 2005-03-17 19:21:45 GMT