The DaCapo xalan benchmark is around 14% slower with -XX:+UseObjectMonitorTable. The OM table is currently off by default, so this regression would show up once it is turned on by default.
I have tried out a few ideas to see if they affect the performance of xalan (I'm told it's pronounced zay-lon, not x-Alan). Ideas:
1. Adjusted the size of the OMCache: 2, 4, 8, 12, 24. None of them matter. Keeping it at 8.
2. Not using the OMCache at all: worse.
3. Not clearing the OM cache during GC (added oops_do, which unfortunately keeps things alive): better hit rate but no better performance overall.
4. Skipping the OM cache in the fast path (quick_enter), since it seems to repeat checks: no difference.
5. Took out spinning before inflating the monitor: worse, even though the spin hit rate is bad:
_fast_lock_spin_failure = 37987135
_fast_lock_spin_success = 556770
_fast_lock_spin_attempt = 1039882
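To make the cache experiments above easier to picture, here is a minimal single-threaded sketch of a small fixed-size cache consulted before a shared object-to-monitor table, with hit and miss counters. It is not the HotSpot implementation: `Monitor`, `ObjectKey`, `MonitorTable`, `MonitorCache`, and the round-robin replacement policy are all hypothetical names and choices made for illustration only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-ins; these are NOT HotSpot types.
struct Monitor { int id; };
using ObjectKey = std::uintptr_t;

// Shared object-to-monitor table. A real one must be concurrent;
// this sketch is single-threaded and only counts lookups and hits.
class MonitorTable {
 public:
  void insert(ObjectKey obj, Monitor* m) { map_[obj] = m; }

  Monitor* lookup(ObjectKey obj) {
    ++lookups;
    auto it = map_.find(obj);
    if (it == map_.end()) return nullptr;
    ++hits;
    return it->second;
  }

  std::size_t lookups = 0;
  std::size_t hits = 0;

 private:
  std::unordered_map<ObjectKey, Monitor*> map_;
};

// One cache slot; an empty slot has a null monitor.
struct CacheEntry {
  ObjectKey obj = 0;
  Monitor* monitor = nullptr;
};

// Small per-thread cache checked before falling back to the table.
template <std::size_t Capacity>
class MonitorCache {
 public:
  Monitor* get(ObjectKey obj, MonitorTable& table) {
    // Linear scan: cheap for a handful of entries.
    for (std::size_t i = 0; i < Capacity; ++i) {
      if (entries_[i].monitor != nullptr && entries_[i].obj == obj) {
        ++cache_hits;
        return entries_[i].monitor;
      }
    }
    ++cache_misses;
    Monitor* m = table.lookup(obj);
    if (m != nullptr) {
      // Assumed policy: overwrite slots in round-robin order.
      entries_[next_].obj = obj;
      entries_[next_].monitor = m;
      next_ = (next_ + 1) % Capacity;
    }
    return m;
  }

  // Analogous to dropping the cache contents at GC (idea 3 skips this).
  void clear() {
    entries_.fill(CacheEntry{});
    next_ = 0;
  }

  std::size_t cache_hits = 0;
  std::size_t cache_misses = 0;

 private:
  std::array<CacheEntry, Capacity> entries_{};
  std::size_t next_ = 0;
};
```

With a `MonitorCache<8>`, every miss still pays for a table lookup, so the cache only helps when the same few objects are locked repeatedly by the same thread.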
Because these monitors are contended, doing a table or OM cache lookup on each monitor enter is 14% worse.
Other benchmarks don't show this regression (except Dacapo23_spring, which is maybe the same thing).
xalan perf shows the code mostly in ObjectMonitor::TrySpin with and without the table. Adaptive spinning is something that really helps xalan though.
Added some counters to the runtime code (C1-only performance was equivalently slower with the OM table, so ignoring c2_MacroAssembler for now):
===== DaCapo 9.12-MR1 xalan PASSED in 4435 msec =====
_om_cache_hits = 2456302
_om_cache_misses = 1327485
_try_enter_success = 1198359
_try_enter_failure = 1257943
_try_enter_slow_failure = 958268
_try_enter_slow_success = 1672344
_fast_lock_spin_attempt = 33427
_fast_lock_spin_success = 4896
_fast_lock_spin_failure = 28531
_table_lookups = 1339097
_table_hits = 1338926
_items_count = 171
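The raw counters are easier to compare as rates. The sketch below derives them from the values in this run; the struct and function names are hypothetical, and it assumes that hits plus misses gives the total number of OM cache lookups. Under that assumption the OM cache hit rate comes out at roughly 65%, the spin success rate at roughly 15%, and the table hit rate just under 100%. The difference between table lookups and table hits is 171, the same as _items_count, which may mean the only table misses are first-time insertions, but that is an inference from the numbers and not something the counters prove.

```cpp
#include <cassert>
#include <cstdint>

// Fraction of `total` events that are counted in `part`.
inline double rate(std::uint64_t part, std::uint64_t total) {
  assert(total > 0 && part <= total);
  return static_cast<double>(part) / static_cast<double>(total);
}

// Counter values copied from the xalan run above.
struct XalanCounters {
  std::uint64_t om_cache_hits = 2456302;
  std::uint64_t om_cache_misses = 1327485;
  std::uint64_t fast_lock_spin_attempt = 33427;
  std::uint64_t fast_lock_spin_success = 4896;
  std::uint64_t table_lookups = 1339097;
  std::uint64_t table_hits = 1338926;
  std::uint64_t items_count = 171;
};

// Assumes every cache lookup is counted as either a hit or a miss.
inline double om_cache_hit_rate(const XalanCounters& c) {
  return rate(c.om_cache_hits, c.om_cache_hits + c.om_cache_misses);
}

inline double spin_success_rate(const XalanCounters& c) {
  return rate(c.fast_lock_spin_success, c.fast_lock_spin_attempt);
}

inline double table_hit_rate(const XalanCounters& c) {
  return rate(c.table_hits, c.table_lookups);
}

// Table lookups that did not find an existing monitor.
inline std::uint64_t table_misses(const XalanCounters& c) {
  return c.table_lookups - c.table_hits;
}
```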