Our Atomics are used in high-performance code within the VM, for example, in garbage collectors. The performance of tight loops with atomics depends on the quality of the code within the loop. In our implementation, we have a LOCK_IF_MP stub that emits additional checks before actually invoking the atomic instruction: it polls _processor_count, polls AssumeMP, takes several branches, etc.:
// Adding a lock prefix to an instruction on MP machine
#define LOCK_IF_MP(mp) "cmp $0, " #mp "; je 1f; lock; 1: "
inline jint Atomic::add (jint add_value, volatile jint* dest) {
  jint addend = add_value;
  int mp = os::is_MP();
  __asm__ volatile ( LOCK_IF_MP(%3) "xaddl %0,(%2)"
                   : "=r" (addend)
                   : "0" (addend), "r" (dest), "r" (mp)
                   : "cc", "memory");
  return addend + add_value;
}
Since in 2016 we are mostly running heavily-threaded hardware, it might make sense to drop LOCK_IF_MP from Atomics. (Additionally, x86 has for some time implied the lock prefix for some instructions, like xchg, so maybe we can drop the prefix altogether for those?)
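To illustrate, a minimal sketch of what Atomic::add looks like with LOCK_IF_MP dropped, i.e. with an unconditional lock prefix (this is a standalone illustration, not the actual HotSpot patch; atomic_add and the plain int32_t types are stand-ins for Atomic::add and jint):

```cpp
#include <cstdint>

// Sketch: same xaddl as before, but with an unconditional lock prefix,
// assuming we no longer consult os::is_MP() at all. The mp operand and
// the cmp/je branch emitted by LOCK_IF_MP disappear from the loop body.
inline int32_t atomic_add(int32_t add_value, volatile int32_t* dest) {
  int32_t addend = add_value;
  __asm__ volatile ("lock; xaddl %0,(%2)"
                   : "=r" (addend)
                   : "0" (addend), "r" (dest)
                   : "cc", "memory");
  return addend + add_value;  // xadd returns the old value; add back
}
```

On a uniprocessor this pays for a lock prefix that strictly was not needed, but it removes the compare, the branch, and the extra register pressure from every atomic in a hot loop.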