Do not CAS on the slow path of SpinLock unnecessarily
This MR alters the spinlock acquire path to avoid doing a CAS on every iteration of the tight loop. The theory is that this reduces inter-CPU traffic by reducing the number of writes on the bus. The best reference I can find for this idea is Wikipedia.
For this reason I have not made this change when PROF_SPIN is enabled, since in that configuration the loop is already writing to that cache line (presumably the spin counter is on the same cache line as the lock).
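For readers unfamiliar with the idea, here is a minimal sketch of the pattern in C11 atomics. It is not the code in this MR: the type `sketch_spinlock`, the `sketch_*` function names, and the encoding (0 = free, 1 = held) are all made up for illustration. The point is only that the slow path spins on a plain load and retries the CAS once the lock looks free.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for illustration only: 0 = free, 1 = held. */
typedef struct {
    atomic_uint held;
} sketch_spinlock;

static void sketch_init(sketch_spinlock *l)
{
    atomic_init(&l->held, 0);
}

/* Fast path: a single CAS attempt. Returns true if we took the lock. */
static bool sketch_try_acquire(sketch_spinlock *l)
{
    unsigned expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &l->held, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

static void sketch_acquire(sketch_spinlock *l)
{
    for (;;) {
        if (sketch_try_acquire(l)) {
            return;
        }
        /* Slow path: wait using loads only, so the cache line can stay
         * shared between waiters instead of being pulled exclusive by
         * every failed CAS. Retry the CAS once the lock looks free. */
        while (atomic_load_explicit(&l->held, memory_order_relaxed) != 0) {
            /* busy-wait; a real implementation might pause or yield here */
        }
    }
}

static void sketch_release(sketch_spinlock *l)
{
    atomic_store_explicit(&l->held, 0, memory_order_release);
}
```

If a spin counter living on the same cache line were incremented inside the inner loop, every iteration would write to that line anyway, which is why the read-only wait would not be expected to help in that configuration.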
I wrote a microbenchmark (microbench.c) and ran it for four cases, PROF_SPIN on/off combined with this patch on/off. The results are in bench.txt.
This shows a pretty clear win when PROF_SPIN is off and this patch is applied. I do not know whether this win will matter in the real world, especially after !4729 (closed).
I'll send a separate MR for disabling PROF_SPIN, as that is probably separately contentious.