Primitive atomic logical operations
We offer fetch-and-and, fetch-and-or, etc., which are available as GCC intrinsics. Unfortunately, as far as I can tell those operations are not actually offered by x86_64. So I believe GCC must be implementing them using CAS loops on that (rather important) architecture. On the other hand, that instruction set does offer a plain atomic-and, atomic-or, etc., that do not fetch. In some cases, that may be sufficient. For example, I'm currently designing a potential replacement for the stable pointer system that uses a bitmap to represent a free list. Each deletion operation needs to update the bitmap (or) and then check whether a certain threshold number of entries are free (popcnt). One straightforward way to accomplish this with available instructions would be
- Atomic OR to mark an entry free
- Load the bitmap
- Perform the popcnt test
Of course, it would be nice to have fetch-and-or here, but we don't actually need it--if a bunch of deletions happen concurrently, then at least one of them will see the popcnt over the threshold (unless intervening allocations with the same bitmap bring the popcnt below the threshold, which is also perfectly acceptable).
Could we (and should we) offer non-fetching atomic operations implemented natively where they're available? One important question is of course whether we'd actually get better performance this way than using a CAS loop. I would conjecture that this is this case under contention, but I don't actually know.