Various memory ordering fixes
This is a follow-up to !6232 (closed), fixing the issues identified by TSAN:
- Drop (nearly) all memory barriers from hand-written Cmm in favour of ordered loads and stores (or, in a few places, explicitly-ordered fences)
- Elaborate on `Note [Heap memory barriers]`, which now contains a much more convincing soundness story
- Fix a few issues in `ghc` itself where `IORef`s are incorrectly used with an expectation of thread-safety
- Encapsulate various bits of RTS mutable global state (`n_capabilities`, `sched_state`, `recent_activity`) and access them only through accessors which use the necessary atomic operations
- Fix a data race in `readTVarIO#`
- Fix a data race in `makeStableName#`
- Move to a statically-allocated `Capabilities` array to avoid a rather tricky data race between the timer (which may be implemented by way of a signal) and `setNumCapabilities` (which previously reallocated the global `capabilities` array)
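
For readers unfamiliar with the pattern, the accessor and static-array items above can be illustrated roughly as follows. This is a minimal sketch in C11, not the RTS code: the names `Cap`, `caps`, `n_caps`, `MAX_CAPS`, `get_n_caps`, `enable_caps` and `sum_cap_numbers` are all hypothetical, and the sketch assumes a single writer that only ever grows the count. The point is that the array is never reallocated, and the count is published with a release store and read with an acquire load, so a concurrent reader never observes an uninitialised slot.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define MAX_CAPS 8 /* hypothetical fixed upper bound on capabilities */

/* Hypothetical, heavily simplified stand-in for a capability. */
typedef struct {
    uint32_t no;
} Cap;

/* Statically allocated: the storage never moves, so a reader holding an
 * index (or pointer) can never be left pointing at freed memory. */
static Cap caps[MAX_CAPS];

/* The count is private to this file and touched only via accessors. */
static _Atomic uint32_t n_caps = 0;

/* Acquire load: pairs with the release store in enable_caps, so every
 * slot below the returned count is fully initialised. */
static inline uint32_t get_n_caps(void)
{
    return atomic_load_explicit(&n_caps, memory_order_acquire);
}

/* Grow the number of enabled capabilities (assumed single writer).
 * Requests above MAX_CAPS are clamped; shrinking is not modelled.
 * Returns the count in effect afterwards. */
static uint32_t enable_caps(uint32_t wanted)
{
    uint32_t old = atomic_load_explicit(&n_caps, memory_order_relaxed);
    if (wanted > MAX_CAPS) {
        wanted = MAX_CAPS;
    }
    if (wanted <= old) {
        return old;
    }
    /* Initialise the new slots first ... */
    for (uint32_t i = old; i < wanted; i++) {
        caps[i].no = i;
    }
    /* ... then publish them with a release store. */
    atomic_store_explicit(&n_caps, wanted, memory_order_release);
    return wanted;
}

/* What a timer-like reader might do: walk the enabled capabilities
 * without taking a lock. Here it just sums their numbers. */
static uint32_t sum_cap_numbers(void)
{
    uint32_t n = get_n_caps();
    assert(n <= MAX_CAPS);
    uint32_t sum = 0;
    for (uint32_t i = 0; i < n; i++) {
        sum += caps[i].no;
    }
    return sum;
}
```

In this sketch the reader stays correct even if it runs in the middle of `enable_caps`: it sees either the old count or the new one, and in both cases every slot below that count has already been written.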
This is based on !6232 (closed).
Edited by Ben Gamari