Use proper atomic operations
This is a large-scale refactoring of our use of atomic memory operations (or, rather, fixing the lack therefore) in the runtime system.
In short, prior to C11 the C abstract machine had no notion and threads and therefore guaranteed no particular semantics for shared-memory access. Platform-specific libraries such of
pthreads offered various safe synchronization primitives, but there was no standard mechanism for achieving lock-free concurrent access. Lacking such a mechanism, programs like GHC were forced to use a variety of tricks (e.g. ubiquitous use of
volatile) to try to ensure that the compiler would produce code with the desired operational behavior.
However, these tricks abuse language features unrelated to concurrency (e.g.
volatile was intended for interaction with memory-mapped hardware) and guarantee no particular semantics; data races are undefined behavior,
volatile or otherwise.
Thankfully, this changed with the appearance of C11, which introduced proper standard atomic operations with support for relaxed acquire/release memory ordering semantics. This allows us to explicitly tell the compiler the ordering that we expect, shifting the onus of proving correctness from us (a responsibility which we have historically shirked) to the compiler. Moreover, using standard atomic operations allows us to benefit from the many tools that have been written around this interface (e.g. ThreadSanitizer).
This patch makes a few related changes:
- Use C11-style atomic operations instead of
volatileand prayer to ensure correct ordering between shared-memory operations. Note that we do not use C11 atomic variables but rather only the atomic operations themselves. This makes it far easier to support older, non-C11 compilers at the expense of less type safety.
- Refactor GHC's existing atomic primitives (e.g.
atomic_inc) to rather use the
__atomic_*family of builtins as the former as deprecated.
- Introduce support for running the ThreadSanitizer memory-concurrency validator which can identify data races and other concurrency issues.
- Fix the issues that ThreadSanitizer identifies
To avoid directly depending upon the atomic builtins I define a set of macros exposing load and store operations with various orderings:
__atomic_* builtins that these macros wrap are available in all GCC version since 4.7, released in 2012. To avoid having to emulate these operations on GCC 4.6 (the first GCC version we previously supported), I have bumped the
configure check to mandate gcc >= 4.7. Previously we claimed compatibility with GCC >= 4.6, which was released in 2012
In addition, TSAN itself provides a few interfaces for annotating "benign" races, as well as the rare cases in which the checker cannot infer a "happens-before" relation.
This branch adds a new Hadrian build flavour,
thread-sanitizer, as well as a new nightly CI job,
nightly-x86_64-linux-deb9-tsan, which validates in this flavour.
We also add a suppression file to ignore two races which I found were either benign or
All of this is summarised in Note [ThreadSanitizer] in
Concurrency issues fixed
On the whole I was quite impressed by both the specificity and accuracy of the errors reported by ThreadSanitizer; this will be an invaluable tool moving forward. Among the bugs that it caught are:
- a race in the STM implementation which I believe results in #17757
- a race in
itimerwhich could result in use-after-free (#17289)
- several thread leaks in the
- several data races in the tracking of runtime statistics
- a data race in the resizing of the stable pointer table
- Look into suppression of #17289; is it fixed?
- [ ]